Getting Started with Apache Polaris and RustFS

ℹ️ Assets for this guide can be accessed from the Apache Polaris Git repository

⚠️ Warning

Disclaimer: This guide uses mc from MinIO OSS for local testing only. MinIO OSS is in maintenance mode, and its container images may no longer receive updates or security fixes. For production setups, use rc instead.

Overview

This example uses RustFS as a storage provider with Polaris.

Spark is used as a query engine. This example assumes a local Spark installation. See the Spark Notebooks Example for a more advanced Spark setup.

Starting the Example

  1. Build the Polaris server image if it’s not already present locally:

    ./gradlew \
       :polaris-server:assemble \
       :polaris-server:quarkusAppPartsBuild --rerun \
       -Dquarkus.container-image.build=true
    
  2. Start the Docker Compose services by running the following command from the root of the repository:

    docker compose -f site/content/guides/rustfs/docker-compose.yml up
    

Connecting From Spark

bin/spark-sql \
    --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.1,org.apache.iceberg:iceberg-aws-bundle:1.10.1 \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.polaris.type=rest \
    --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
    --conf spark.sql.catalog.polaris.token-refresh-enabled=false \
    --conf spark.sql.catalog.polaris.warehouse=quickstart_catalog \
    --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \
    --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
    --conf spark.sql.catalog.polaris.credential=root:s3cr3t \
    --conf spark.sql.catalog.polaris.client.region=us-west-2 \
    --conf spark.sql.catalog.polaris.s3.endpoint=http://localhost:9000

Note: s3cr3t is defined as the password for the root user in the docker-compose.yml file.

Note: The client.region configuration is required by the AWS S3 client, but its value is irrelevant in this example because RustFS does not enforce a specific region.
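The credential and scope settings above are what Spark uses for an OAuth2 client-credentials exchange with the catalog's token endpoint. The following sketch only builds the form body that gets posted (the /v1/oauth/tokens path under the catalog URI follows the Iceberg REST convention; no request is actually sent):

```python
# Sketch: how the credential and scope Spark settings map onto the OAuth2
# client_credentials form body posted to the catalog's token endpoint,
# e.g. http://localhost:8181/api/catalog/v1/oauth/tokens (Iceberg REST convention).
from urllib.parse import urlencode

# "root:s3cr3t" from --conf spark.sql.catalog.polaris.credential
client_id, client_secret = "root:s3cr3t".split(":", 1)

token_request_body = urlencode({
    "grant_type": "client_credentials",
    "client_id": client_id,
    "client_secret": client_secret,
    "scope": "PRINCIPAL_ROLE:ALL",  # from --conf spark.sql.catalog.polaris.scope
})
print(token_request_body)
```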

Running Queries

Run inside the Spark SQL shell:

USE polaris;

CREATE NAMESPACE ns;

CREATE TABLE ns.t1 AS SELECT 'abc';

SELECT * FROM ns.t1;
-- abc
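To reset the example afterwards, the table and namespace can be dropped from the same shell (standard Spark SQL statements):

```sql
DROP TABLE ns.t1;
DROP NAMESPACE ns;
```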

RustFS Endpoints

Note that the catalog configuration defined in the docker-compose.yml contains different endpoints for the Polaris server and for the client (Spark): the client-facing endpoint is http://localhost:9000, while endpointInternal is http://rustfs:9000.

This split is necessary because clients running on localhost cannot normally resolve service names (such as rustfs) that exist only inside the Docker Compose network, whereas containers in that network reach RustFS by its service name.
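Concretely, the storage section of the catalog definition looks roughly like this (a sketch, not the exact file: the bucket name is illustrative, and field names follow Polaris's S3 storage configuration):

```json
{
  "storageConfigInfo": {
    "storageType": "S3",
    "endpoint": "http://localhost:9000",
    "endpointInternal": "http://rustfs:9000",
    "pathStyleAccess": true,
    "allowedLocations": ["s3://quickstart-bucket/"]
  }
}
```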