Getting Started with Apache Polaris and Apache Ozone
ℹ️ Assets for this guide can be accessed from the Apache Polaris Git repository
Overview
This example uses Apache Ozone as a storage provider with Polaris.
Spark is used as a query engine. This example assumes a local Spark installation. See the Spark Notebooks Example for a more advanced Spark setup.
Starting the Example
Start the docker compose group by running the following command from the root of the repository:
docker compose -f site/content/guides/ozone/docker-compose.yml up
Note: this example pulls the apache/polaris:latest image, but assumes the image is 1.2.0-incubating or later.
Connecting From Spark
bin/spark-sql \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.1,org.apache.iceberg:iceberg-aws-bundle:1.10.1 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.polaris.type=rest \
--conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
--conf spark.sql.catalog.polaris.token-refresh-enabled=false \
--conf spark.sql.catalog.polaris.warehouse=quickstart_catalog \
--conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \
--conf spark.sql.catalog.polaris.credential=root:s3cr3t \
--conf spark.sql.catalog.polaris.client.region=us-west-2 \
--conf spark.sql.catalog.polaris.s3.access-key-id=polaris_root \
--conf spark.sql.catalog.polaris.s3.secret-access-key=polaris_pass
Note: s3cr3t is defined as the password for the root user in the docker-compose.yml file.
Note: The client.region, s3.access-key-id, and s3.secret-access-key settings are required
for the AWS S3 client to work, but their values are not used in this example because Ozone
does not require them when S3 security is not enabled.
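For orientation, the credential and scope settings above correspond to an OAuth2 client-credentials request that the REST catalog client sends to Polaris. The sketch below reproduces that request with curl. It assumes the token endpoint is /v1/oauth/tokens under the catalog URI and that root:s3cr3t splits into a client ID and a client secret; the function name polaris_token is hypothetical.

```shell
# Sketch: request an access token the way the Spark catalog settings imply.
# Assumptions: the token endpoint path below, and curl being installed.
POLARIS_URI="http://localhost:8181/api/catalog"

polaris_token() {
  # $1 = client id, $2 = client secret (the two halves of "credential")
  curl -s -X POST "${POLARIS_URI}/v1/oauth/tokens" \
    -d "grant_type=client_credentials" \
    -d "client_id=$1" \
    -d "client_secret=$2" \
    -d "scope=PRINCIPAL_ROLE:ALL"
}

# Usage, once the containers are up:
#   polaris_token root s3cr3t
```

If the request succeeds, the response should be a JSON document containing an access_token field.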
Running Queries
Run inside the Spark SQL shell:
USE polaris;
CREATE NAMESPACE ns;
CREATE TABLE ns.t1 AS SELECT 'abc';
SELECT * FROM ns.t1;
-- abc
Lack of Credential Vending
Notice that the Spark configuration does not contain an X-Iceberg-Access-Delegation header.
This is because Ozone does not support the STS API and consequently cannot produce session
credentials to be vended to Polaris clients.
The lack of an STS API is represented in the Catalog storage configuration by the
stsUnavailable=true property.
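As a rough illustration, the storage section of a catalog definition carrying this property might look like the fragment below. This is a sketch, not the definition shipped with the example: the bucket name example-bucket is a placeholder, and the storageType and allowedLocations field names are assumed from the Polaris management API. The authoritative values are in the docker-compose.yml file.

```shell
# Sketch of a catalog storage configuration with STS marked unavailable.
# Assumptions: "example-bucket" is a placeholder; field names other than
# stsUnavailable are assumed from the Polaris management API.
STORAGE_CONFIG=$(cat <<'EOF'
{
  "storageType": "S3",
  "allowedLocations": ["s3://example-bucket"],
  "stsUnavailable": true
}
EOF
)

# Print the fragment for inspection.
printf '%s\n' "$STORAGE_CONFIG"
```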
S3 Credentials
In this example, Ozone does not have S3 security enabled for its S3 API. Therefore, any AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY values will work for accessing the S3 API.
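One way to see this in practice is to call Ozone's S3 API directly with made-up credentials. The sketch below assumes the AWS CLI is installed and that the S3 API is reachable from the host at http://localhost:9878; the key values and the function name list_ozone_buckets are arbitrary placeholders.

```shell
# Sketch: list buckets through Ozone's S3 API with made-up credentials.
# Assumptions: the AWS CLI is installed; the key values are arbitrary
# placeholders, which is acceptable only while S3 security is disabled.
list_ozone_buckets() {
  AWS_ACCESS_KEY_ID="any-key" \
  AWS_SECRET_ACCESS_KEY="any-secret" \
  AWS_DEFAULT_REGION="us-west-2" \
  aws s3 ls --endpoint-url "http://localhost:9878"
}

# Usage, once the containers are up:
#   list_ozone_buckets
```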
S3 Endpoints
Note that the catalog configuration defined in the docker-compose.yml contains
different endpoints for the Polaris Server and the client (Spark). Specifically,
the client endpoint is http://localhost:9878, but endpointInternal is http://ozone-s3g:9878.
This is necessary because clients running on localhost do not normally see service
names (such as ozone-s3g) that are internal to the docker compose environment.
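To see both endpoints side by side, the catalog definition can be fetched from Polaris once the example is running. The sketch below assumes the management API serves catalogs at /api/management/v1/catalogs/<name> on port 8181 and that the caller already holds a valid bearer token; the function name show_catalog is hypothetical.

```shell
# Sketch: fetch the catalog definition to inspect its storage endpoints.
# Assumptions: the management API path below, and a bearer token supplied
# by the caller (how it is obtained is outside this sketch).
show_catalog() {
  # $1 = bearer token, $2 = catalog name
  curl -s -H "Authorization: Bearer $1" \
    "http://localhost:8181/api/management/v1/catalogs/$2"
}

# Usage:
#   show_catalog "$TOKEN" quickstart_catalog
```

The returned JSON should include the storage configuration, where the client-facing endpoint and endpointInternal values can be compared.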