IMPORTANT: Developer documentation for the current main branch.
This content is unreleased and may change before the next Polaris release.
For stable user documentation, see the
latest release docs.
Configuring S3 Storage
This page covers configuring AWS S3, and S3-compatible object stores (MinIO, Apache Ozone S3 gateway, Ceph RGW, and similar), as the storage backend for a Polaris catalog. On AWS S3, all read and write operations are performed using credential vending: Polaris assumes a customer IAM role via STS and returns scoped, short-lived credentials to the client. The IAM role, its trust policy, and the bucket itself must be set up before the catalog is created.
This page is limited to native Polaris authentication. External identity providers are also supported but are not yet covered here; the configuration patterns below remain otherwise the same.
IAM identities involved🔗
Three distinct IAM identities take part in the S3 credential-vending flow. Conflating them is the
most common source of AccessDenied errors at catalog creation or table load.
| # | Identity | Used by | Purpose |
|---|---|---|---|
| 1 | Polaris service identity | Polaris server process | Sign every sts:AssumeRole request Polaris makes |
| 2 | Catalog access role | Polaris (assumed) | Hold the actual S3 / KMS permissions on the catalog bucket |
| 3 | Vended credentials | Iceberg client (Spark/Trino/PyIceberg) | Short-lived session keys returned by Polaris at table-load time |
At runtime the client calls Polaris to load a table, Polaris signs an sts:AssumeRole request
as identity 1 against identity 2, AWS STS returns scoped temporary credentials (identity 3), and
the client uses those credentials to talk to S3 and KMS directly.
Identity 1 is configured once at Polaris deployment time. Identity 2 is created per catalog and its ARN is registered when the catalog is created. Identity 3 is generated on every table load and is never persisted.
Polaris service identity🔗
The Polaris server itself needs an AWS identity to call STS on the catalog access role. This is configured outside Polaris — through the standard AWS SDK credentials chain — and is independent of any catalog.
Pick whichever discovery mechanism matches the deployment:
- EKS / IRSA — annotate the Polaris ServiceAccount with the IAM role ARN
(
eks.amazonaws.com/role-arn). The pod receives a projected token and exchanges it for STS credentials automatically. - EC2 — attach an IAM instance profile to the EC2 instance. The SDK reads credentials from IMDS.
- Static credentials — set
AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY, optionallyAWS_SESSION_TOKENandAWS_REGIONin the Polaris container’s environment. Suitable for local development only.
Any other AWS compute environment that participates in the standard AWS SDK credentials chain should also work, though the patterns above are the ones we have validated.
Whichever mechanism is used, the resulting identity needs a single permission to talk to STS:
1{
2 "Version": "2012-10-17",
3 "Statement": [
4 {
5 "Effect": "Allow",
6 "Action": "sts:AssumeRole",
7 "Resource": "arn:aws:iam::123456789012:role/polaris-warehouse-access"
8 }
9 ]
10}
Resource should list every catalog access role Polaris is expected to assume; use a wildcard
(for example arn:aws:iam::123456789012:role/polaris-*) only when role names follow a strict
naming convention that the AWS account owner controls.
The ARN of this identity is the Principal that the catalog access role’s trust policy must
trust — see the next section. One easy mistake is to update the catalog role’s permissions
without also adding the Polaris service identity to its trust policy; the symptom is
STS:AssumeRole returning AccessDenied even though the catalog role has correct S3
permissions.
For an end-to-end deployment example that exercises this identity on EC2, see Deploying Polaris on AWS.
Catalog access role and trust policy🔗
Polaris assumes a customer-managed IAM role via STS when a client requests credentials. The role must:
- Grant the actions required for object access on the bucket and prefix that backs the catalog
(
s3:GetObject,s3:PutObject,s3:DeleteObject,s3:ListBucketand, if encryption is in use, the relevantkms:*actions). - Trust the Polaris service principal — typically the IAM role that the Polaris server runs as.
Polaris fills the
sts:AssumeRolerequest with anexternalIdwhen one is configured. The trust policy must accept the same external ID.
Using externalId is recommended for cross-account or hosted Polaris deployments to mitigate the
confused-deputy problem. A minimal trust policy looks like:
1{
2 "Version": "2012-10-17",
3 "Statement": [
4 {
5 "Effect": "Allow",
6 "Principal": { "AWS": "arn:aws:iam::123456789012:role/polaris-server" },
7 "Action": "sts:AssumeRole",
8 "Condition": {
9 "StringEquals": { "sts:ExternalId": "polaris-prod" }
10 }
11 }
12 ]
13}
Catalog storage configuration🔗
Provide the role ARN, region, and externalId when creating the catalog. The token in the
Authorization header below is the Polaris admin bearer token obtained from
/api/catalog/v1/oauth/tokens (see Configuring Polaris for Production for
how to bootstrap and issue admin tokens).
1curl -X POST https://<polaris-host>/management/v1/catalogs \
2 -H "Authorization: Bearer $TOKEN" \
3 -H "Content-Type: application/json" \
4 -d '{
5 "catalog": {
6 "type": "INTERNAL",
7 "name": "warehouse_s3",
8 "properties": { "default-base-location": "s3://warehouse-bucket/prod/" },
9 "storageConfigInfo": {
10 "storageType": "S3",
11 "roleArn": "arn:aws:iam::123456789012:role/polaris-warehouse-access",
12 "externalId": "polaris-prod",
13 "region": "us-east-1"
14 }
15 }
16 }'
The role ARN is validated against the pattern enforced by AwsStorageConfigurationInfo; an
ill-formed ARN is rejected at catalog creation time.
Server-side encryption with KMS🔗
When the bucket uses SSE-KMS, supply both currentKmsKey (the key Polaris should use for writes)
and allowedKmsKeys (every key the catalog is allowed to read from). The two fields are processed
independently in AwsCredentialsStorageIntegration, so the write key must be included in
allowedKmsKeys as well if you want it readable through vended credentials:
1"storageConfigInfo": {
2 "storageType": "S3",
3 "roleArn": "...",
4 "region": "us-east-1",
5 "currentKmsKey": "arn:aws:kms:us-east-1:123456789012:key/aaaa-bbbb",
6 "allowedKmsKeys": [
7 "arn:aws:kms:us-east-1:123456789012:key/aaaa-bbbb",
8 "arn:aws:kms:us-east-1:123456789012:key/cccc-dddd"
9 ]
10}
The IAM role’s policy must include kms:GenerateDataKey and kms:Decrypt on currentKmsKey and
kms:Decrypt on every key listed in allowedKmsKeys, and each key policy must grant the same to
the role principal.
If the deployment does not use KMS, set kmsUnavailable to true so Polaris will not request
KMS-related session permissions:
1"kmsUnavailable": true
S3-compatible endpoints🔗
Polaris can be pointed at S3-compatible object stores (MinIO, Ceph RGW, Apache Ozone S3 gateway). The available fields are:
endpoint— the S3 API endpoint Polaris and its clients should call.endpointInternal— optional, used by the Polaris server when the in-cluster endpoint differs from the one returned to clients.pathStyleAccess— set totruefor backends that do not support virtual-host-style addressing.stsEndpoint— STS endpoint; defaults toendpointInternalthenendpointwhen not set.stsUnavailable— set totruewhen the backend does not implement STS.
The credential-vending guarantee at the top of this page assumes that the backend implements STS.
For AWS S3 and S3-compatible backends that expose the STS API (such as MinIO), leave
stsUnavailable unset (or false) and the vended-credentials flow described above works as is.
1"storageConfigInfo": {
2 "storageType": "S3",
3 "endpoint": "https://s3.internal.example.com",
4 "pathStyleAccess": true,
5 "region": "us-east-1"
6}
For S3-compatible backends without STS (Apache Ozone S3 gateway, or Ceph RGW without STS enabled),
set stsUnavailable: true. Polaris will then skip subscoped credential vending entirely, and the
client must omit X-Iceberg-Access-Delegation: vended-credentials and authenticate to the object
store directly. The Polaris guides for Apache Ozone and Ceph show
this pattern end-to-end.
1"storageConfigInfo": {
2 "storageType": "S3",
3 "endpoint": "https://s3.internal.example.com",
4 "pathStyleAccess": true,
5 "stsUnavailable": true,
6 "region": "us-east-1"
7}
Client configuration🔗
Engines connect through the Iceberg REST API and let Polaris vend credentials at table-load time; they do not need static AWS credentials when STS is available.
Spark example, matching the property names used by the existing MinIO / RustFS guides:
1bin/spark-sql \
2 --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.1,org.apache.iceberg:iceberg-aws-bundle:1.10.1 \
3 --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
4 --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \
5 --conf spark.sql.catalog.polaris.type=rest \
6 --conf spark.sql.catalog.polaris.uri=https://<polaris-host>/api/catalog \
7 --conf spark.sql.catalog.polaris.oauth2-server-uri=https://<polaris-host>/api/catalog/v1/oauth/tokens \
8 --conf spark.sql.catalog.polaris.token-refresh-enabled=false \
9 --conf spark.sql.catalog.polaris.warehouse=warehouse_s3 \
10 --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \
11 --conf spark.sql.catalog.polaris.credential=<client-id>:<client-secret> \
12 --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials
The oauth2-server-uri is recommended: without it the Iceberg REST client falls back to a
hard-coded /v1/oauth/tokens path and logs a deprecation warning, since the automatic fallback
is slated for removal in a future Iceberg release.
For Trino, use the Iceberg connector with the REST catalog. The REST/OAuth2 properties talk to
Polaris, and Polaris vends the endpoint, path-style flag, and region together with the scoped
credentials (s3.endpoint, s3.path-style-access, client.region in the load-table response),
so they do not need to be repeated on the client. The native S3 filesystem still has to be
enabled on the Trino side:
1connector.name=iceberg
2iceberg.catalog.type=rest
3iceberg.rest-catalog.uri=https://<polaris-host>/api/catalog
4iceberg.rest-catalog.warehouse=warehouse_s3
5iceberg.rest-catalog.security=OAUTH2
6iceberg.rest-catalog.oauth2.credential=<client-id>:<client-secret>
7iceberg.rest-catalog.oauth2.scope=PRINCIPAL_ROLE:ALL
8iceberg.rest-catalog.oauth2.server-uri=https://<polaris-host>/api/catalog/v1/oauth/tokens
9iceberg.rest-catalog.vended-credentials-enabled=true
10fs.native-s3.enabled=true
For PyIceberg, use the rest catalog type. The same Polaris-side properties (uri, warehouse,
credential, scope, oauth2-server-uri) apply, and the vended-credential header must be
forwarded as a REST header:
1from pyiceberg.catalog.rest import RestCatalog
2
3cat = RestCatalog(
4 name="polaris",
5 **{
6 "uri": "https://<polaris-host>/api/catalog",
7 "warehouse": "warehouse_s3",
8 "credential": "<client-id>:<client-secret>",
9 "scope": "PRINCIPAL_ROLE:ALL",
10 "oauth2-server-uri": "https://<polaris-host>/api/catalog/v1/oauth/tokens",
11 "header.X-Iceberg-Access-Delegation": "vended-credentials",
12 },
13)
Polaris returns the vended S3 properties (s3.access-key-id, s3.secret-access-key,
s3.session-token) to the client at table-load time, so static credentials should not be
configured on the PyIceberg side.
Verifying the setup🔗
A successful end-to-end test should be possible without giving the client any long-lived AWS credentials:
1CREATE NAMESPACE warehouse_s3.demo;
2CREATE TABLE warehouse_s3.demo.t (id BIGINT, name STRING) USING iceberg;
3INSERT INTO warehouse_s3.demo.t VALUES (1, 'hello');
4SELECT * FROM warehouse_s3.demo.t;
If INSERT or SELECT fails with a 403, the most common causes are:
- The IAM role’s trust policy does not match the
roleArn/externalIdPolaris is presenting. - The role grants S3 permissions but is missing the KMS actions for
currentKmsKey. - The bucket policy denies access from outside a specific VPC endpoint.
Polaris logs the assumed-role STS request at debug level, which is the fastest way to confirm which identity is being presented.