IMPORTANT: Developer documentation for the current main branch.
This content is unreleased and may change before the next Polaris release.
For stable user documentation, see the
latest release docs.
Hive Metastore Federation
Polaris can federate catalog operations to an existing Hive Metastore (HMS). This lets an external HMS remain the source of truth for table metadata while Polaris brokers access, policies, and multi-engine connectivity.
Build-time enablement
The Hive factory is packaged as an optional extension and is not baked into default server builds.
Include it when assembling the runtime or container images by setting the NonRESTCatalogs Gradle
property to include HIVE (and any other non-REST backends you need):
./gradlew :polaris-server:assemble :polaris-server:quarkusAppPartsBuild --rerun \
  -PNonRESTCatalogs=HIVE -Dquarkus.container-image.build=true
runtime/server/build.gradle.kts wires the extension in only when this flag is present, so binaries
built without it will reject Hive federation requests.
Feature configuration
After building Polaris with Hive support, enable the necessary feature flags in your
application.properties file (or equivalent configuration mechanism such as environment variables or
a Kubernetes ConfigMap):
# Allows both the REST and HIVE connection types
polaris.features."SUPPORTED_CATALOG_CONNECTION_TYPES"=["ICEBERG_REST","HIVE"]

# Allows IMPLICIT authentication, needed for Hive federation
polaris.features."SUPPORTED_EXTERNAL_CATALOG_AUTHENTICATION_TYPES"=["OAUTH","IMPLICIT"]

# Enables the federation feature itself
polaris.features."ENABLE_CATALOG_FEDERATION"=true
For Kubernetes deployments, add these properties to the ConfigMap mounted into the Polaris container
(typically at /deployment/config/application.properties).
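Before starting the server, it can help to confirm that the mounted properties file actually carries the three flags listed above. The following is a minimal sketch, not part of Polaris: `flag_has` and `check_federation_flags` are hypothetical helper names, and the check only does a plain-text match on uncommented lines.

```shell
# flag_has FILE FLAG NEEDLE
# Succeeds when an uncommented polaris.features."FLAG"= line contains NEEDLE.
flag_has() {
  grep -v '^[[:space:]]*#' "$1" \
    | grep -F "polaris.features.\"$2\"=" \
    | grep -qF "$3"
}

# check_federation_flags FILE
# Returns 0 only when all three federation flags are set as described above.
check_federation_flags() {
  rc=0
  flag_has "$1" SUPPORTED_CATALOG_CONNECTION_TYPES '"HIVE"' \
    || { echo "HIVE connection type is not enabled" >&2; rc=1; }
  flag_has "$1" SUPPORTED_EXTERNAL_CATALOG_AUTHENTICATION_TYPES '"IMPLICIT"' \
    || { echo "IMPLICIT authentication is not enabled" >&2; rc=1; }
  flag_has "$1" ENABLE_CATALOG_FEDERATION 'true' \
    || { echo "catalog federation is not enabled" >&2; rc=1; }
  return "$rc"
}
```

For example, `check_federation_flags /deployment/config/application.properties` could run as a container start-up check. It does not see values supplied through environment variables or other configuration sources.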
Runtime requirements
- Metastore connectivity: Expose the HMS Thrift endpoint (thrift://host:port) to the Polaris deployment.
- Configuration discovery: Iceberg’s HiveCatalog loads Hadoop/Hive client settings from the classpath. Provide hive-site.xml (and core-site.xml if needed) via HADOOP_CONF_DIR/HIVE_CONF_DIR or an image layer.
- Authentication: Hive federation only supports IMPLICIT authentication, meaning Polaris uses the operating-system or Kerberos identity of the running process (no stored secrets). Ensure the service principal is logged in or holds a valid keytab/TGT before starting Polaris.
- Object storage role: Configure polaris.service-identity.<realm>.aws-iam.* (or the default realm) so the server can assume the AWS role referenced by the catalog. The IAM role must allow STS access from the Polaris service identity and grant permissions to the table locations.
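The connectivity and configuration-discovery requirements above can be checked from the Polaris host before the server starts. This is a rough sketch under stated assumptions: the helper names are hypothetical, the URI is assumed to include an explicit port, and the reachability probe assumes an `nc` (netcat) binary that supports `-z` and `-w`.

```shell
# Split thrift://host:port into its parts with POSIX parameter expansion.
hms_host() {
  hostport="${1#thrift://}"
  printf '%s\n' "${hostport%%:*}"
}

hms_port() {
  hostport="${1#thrift://}"
  printf '%s\n' "${hostport##*:}"
}

# Succeeds when hive-site.xml exists in the given configuration directory.
has_hive_site() {
  [ -f "$1/hive-site.xml" ]
}

# TCP-level probe only; it does not speak Thrift or authenticate.
# Assumes nc is installed and accepts -z (scan) and -w (timeout in seconds).
hms_reachable() {
  nc -z -w 5 "$(hms_host "$1")" "$(hms_port "$1")"
}
```

A start-up script might call `has_hive_site "$HADOOP_CONF_DIR"` and `hms_reachable thrift://hms.example.internal:9083` and refuse to launch when either fails.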
Kerberos setup example
If your Hive Metastore enforces Kerberos, stage the necessary configuration alongside Polaris:
export KRB5_CONFIG=/etc/polaris/krb5.conf
export HADOOP_CONF_DIR=/etc/polaris/hadoop-conf # contains hive-site.xml with HMS principal
export HADOOP_OPTS="-Djava.security.auth.login.config=/etc/polaris/jaas.conf"
kinit -kt /etc/polaris/keytabs/polaris.keytab polaris/service@EXAMPLE.COM
- hive-site.xml must define hive.metastore.sasl.enabled=true, the metastore principal, and the client principal pattern (for example hive.metastore.client.kerberos.principal=polaris/_HOST@REALM).
- The JAAS entry (referenced by java.security.auth.login.config) should use useKeyTab=true and point to the same keytab shown above so the Polaris JVM can refresh credentials automatically.
- Keep the keytab readable solely by the Polaris service user; the implicit authenticator consumes the TGT at startup and for periodic renewal.
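The keytab and ticket expectations above can be verified with two small checks. This is an illustrative sketch rather than anything Polaris ships: the function names are hypothetical, and `has_valid_ticket` assumes the MIT Kerberos `klist`, whose `-s` flag exits non-zero when the credential cache is missing or expired.

```shell
# Succeeds when the file grants no permissions to group or other.
# Characters 5-10 of the ls -l mode string are the group and other bits.
keytab_is_private() {
  perms=$(ls -ld "$1" | cut -c5-10)
  [ "$perms" = "------" ]
}

# Succeeds when the current credential cache holds an unexpired ticket.
# Assumes MIT Kerberos klist; -s suppresses output and reports via exit status.
has_valid_ticket() {
  klist -s
}
```

For instance, `keytab_is_private /etc/polaris/keytabs/polaris.keytab && has_valid_ticket` could gate the Polaris start command after `kinit` has run.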
Warehouse access patterns
When Polaris federates to HMS, the server-side HiveCatalog must be able to reach the warehouse
storage itself in order to load table metadata. This happens before Polaris can return any
client-side storage credentials. In practice, Hive federation therefore works cleanly only when the
Polaris server already has a working access path to the warehouse.
The current runtime behavior after building Polaris with -PNonRESTCatalogs=HIVE is best
understood as follows:
| Pattern | Works after a -PNonRESTCatalogs=HIVE build | Notes |
|---|---|---|
| HDFS or warehouse access fully handled by ambient Hadoop config and process identity | Yes | Recommended fit for the current design. Provide hive-site.xml / core-site.xml plus any required Kerberos or Hadoop client configuration. |
| HadoopFileIO with s3a:// warehouses | Only with extra runtime packaging | A HIVE-enabled Polaris build still needs hadoop-aws and related filesystem dependencies. They are not included just by enabling the HIVE build flag. |
| S3FileIO with ambient AWS credentials | Sometimes | Can work when Polaris is explicitly configured to use S3FileIO and the process already has valid AWS credentials and region settings. This is not the general documented production path today, and non-secret endpoint or path-style settings may still be needed for S3-compatible stores. |
| UNSAFE! Putting object-store credentials in catalog properties | DO NOT USE / THIS IS UNSAFE | Unsafe. Secrets placed in catalog properties are returned to authenticated catalog clients through /config. This may look like a workaround in some S3-compatible deployments, but it exposes passwords, access keys, session tokens, or other secrets as plain text. |
hive-site.xml and core-site.xml can be mounted via HADOOP_CONF_DIR / HIVE_CONF_DIR; they do
not by themselves require rebuilding Polaris. By contrast, adding missing filesystem client
libraries such as hadoop-aws does require custom packaging of the Polaris runtime or container
image.
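Because secrets in catalog properties are exposed through /config, one option is to screen catalog payloads before submitting them. The sketch below is only an illustration: `payload_has_secrets` is a hypothetical helper, and the property names it looks for (Iceberg's `s3.*` credential keys and Hadoop's `fs.s3a.*` credential keys) are a short, non-exhaustive sample.

```shell
# Reads a JSON payload on stdin.
# Succeeds (exit 0) when a known secret-bearing property name appears as a key.
payload_has_secrets() {
  grep -Eq '"(s3\.access-key-id|s3\.secret-access-key|s3\.session-token|fs\.s3a\.access\.key|fs\.s3a\.secret\.key|fs\.s3a\.session\.token)"[[:space:]]*:'
}
```

A wrapper script could run `payload_has_secrets < catalog.json` and abort the request when it succeeds. A text match of this kind cannot catch secrets stored under other key names.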
Creating a federated catalog
Use the Management API (or the Python CLI) to create an external catalog whose connection type is
HIVE. The following request registers a catalog that proxies to an HMS running on
thrift://hms.example.internal:9083:
curl -X POST https://<polaris-host>/management/v1/catalogs \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "EXTERNAL",
    "name": "analytics_hms",
    "storageConfigInfo": {
      "storageType": "S3",
      "roleArn": "arn:aws:iam::123456789012:role/polaris-warehouse-access",
      "region": "us-east-1"
    },
    "properties": { "default-base-location": "s3://analytics-bucket/warehouse/" },
    "connectionConfigInfo": {
      "connectionType": "HIVE",
      "uri": "thrift://hms.example.internal:9083",
      "warehouse": "s3://analytics-bucket/warehouse/",
      "authenticationParameters": { "authenticationType": "IMPLICIT" }
    }
  }'
Grant catalog roles to principal roles exactly as you would for internal catalogs so engines can obtain tokens that authorize against the federated metadata.
default-base-location is required; it tells Polaris and Iceberg where to place new metadata files.
allowedLocations is optional—supply it only when you want to restrict writers to a specific set of
prefixes. If your IAM trust policy requires an externalId or explicit userArn, include those
optional fields in storageConfigInfo. Polaris persists them and supplies them when assuming the
role cited by roleArn during metadata commits.
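To illustrate how the optional fields fit into storageConfigInfo, the sketch below prints a variant of that object with externalId and allowedLocations filled in. `storage_config_json` is a hypothetical helper, the sample values are placeholders, and no JSON escaping is performed, so arguments are assumed to contain no quotes or backslashes.

```shell
# storage_config_json ROLE_ARN REGION EXTERNAL_ID ALLOWED_PREFIX
# Prints a storageConfigInfo object that includes the optional fields.
storage_config_json() {
  printf '{\n'
  printf '  "storageType": "S3",\n'
  printf '  "roleArn": "%s",\n' "$1"
  printf '  "region": "%s",\n' "$2"
  printf '  "externalId": "%s",\n' "$3"
  printf '  "allowedLocations": ["%s"]\n' "$4"
  printf '}\n'
}
```

Calling `storage_config_json arn:aws:iam::123456789012:role/polaris-warehouse-access us-east-1 example-external-id s3://analytics-bucket/warehouse/` yields an object that could replace the storageConfigInfo value in the request above.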
Limitations and operational notes
- Single identity: Because only IMPLICIT authentication is permitted, Polaris cannot mix multiple Hive identities in a single deployment (HiveFederatedCatalogFactory rejects other auth types). Plan a deployment topology that aligns the Polaris process identity with the target HMS.
- Generic tables: The Hive extension exposes Iceberg tables registered in HMS. Generic table federation is not implemented (HiveFederatedCatalogFactory#createGenericCatalog throws UnsupportedOperationException).
- Configuration caching: Atlas-style catalog failover and multi-HMS routing are not yet handled; Polaris initializes one HiveCatalog per connection and relies on the underlying Iceberg client for retries.
With these constraints satisfied, Polaris can sit in front of an HMS so that Iceberg tables managed there gain OAuth-protected, multi-engine access through the Polaris REST APIs.