IMPORTANT: Developer documentation for the current main branch. This content is unreleased and may change before the next Polaris release. For stable user documentation, see the latest release docs.

Hive Metastore Federation

Polaris can federate catalog operations to an existing Hive Metastore (HMS). This lets an external HMS remain the source of truth for table metadata while Polaris brokers access, policies, and multi-engine connectivity.

Build-time enablement

The Hive factory is packaged as an optional extension and is not baked into default server builds. Include it when assembling the runtime or container images by setting the NonRESTCatalogs Gradle property to include HIVE (and any other non-REST backends you need):

./gradlew :polaris-server:assemble :polaris-server:quarkusAppPartsBuild --rerun \
  -PNonRESTCatalogs=HIVE -Dquarkus.container-image.build=true

runtime/server/build.gradle.kts wires the extension in only when this flag is present, so binaries built without it will reject Hive federation requests.

Feature configuration

After building Polaris with Hive support, enable the necessary feature flags in your application.properties file (or equivalent configuration mechanism such as environment variables or a Kubernetes ConfigMap):

# Allows both the ICEBERG_REST and HIVE connection types
polaris.features."SUPPORTED_CATALOG_CONNECTION_TYPES"=["ICEBERG_REST","HIVE"]

# Allows IMPLICIT authentication, needed for Hive federation
polaris.features."SUPPORTED_EXTERNAL_CATALOG_AUTHENTICATION_TYPES"=["OAUTH","IMPLICIT"]

# Enables the federation feature itself
polaris.features."ENABLE_CATALOG_FEDERATION"=true

For Kubernetes deployments, add these properties to the ConfigMap mounted into the Polaris container (typically at /deployment/config/application.properties).
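Before rolling out a configuration change, it can help to confirm that the three feature flags above are all present. The following is a minimal sketch, not part of Polaris: the function names are hypothetical, and it assumes the properties are available as plain text in the same key=value form shown above, with list values written as JSON arrays.

```python
import json

PREFIX = "polaris.features."


def parse_feature_flags(text):
    """Collect polaris.features.* entries from properties-style text."""
    flags = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith(PREFIX):
            # Flag names are quoted in the file; drop the quotes.
            flags[key[len(PREFIX):].strip('"')] = value.strip()
    return flags


def _as_list(value):
    """Parse a JSON array value, returning [] if it is absent or malformed."""
    try:
        parsed = json.loads(value)
    except (TypeError, ValueError):
        return []
    return parsed if isinstance(parsed, list) else []


def missing_for_hive_federation(text):
    """Return the names of flags that do not allow Hive federation."""
    flags = parse_feature_flags(text)
    missing = []
    if "HIVE" not in _as_list(flags.get("SUPPORTED_CATALOG_CONNECTION_TYPES")):
        missing.append("SUPPORTED_CATALOG_CONNECTION_TYPES")
    if "IMPLICIT" not in _as_list(
        flags.get("SUPPORTED_EXTERNAL_CATALOG_AUTHENTICATION_TYPES")
    ):
        missing.append("SUPPORTED_EXTERNAL_CATALOG_AUTHENTICATION_TYPES")
    if flags.get("ENABLE_CATALOG_FEDERATION", "").lower() != "true":
        missing.append("ENABLE_CATALOG_FEDERATION")
    return missing
```

An empty list means all three flags are set as this page describes. The check only inspects the text it is given, so flags supplied through environment variables would need to be merged in separately.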

Runtime requirements

  • Metastore connectivity: Expose the HMS Thrift endpoint (thrift://host:port) to the Polaris deployment.
  • Configuration discovery: Iceberg’s HiveCatalog loads Hadoop/Hive client settings from the classpath. Provide hive-site.xml (and core-site.xml if needed) via HADOOP_CONF_DIR/HIVE_CONF_DIR or an image layer.
  • Authentication: Hive federation only supports IMPLICIT authentication, meaning Polaris uses the operating-system or Kerberos identity of the running process (no stored secrets). Ensure the service principal is logged in or holds a valid keytab/TGT before starting Polaris.
  • Object storage role: Configure polaris.service-identity.<realm>.aws-iam.* (or the default realm) so the server can assume the AWS role referenced by the catalog. The IAM role must allow STS access from the Polaris service identity and grant permissions to the table locations.
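The first two requirements above can be checked from the host or container that will run Polaris. The sketch below is illustrative only and uses hypothetical helper names. It assumes the Thrift endpoint is written as thrift://host:port and that hive-site.xml sits directly inside HIVE_CONF_DIR or HADOOP_CONF_DIR. The lookup order is an assumption, and the check approximates classpath discovery without reproducing it.

```python
import os
import socket
from urllib.parse import urlparse


def parse_thrift_uri(uri):
    """Split thrift://host:port into (host, port), rejecting other forms."""
    parsed = urlparse(uri)
    if parsed.scheme != "thrift" or not parsed.hostname or parsed.port is None:
        raise ValueError(f"expected thrift://host:port, got {uri!r}")
    return parsed.hostname, parsed.port


def find_hive_site(env=None):
    """Return the first hive-site.xml found in the configured conf dirs."""
    env = os.environ if env is None else env
    for var in ("HIVE_CONF_DIR", "HADOOP_CONF_DIR"):  # order is an assumption
        directory = env.get(var)
        if directory:
            candidate = os.path.join(directory, "hive-site.xml")
            if os.path.isfile(candidate):
                return candidate
    return None


def can_connect(host, port, timeout=3.0):
    """Report whether a plain TCP connection to the endpoint succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful TCP connection only shows that the port is reachable. It does not prove that the metastore accepts the Polaris identity.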

Kerberos setup example

If your Hive Metastore enforces Kerberos, stage the necessary configuration alongside Polaris:

export KRB5_CONFIG=/etc/polaris/krb5.conf
export HADOOP_CONF_DIR=/etc/polaris/hadoop-conf   # contains hive-site.xml with HMS principal
export HADOOP_OPTS="-Djava.security.auth.login.config=/etc/polaris/jaas.conf"
kinit -kt /etc/polaris/keytabs/polaris.keytab polaris/service@EXAMPLE.COM
  • hive-site.xml must define hive.metastore.sasl.enabled=true, the metastore principal, and client principal pattern (for example hive.metastore.client.kerberos.principal=polaris/_HOST@REALM).
  • The JAAS entry (referenced by java.security.auth.login.config) should use useKeyTab=true and point to the same keytab shown above so the Polaris JVM can refresh credentials automatically.
  • Keep the keytab readable solely by the Polaris service user; the implicit authenticator consumes the TGT at startup and for periodic renewal.
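The hive-site.xml settings listed above can be checked mechanically before Polaris starts. The following sketch is illustrative and uses hypothetical function names. It relies on the standard Hadoop configuration layout (property elements with name and value children). It also assumes the metastore principal is stored under hive.metastore.kerberos.principal, a key this page does not name explicitly.

```python
import xml.etree.ElementTree as ET

# Keys that must be non-empty; the first one is an assumption (see lead-in).
PRINCIPAL_KEYS = (
    "hive.metastore.kerberos.principal",
    "hive.metastore.client.kerberos.principal",
)


def load_hadoop_properties(xml_text):
    """Read <property><name/><value/></property> pairs into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def kerberos_gaps(xml_text):
    """Return the Kerberos-related keys that are missing or not enabled."""
    props = load_hadoop_properties(xml_text)
    gaps = []
    if props.get("hive.metastore.sasl.enabled", "").lower() != "true":
        gaps.append("hive.metastore.sasl.enabled")
    for key in PRINCIPAL_KEYS:
        if not props.get(key):
            gaps.append(key)
    return gaps
```

An empty result means the file contains the three settings. It says nothing about whether the keytab or the JAAS entry is valid.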

Warehouse access patterns

When Polaris federates to HMS, the server-side HiveCatalog must be able to reach the warehouse storage itself in order to load table metadata. This happens before Polaris can return any client-side storage credentials. In practice, Hive federation therefore works cleanly only when the Polaris server already has a working access path to the warehouse.

The current runtime behavior after building Polaris with -PNonRESTCatalogs=HIVE is best understood as follows:

| Pattern | Works after a -PNonRESTCatalogs=HIVE build | Notes |
| --- | --- | --- |
| HDFS or warehouse access fully handled by ambient Hadoop config and process identity | Yes | Recommended fit for the current design. Provide hive-site.xml / core-site.xml plus any required Kerberos or Hadoop client configuration. |
| HadoopFileIO with s3a:// warehouses | Only with extra runtime packaging | A HIVE-enabled Polaris build still needs hadoop-aws and related filesystem dependencies. They are not included just by enabling the HIVE build flag. |
| S3FileIO with ambient AWS credentials | Sometimes | Can work when Polaris is explicitly configured to use S3FileIO and the process already has valid AWS credentials and region settings. This is not the general documented production path today, and non-secret endpoint or path-style settings may still be needed for S3-compatible stores. |
| UNSAFE! Putting object-store credentials in catalog properties | DO NOT USE / THIS IS UNSAFE | Unsafe. Secrets placed in catalog properties are returned to authenticated catalog clients through /config. This may look like a workaround in some S3-compatible deployments, but it exposes passwords, access keys, session tokens, or other secrets as plain text. |

hive-site.xml and core-site.xml can be mounted via HADOOP_CONF_DIR / HIVE_CONF_DIR; they do not by themselves require rebuilding Polaris. By contrast, adding missing filesystem client libraries such as hadoop-aws does require custom packaging of the Polaris runtime or container image.
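Because catalog properties are returned to clients through /config, it may be worth screening a property map for secret-looking keys before submitting it. The sketch below is a heuristic under stated assumptions, not a Polaris feature: the function name and the key pattern are hypothetical, and a clean result does not guarantee that no secret is present.

```python
import re

# Heuristic pattern for key names that usually carry credentials.
SECRET_KEY_PATTERN = re.compile(
    r"(secret|password|token|access[-_.]?key|credential)", re.IGNORECASE
)


def secret_like_keys(properties):
    """Return the sorted property keys whose names look like secrets."""
    return sorted(key for key in properties if SECRET_KEY_PATTERN.search(key))


if __name__ == "__main__":
    candidate = {
        "default-base-location": "s3://analytics-bucket/warehouse/",
        "s3.endpoint": "https://objects.example.internal",
        "s3.secret-access-key": "REDACTED",
    }
    flagged = secret_like_keys(candidate)
    if flagged:
        print("Refusing to submit catalog properties with secrets:", flagged)
```

Non-secret settings such as an endpoint or a path-style flag pass through, which matches the distinction drawn in the table above.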

Creating a federated catalog

Use the Management API (or the Python CLI) to create an external catalog whose connection type is HIVE. The following request registers a catalog that proxies to an HMS running on thrift://hms.example.internal:9083:

curl -X POST https://<polaris-host>/management/v1/catalogs \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "type": "EXTERNAL",
        "name": "analytics_hms",
        "storageConfigInfo": {
          "storageType": "S3",
          "roleArn": "arn:aws:iam::123456789012:role/polaris-warehouse-access",
          "region": "us-east-1"
        },
        "properties": { "default-base-location": "s3://analytics-bucket/warehouse/" },
        "connectionConfigInfo": {
          "connectionType": "HIVE",
          "uri": "thrift://hms.example.internal:9083",
          "warehouse": "s3://analytics-bucket/warehouse/",
          "authenticationParameters": { "authenticationType": "IMPLICIT" }
        }
      }'

Grant catalog roles to principal roles exactly as you would for internal catalogs so engines can obtain tokens that authorize against the federated metadata.

default-base-location is required; it tells Polaris and Iceberg where to place new metadata files. allowedLocations is optional; supply it only when you want to restrict writers to a specific set of prefixes. If your IAM trust policy requires an externalId or an explicit userArn, include those optional fields in storageConfigInfo. Polaris persists them and supplies them when assuming the role cited by roleArn during metadata commits.
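The required and optional fields can be assembled programmatically before the request is sent. The following is a minimal sketch with a hypothetical helper name. It mirrors the JSON body of the curl request, adding externalId and userArn only when provided. Placing allowedLocations inside storageConfigInfo is an assumption here and should be checked against the Management API schema for your Polaris version.

```python
import json


def build_hive_catalog_request(name, hms_uri, warehouse, role_arn, region,
                               external_id=None, user_arn=None,
                               allowed_locations=None):
    """Build the JSON body for an EXTERNAL catalog federated to HMS."""
    storage = {"storageType": "S3", "roleArn": role_arn, "region": region}
    if external_id:
        storage["externalId"] = external_id
    if user_arn:
        storage["userArn"] = user_arn
    if allowed_locations:
        # Assumed placement; see the lead-in above.
        storage["allowedLocations"] = list(allowed_locations)
    return {
        "type": "EXTERNAL",
        "name": name,
        "storageConfigInfo": storage,
        # default-base-location is required for new metadata files.
        "properties": {"default-base-location": warehouse},
        "connectionConfigInfo": {
            "connectionType": "HIVE",
            "uri": hms_uri,
            "warehouse": warehouse,
            # Hive federation accepts only IMPLICIT authentication.
            "authenticationParameters": {"authenticationType": "IMPLICIT"},
        },
    }


if __name__ == "__main__":
    body = build_hive_catalog_request(
        name="analytics_hms",
        hms_uri="thrift://hms.example.internal:9083",
        warehouse="s3://analytics-bucket/warehouse/",
        role_arn="arn:aws:iam::123456789012:role/polaris-warehouse-access",
        region="us-east-1",
    )
    print(json.dumps(body, indent=2))
```

The printed document can be passed to curl with -d @file or sent with any HTTP client, using the same endpoint and bearer token as the request above.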

Limitations and operational notes

  • Single identity: Because only IMPLICIT authentication is permitted, Polaris cannot mix multiple Hive identities in a single deployment (HiveFederatedCatalogFactory rejects other auth types). Plan a deployment topology that aligns the Polaris process identity with the target HMS.
  • Generic tables: The Hive extension exposes Iceberg tables registered in HMS. Generic table federation is not implemented (HiveFederatedCatalogFactory#createGenericCatalog throws UnsupportedOperationException).
  • Configuration caching: Atlas-style catalog failover and multi-HMS routing are not yet handled; Polaris initializes one HiveCatalog per connection and relies on the underlying Iceberg client for retries.

With these constraints satisfied, Polaris can sit in front of an HMS so that Iceberg tables managed there gain OAuth-protected, multi-engine access through the Polaris REST APIs.