Hive Metastore Federation

Polaris can federate catalog operations to an existing Hive Metastore (HMS). This lets an external HMS remain the source of truth for table metadata while Polaris brokers access, policies, and multi-engine connectivity.

Build-time enablement

The Hive factory is packaged as an optional extension and is not baked into default server builds. Include it when assembling the runtime or container images by setting the NonRESTCatalogs Gradle property to include HIVE (and any other non-REST backends you need):

./gradlew :polaris-server:assemble :polaris-server:quarkusAppPartsBuild --rerun \
  -DNonRESTCatalogs=HIVE -Dquarkus.container-image.build=true

runtime/server/build.gradle.kts wires the extension in only when this flag is present, so binaries built without it will reject Hive federation requests.

Runtime requirements

  • Metastore connectivity: Expose the HMS Thrift endpoint (thrift://host:port) to the Polaris deployment.
  • Configuration discovery: Iceberg’s HiveCatalog loads Hadoop/Hive client settings from the classpath. Provide hive-site.xml (and core-site.xml if needed) via HADOOP_CONF_DIR/HIVE_CONF_DIR or an image layer.
  • Authentication: Hive federation only supports IMPLICIT authentication, meaning Polaris uses the operating-system or Kerberos identity of the running process (no stored secrets). Ensure the service principal is logged in or holds a valid keytab/TGT before starting Polaris.
  • Object storage role: Configure polaris.service-identity.<realm>.aws-iam.* (or the default realm) so the server can assume the AWS role referenced by the catalog. The IAM role must allow STS access from the Polaris service identity and grant permissions to the table locations.

Kerberos setup example

If your Hive Metastore enforces Kerberos, stage the necessary configuration alongside Polaris:

export KRB5_CONFIG=/etc/polaris/krb5.conf
export HADOOP_CONF_DIR=/etc/polaris/hadoop-conf   # contains hive-site.xml with HMS principal
export HADOOP_OPTS="-Djava.security.auth.login.config=/etc/polaris/jaas.conf"
kinit -kt /etc/polaris/keytabs/polaris.keytab polaris/service@EXAMPLE.COM
  • hive-site.xml must define hive.metastore.sasl.enabled=true, the metastore principal, and client principal pattern (for example hive.metastore.client.kerberos.principal=polaris/_HOST@REALM).
  • The JAAS entry (referenced by java.security.auth.login.config) should use useKeyTab=true and point to the same keytab shown above so the Polaris JVM can refresh credentials automatically.
  • Keep the keytab readable solely by the Polaris service user; the implicit authenticator consumes the TGT at startup and for periodic renewal.

Creating a federated catalog

Use the Management API (or the Python CLI) to create an external catalog whose connection type is HIVE. The following request registers a catalog that proxies to an HMS running on thrift://hms.example.internal:9083:

curl -X POST https://<polaris-host>/management/v1/catalogs \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "type": "EXTERNAL",
        "name": "analytics_hms",
        "storageConfigInfo": {
          "storageType": "S3",
          "roleArn": "arn:aws:iam::123456789012:role/polaris-warehouse-access",
          "region": "us-east-1"
        },
        "properties": { "default-base-location": "s3://analytics-bucket/warehouse/" },
        "connectionConfigInfo": {
          "connectionType": "HIVE",
          "uri": "thrift://hms.example.internal:9083",
          "warehouse": "s3://analytics-bucket/warehouse/",
          "authenticationParameters": { "authenticationType": "IMPLICIT" }
        }
      }'

Grant catalog roles to principal roles exactly as you would for internal catalogs so engines can obtain tokens that authorize against the federated metadata.

default-base-location is required; it tells Polaris and Iceberg where to place new metadata files. allowedLocations is optional—supply it only when you want to restrict writers to a specific set of prefixes. If your IAM trust policy requires an externalId or explicit userArn, include those optional fields in storageConfigInfo. Polaris persists them and supplies them when assuming the role cited by roleArn during metadata commits.

Limitations and operational notes

  • Single identity: Because only IMPLICIT authentication is permitted, Polaris cannot mix multiple Hive identities in a single deployment (HiveFederatedCatalogFactory rejects other auth types). Plan a deployment topology that aligns the Polaris process identity with the target HMS.
  • Generic tables: The Hive extension exposes Iceberg tables registered in HMS. Generic table federation is not implemented (HiveFederatedCatalogFactory#createGenericCatalog throws UnsupportedOperationException).
  • Configuration caching: Atlas-style catalog failover and multi-HMS routing are not yet handled; Polaris initializes one HiveCatalog per connection and relies on the underlying Iceberg client for retries.

With these constraints satisfied, Polaris can sit in front of an HMS so that Iceberg tables managed there gain OAuth-protected, multi-engine access through the Polaris REST APIs.