Build a Local Open Data Lakehouse with k3d, Apache Ozone, Apache Polaris and Trino

TL;DR — Spin up a fully integrated, locally running open data lakehouse on your laptop in under 30 minutes using Kubernetes in Docker (k3d), Apache Ozone as S3-compatible object storage, Apache Polaris as the Iceberg REST catalog and Trino as the SQL query engine. No cloud account required.


Why This Stack?

The modern open data lakehouse is built on open standards: Apache Iceberg as the table format, a REST catalog to manage metadata, object storage for the actual files and a decoupled compute engine for queries. This separation lets you swap any layer without rewriting the others.

But spinning up a realistic multi-component stack locally has historically meant juggling docker-compose files, manual wiring and frustrating networking issues. Helm + k3d changes that. You get a real Kubernetes environment (with proper service discovery, namespaces and resource management) running entirely inside Docker on your laptop.

Here's what each tool does in our stack:

| Tool | Role | Why |
|---|---|---|
| k3d | Local Kubernetes cluster inside Docker | Lightweight, fast to create/destroy, great for dev/test |
| Apache Ozone | S3-compatible distributed object store | Stores the actual Iceberg data and metadata files |
| Apache Polaris | Iceberg REST catalog (Apache Top-Level Project) | Manages table metadata; any Iceberg-compatible engine can use it |
| Trino | Distributed SQL query engine | Reads Iceberg tables via Polaris, files from Ozone |

The data flow looks like this:

┌─────────┐  1. catalog ops            ┌────────────────┐
│  Trino  │ ────────────────────────▶  │ Apache Polaris │
│ (Query) │ ◀────────────────────────  │   (Catalog)    │
└────┬────┘  2. metadata location      └───────┬────────┘
     │                                         │ 3. write metadata
     │ 4. read/write data files (S3 API)       │    JSON to Ozone
     ▼                                         ▼
┌────────────────────────────────────────────────────────┐
│                 Apache Ozone (S3 Gateway)              │
│                    (Object Storage)                    │
└────────────────────────────────────────────────────────┘

When you run a query:

  1. Trino calls Polaris (via the Iceberg REST API) to get table metadata β€” schema, snapshot, and the location of data files in Ozone
  2. Polaris also handles commit orchestration: when Trino creates or writes a table, Polaris writes the Iceberg metadata JSON files directly to Ozone
  3. Trino reads and writes the actual Parquet data files directly to Ozone using static S3 credentials configured in its Helm values

Why does Trino go directly to Ozone instead of through Polaris? In a production cloud setup, Polaris would use AWS STS to vend short-lived, scoped credentials to Trino for each table access (credential vending). Trino would then use those temporary credentials to hit S3. However, Ozone currently has no STS endpoint, so credential vending doesn't work here. Instead, we configure stsUnavailable: true on the catalog and give Trino static Ozone credentials directly in trino-values.yaml. The architecture is otherwise identical to a production deployment.
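To make step 1 concrete, here is a small sketch of the URL Trino builds for a table-metadata lookup. The path shape follows the Iceberg REST catalog spec; the host and catalog name match the ones we set up later in this tutorial:

```shell
# Base URL of the Polaris Iceberg REST API (as exposed via our Ingress)
POLARIS=http://polaris.127.0.0.1.nip.io:8080/api/catalog
CATALOG=ozone_catalog   # acts as the "prefix" in Iceberg REST paths

# Build the loadTable endpoint: GET /v1/{prefix}/namespaces/{ns}/tables/{table}
load_table_url() {
  echo "$POLARIS/v1/$CATALOG/namespaces/$1/tables/$2"
}

load_table_url demo events
# prints http://polaris.127.0.0.1.nip.io:8080/api/catalog/v1/ozone_catalog/namespaces/demo/tables/events
```

The response to that GET contains the metadata-file location in Ozone, which is exactly what Trino uses in step 3 to fetch the actual files.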


Prerequisites

Before starting, make sure you have the following installed:

  • Docker Engine (e.g., via Docker Desktop or Colima) — k3d runs Kubernetes inside Docker
  • k3d ≥ v5.x — brew install k3d / k3d.io
  • kubectl — brew install kubectl
  • Helm ≥ v3.x — brew install helm
  • curl + jq — for Polaris REST API calls
  • AWS CLI (aws) — for verifying Ozone S3 connectivity (brew install awscli)

Minimum hardware: 8 GB RAM and 4 CPU cores recommended. Ozone is resource-hungry.


Step 1 — Create a k3d Cluster

We create a k3d cluster with Traefik (k3d's built-in ingress controller) enabled and map host port 8080 to the cluster's load balancer port 80. This lets us use clean host-based routing without touching /etc/hosts, thanks to nip.io — a free public wildcard DNS that resolves any hostname like *.127.0.0.1.nip.io to 127.0.0.1 automatically, with zero configuration.
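As an aside, the nip.io trick is pure string manipulation: the DNS answer is simply the IP embedded in the hostname. A tiny illustrative helper (not required for the tutorial) shows the mapping:

```shell
# Extract the IP that nip.io would answer for a wildcard hostname.
nipio_ip() {
  h=${1%.nip.io}   # drop the ".nip.io" suffix
  echo "${h#*.}"   # drop the leading service label ("polaris.", "trino.", ...)
}

nipio_ip polaris.127.0.0.1.nip.io   # prints 127.0.0.1
nipio_ip trino.127.0.0.1.nip.io     # prints 127.0.0.1
```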

Offline / no internet? nip.io requires a DNS lookup. If you're working offline, add this to /etc/hosts instead and everything will work identically:

sudo tee -a /etc/hosts <<EOF
127.0.0.1 polaris.127.0.0.1.nip.io trino.127.0.0.1.nip.io
EOF

Our service URLs will be:

| Service | URL |
|---|---|
| Apache Polaris | http://polaris.127.0.0.1.nip.io:8080 |
| Trino Web UI | http://trino.127.0.0.1.nip.io:8080 |
k3d cluster create lakehouse \
  --servers 1 \
  --agents 2 \
  -p "8080:80@loadbalancer"

Verify the cluster is up:

kubectl cluster-info
kubectl get nodes

Step 2 — Install Apache Ozone

Apache Ozone is the storage foundation of our stack. It provides an S3-compatible API (via its S3 Gateway service) that both Polaris and Trino will use to read/write Iceberg table files.

Add the Helm repo and install

helm repo add ozone https://apache.github.io/ozone-helm-charts/
helm repo update

For local development, we use a minimal values.yaml to reduce resource usage and expose the S3 Gateway as a ClusterIP service (we'll use kubectl port-forward to access it locally):

# ozone-values.yaml
scm:
  replicaCount: 1

om:
  replicaCount: 1

datanode:
  replicaCount: 3   # minimum for block placement

s3g:
  replicaCount: 1

# Disable TLS for local dev
tls:
  enabled: false
helm install ozone ozone/ozone \
  --namespace ozone \
  --create-namespace \
  --values ozone-values.yaml \
  --wait --timeout 5m

Watch the pods come up:

kubectl get pods -n ozone -w

You should see pods for scm (Storage Container Manager), om (Ozone Manager), datanode-0/1/2, and s3g.

Create a bucket via the S3 API

We create the bucket using the AWS CLI against the Ozone S3 Gateway. This ensures the bucket is owned by testuser — the same credentials Polaris and Trino will use — so path resolution is guaranteed to work.

Make sure the port-forward is running in a separate terminal:

kubectl port-forward -n ozone svc/ozone-s3g-rest 9878:9878

Then create the bucket:

AWS_ACCESS_KEY_ID=testuser AWS_SECRET_ACCESS_KEY=testpassword \
  aws s3 mb s3://warehouse --endpoint-url http://localhost:9878

Verify it was created:

AWS_ACCESS_KEY_ID=testuser AWS_SECRET_ACCESS_KEY=testpassword \
  aws s3 ls --endpoint-url http://localhost:9878

You should see:

YYYY-MM-DD HH:MM:SS warehouse

S3 credentials in non-secure mode

Because we're running Ozone without security (ozone.security.enabled=false — the default for the Helm chart in local dev), the S3 Gateway accepts any access key and secret key. There is no credential validation.

We'll use these placeholder values consistently in both Polaris and Trino:

OZONE_ACCESS_KEY=testuser
OZONE_SECRET_KEY=testpassword

For production: enable Ozone security and use ozone s3 getsecret -u <username> to generate real per-user credentials backed by Kerberos.


Step 3 — Install Apache Polaris

Apache Polaris is an open-source Iceberg REST catalog and an Apache Top-Level Project. It stores and serves Iceberg table metadata and acts as the single source of truth for schema, partitioning, and snapshot history. Trino (and any other Iceberg engine) talks to Polaris using the standard Iceberg REST API.

Add the Helm repo and install

helm repo add polaris https://downloads.apache.org/polaris/helm-chart
helm repo update

For local dev we override a few key values:

# polaris-values.yaml

# Use in-memory persistence (good enough for local dev; loses state on pod restart)
persistence:
  type: in-memory

extraEnv:
  - name: POLARIS_BOOTSTRAP_CREDENTIALS
    value: "POLARIS,root,polaris-secret"
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: polaris-ozone-secret
        key: access-key
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: polaris-ozone-secret
        key: secret-key
  - name: AWS_REGION
    value: "us-east-1"

How credentials work: POLARIS_BOOTSTRAP_CREDENTIALS sets the root principal on first boot (format: realm,clientId,clientSecret). The AWS_* env vars give Polaris the static S3 credentials it uses when writing Iceberg metadata files to Ozone.
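To make the realm,clientId,clientSecret format concrete, here is how the triple from our values file splits into its three parts (illustration only; Polaris does this parsing itself on boot):

```shell
BOOTSTRAP="POLARIS,root,polaris-secret"   # same value as in polaris-values.yaml

# Split on commas into the three parts Polaris expects.
IFS=, read -r REALM CLIENT_ID CLIENT_SECRET <<EOF
$BOOTSTRAP
EOF

echo "realm=$REALM clientId=$CLIENT_ID clientSecret=$CLIENT_SECRET"
# prints realm=POLARIS clientId=root clientSecret=polaris-secret
```

The clientId and clientSecret are what we exchange for an OAuth token in the "Get an access token" step below; the realm scopes all catalogs and principals.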

We use stsUnavailable: true in the catalog's storageConfigInfo (see the next step) to tell Polaris that STS is not available and to use the static credentials directly — while still propagating the custom S3 endpoint and path-style settings to the FileIO client.

Create the secret before installing Polaris:

kubectl create namespace polaris --dry-run=client -o yaml | kubectl apply -f -
kubectl create secret generic polaris-ozone-secret \
  --namespace polaris \
  --from-literal=access-key=testuser \
  --from-literal=secret-key=testpassword
helm upgrade --install polaris polaris/polaris \
  --namespace polaris \
  --create-namespace \
  --values polaris-values.yaml \
  --version 1.3.0-incubating \
  --wait --timeout 3m

Note: The Apache Polaris project graduated from the Incubator in February 2026, but the Helm chart hasn't been republished under a non-incubating version yet. Helm skips pre-release versions by default, so --version 1.3.0-incubating is required for now. Once a post-graduation chart is released, the version string will drop the -incubating suffix (e.g. --version 1.4.0), or you can omit --version entirely to get the latest.

Verify:

kubectl get pods -n polaris

Apply an Ingress so Polaris is reachable at http://polaris.127.0.0.1.nip.io:8080:

# polaris-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: polaris
  namespace: polaris
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  rules:
    - host: polaris.127.0.0.1.nip.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: polaris
                port:
                  number: 8181
kubectl apply -f polaris-ingress.yaml

Verify it's up:

curl http://polaris.127.0.0.1.nip.io:8080/api/catalog/v1/config

Configure Polaris via the REST API

Polaris manages everything through its REST API. We need to:

  1. Get an access token using the root credentials
  2. Create a principal (service account for Trino)
  3. Create a catalog (backed by Ozone storage)
  4. Create a namespace inside the catalog

1. Get an access token

HTTP_CODE=$(curl -s -o /tmp/token.json -w "%{http_code}" \
  -X POST http://polaris.127.0.0.1.nip.io:8080/api/catalog/v1/oauth/tokens \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials&client_id=root&client_secret=polaris-secret&scope=PRINCIPAL_ROLE:ALL")

echo "HTTP $HTTP_CODE"
TOKEN=$(jq -r '.access_token' /tmp/token.json)
echo "Token: $TOKEN"

2. Create a Trino principal and credentials

Polaris returns credentials only once at creation time — capture them immediately:

HTTP_CODE=$(curl -s -X POST http://polaris.127.0.0.1.nip.io:8080/api/management/v1/principals \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -o /tmp/principal.json -w "%{http_code}" \
  -d '{
    "name": "trino-principal",
    "type": "SERVICE"
  }')

echo "HTTP $HTTP_CODE"
# The JSON body (with the one-time credentials) is in /tmp/principal.json
TRINO_CLIENT_ID=$(jq -r '.credentials.clientId' /tmp/principal.json)
TRINO_CLIENT_SECRET=$(jq -r '.credentials.clientSecret' /tmp/principal.json)

echo "Trino Client ID:     $TRINO_CLIENT_ID"
echo "Trino Client Secret: $TRINO_CLIENT_SECRET"

If you see HTTP 409, the principal already exists (e.g. from a previous attempt) and the secret cannot be retrieved again. Delete it and recreate:

curl -s -X DELETE http://polaris.127.0.0.1.nip.io:8080/api/management/v1/principals/trino-principal \
  -H "Authorization: Bearer $TOKEN" \
  -w "\nHTTP %{http_code}"

Then re-run the block above.

Save these — you'll use them in the Trino Helm values.

3. Create a catalog backed by Ozone

We create an internal Polaris catalog and configure its default storage to use our Ozone S3 bucket.

curl -X POST http://polaris.127.0.0.1.nip.io:8080/api/management/v1/catalogs \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -w "\nHTTP %{http_code}" \
  -d '{
    "name": "ozone_catalog",
    "type": "INTERNAL",
    "properties": {
      "default-base-location": "s3://warehouse/iceberg"
    },
    "storageConfigInfo": {
      "storageType": "S3",
      "allowedLocations": ["s3://warehouse/"],
      "endpoint": "http://ozone-s3g-rest.ozone.svc.cluster.local:9878",
      "endpointInternal": "http://ozone-s3g-rest.ozone.svc.cluster.local:9878",
      "stsUnavailable": true,
      "pathStyleAccess": true
    }
  }'

Note: Use s3:// (not s3a://) in the Polaris catalog config. Trino uses s3a:// when reading/writing files, but Polaris stores and validates locations using s3:// internally. storageType: S3 is the correct type for any S3-compatible storage, including Ozone.
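If you have locations written in s3a:// form lying around, a small helper (a convenience sketch, not part of the stack) normalizes them to the s3:// form Polaris expects:

```shell
# Rewrite s3a:// URIs to s3://; pass anything else through unchanged.
to_s3() {
  case "$1" in
    s3a://*) echo "s3://${1#s3a://}" ;;
    *)       echo "$1" ;;
  esac
}

to_s3 s3a://warehouse/iceberg   # prints s3://warehouse/iceberg
to_s3 s3://warehouse/iceberg    # prints s3://warehouse/iceberg
```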

The critical fields in storageConfigInfo are:

  • stsUnavailable: true — tells Polaris not to call the AWS STS service for temporary credentials (Ozone has no STS endpoint). Polaris will use the AWS_* environment credentials directly instead.
  • endpoint / endpointInternal — the S3-compatible endpoint for Ozone, injected into the StorageAccessConfig.extraProperties passed to S3FileIO so file writes go to Ozone.
  • pathStyleAccess: true — forces path-style requests (host/bucket/key) instead of virtual-hosted style (bucket.host/key), which Ozone requires.
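To see what pathStyleAccess actually changes, here are the two request shapes side by side, built for a hypothetical object key (illustration only — the key name is made up):

```shell
ENDPOINT_HOST=ozone-s3g-rest.ozone.svc.cluster.local:9878
BUCKET=warehouse
KEY=iceberg/demo/some-object   # hypothetical key, for illustration

PATH_STYLE="http://$ENDPOINT_HOST/$BUCKET/$KEY"       # what Ozone requires
VIRTUAL_HOSTED="http://$BUCKET.$ENDPOINT_HOST/$KEY"   # what Ozone cannot serve

echo "$PATH_STYLE"
echo "$VIRTUAL_HOSTED"
```

The virtual-hosted form encodes the bucket into the DNS name, which only works when wildcard DNS for the endpoint exists; inside the cluster, the plain service hostname makes path-style the only workable option.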

4. Grant the principal access to the catalog

Polaris uses a three-tier RBAC model: principal → principal role → catalog role → privileges. We need to wire all of these together:

# 4a. Create a principal role
curl -X POST "http://polaris.127.0.0.1.nip.io:8080/api/management/v1/principal-roles" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -w "\nHTTP %{http_code}" \
  -d '{"principalRole": {"name": "trino-role"}}'

# 4b. Create a catalog role inside ozone_catalog
curl -X POST "http://polaris.127.0.0.1.nip.io:8080/api/management/v1/catalogs/ozone_catalog/catalog-roles" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -w "\nHTTP %{http_code}" \
  -d '{"catalogRole": {"name": "trino-catalog-role"}}'

# 4c. Grant CATALOG_MANAGE_CONTENT privilege to the catalog role
curl -X PUT "http://polaris.127.0.0.1.nip.io:8080/api/management/v1/catalogs/ozone_catalog/catalog-roles/trino-catalog-role/grants" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -w "\nHTTP %{http_code}" \
  -d '{"grant": {"type": "catalog", "privilege": "CATALOG_MANAGE_CONTENT"}}'

# 4d. Assign the catalog role to the principal role
curl -X PUT "http://polaris.127.0.0.1.nip.io:8080/api/management/v1/principal-roles/trino-role/catalog-roles/ozone_catalog" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -w "\nHTTP %{http_code}" \
  -d '{"catalogRole": {"name": "trino-catalog-role"}}'

# 4e. Assign the principal role to the trino principal
curl -X PUT "http://polaris.127.0.0.1.nip.io:8080/api/management/v1/principals/trino-principal/principal-roles" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -w "\nHTTP %{http_code}" \
  -d '{"principalRole": {"name": "trino-role"}}'

Each command should return HTTP 201. A 409 on steps 4a or 4b means the role already exists (e.g. from a previous attempt) — that's fine, just continue to the next step.
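If you script these calls, a tiny helper (optional; it assumes the 201/409 semantics described above) keeps re-runs idempotent:

```shell
# Succeed on 201 (created) and on 409 (already exists from a previous run).
created_or_exists() {
  [ "$1" = "201" ] || [ "$1" = "409" ]
}

created_or_exists 201 && echo "created"
created_or_exists 409 && echo "already existed - continuing"
created_or_exists 500 || echo "real failure - investigate"
```

Feed it the status code captured via curl's `-w "%{http_code}"` and abort the script only on genuine errors.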

5. Create a namespace

curl -X POST "http://polaris.127.0.0.1.nip.io:8080/api/catalog/v1/ozone_catalog/namespaces" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -w "\nHTTP %{http_code}" \
  -d '{
    "namespace": ["demo"],
    "properties": {
      "location": "s3://warehouse/iceberg/demo"
    }
  }'

Step 4 — Install Trino

Trino is our SQL query engine. We configure it with two things:

  1. An Iceberg connector catalog that points to Polaris as the REST catalog
  2. S3 file system settings pointing to Ozone

Prepare the Helm values

# trino-values.yaml

image:
  tag: "480"   # latest stable version at time of writing

server:
  workers: 1   # single worker is enough for local dev

# Allow Traefik's X-Forwarded-For headers (required when running behind an ingress proxy)
additionalConfigProperties:
  - http-server.process-forwarded=true

coordinator:
  resources:
    requests:
      memory: "1Gi"
      cpu: "500m"
    limits:
      memory: "2Gi"
      cpu: "1"

worker:
  resources:
    requests:
      memory: "2Gi"
      cpu: "500m"
    limits:
      memory: "4Gi"
      cpu: "1"

service:
  type: ClusterIP
  port: 8080

# Define the Iceberg catalog backed by Polaris
additionalCatalogs:
  lakehouse: |
    connector.name=iceberg
    iceberg.catalog.type=rest
    iceberg.rest-catalog.uri=http://polaris.polaris.svc.cluster.local:8181/api/catalog
    iceberg.rest-catalog.warehouse=ozone_catalog
    iceberg.rest-catalog.security=OAUTH2
    # For local dev: credential shorthand (clientId:clientSecret)
    iceberg.rest-catalog.oauth2.credential=<TRINO_CLIENT_ID>:<TRINO_CLIENT_SECRET>
    iceberg.rest-catalog.oauth2.server-uri=http://polaris.polaris.svc.cluster.local:8181/api/catalog/v1/oauth/tokens
    iceberg.rest-catalog.oauth2.scope=PRINCIPAL_ROLE:ALL
    fs.native-s3.enabled=true
    s3.endpoint=http://ozone-s3g-rest.ozone.svc.cluster.local:9878
    s3.path-style-access=true
    s3.aws-access-key=testuser
    s3.aws-secret-key=testpassword
    s3.region=us-east-1

Replace the placeholders:

  • <TRINO_CLIENT_ID> / <TRINO_CLIENT_SECRET> — from Step 3 (Polaris principal credentials)
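One way to fill them in is with sed, assuming $TRINO_CLIENT_ID and $TRINO_CLIENT_SECRET are still set from Step 3. The snippet below demonstrates the substitution against a scratch file; point the same command at your real trino-values.yaml:

```shell
TRINO_CLIENT_ID=abc123        # example values; use the ones Polaris returned
TRINO_CLIENT_SECRET=s3cr3t

# Scratch copy standing in for trino-values.yaml in this demo
printf 'credential=<TRINO_CLIENT_ID>:<TRINO_CLIENT_SECRET>\n' > /tmp/demo-values.yaml

# -i.bak edits in place and keeps a backup (this form works with GNU and BSD sed)
sed -i.bak \
  -e "s|<TRINO_CLIENT_ID>|$TRINO_CLIENT_ID|" \
  -e "s|<TRINO_CLIENT_SECRET>|$TRINO_CLIENT_SECRET|" \
  /tmp/demo-values.yaml

cat /tmp/demo-values.yaml   # prints credential=abc123:s3cr3t
```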

Add the repo and install

helm repo add trino https://trinodb.github.io/charts/
helm repo update

helm install trino trino/trino \
  --namespace trino \
  --create-namespace \
  --values trino-values.yaml \
  --wait --timeout 3m

Verify all pods are running:

kubectl get pods -n trino

You should see a trino-coordinator-* and trino-worker-* pod.

Apply an Ingress so the Trino UI is reachable at http://trino.127.0.0.1.nip.io:8080:

# trino-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: trino
  namespace: trino
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
  rules:
    - host: trino.127.0.0.1.nip.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: trino
                port:
                  number: 8080
kubectl apply -f trino-ingress.yaml

Open http://trino.127.0.0.1.nip.io:8080 in your browser to see the Trino Web UI.


Step 5 — End-to-End Test

Time to put it all together. We'll exec into the Trino coordinator pod and use the built-in Trino CLI to create an Iceberg table, insert data, and query it back.

TRINO_POD=$(kubectl get pod -n trino \
  -l app.kubernetes.io/name=trino,app.kubernetes.io/component=coordinator \
  -o jsonpath='{.items[0].metadata.name}')

kubectl exec -n trino -it $TRINO_POD -- trino \
  --server http://localhost:8080 \
  --catalog lakehouse \
  --schema demo

Inside the Trino CLI:

-- Create an Iceberg table in the 'demo' namespace
CREATE TABLE lakehouse.demo.events (
    event_id   BIGINT,
    event_type VARCHAR,
    user_id    BIGINT,
    created_at TIMESTAMP(6) WITH TIME ZONE
)
WITH (
    format = 'PARQUET',
    partitioning = ARRAY['day(created_at)']
);

-- Insert some rows
INSERT INTO lakehouse.demo.events VALUES
    (1, 'page_view',  101, TIMESTAMP '2024-10-01 10:00:00 UTC'),
    (2, 'click',      102, TIMESTAMP '2024-10-01 11:30:00 UTC'),
    (3, 'purchase',   101, TIMESTAMP '2024-10-02 09:15:00 UTC');

-- Query the data
SELECT event_type, COUNT(*) AS cnt
FROM lakehouse.demo.events
GROUP BY event_type
ORDER BY cnt DESC;

Expected output:

 event_type | cnt
------------+-----
 page_view  |   1
 click      |   1
 purchase   |   1

You can also verify the files are physically present in Ozone:

kubectl exec -n ozone ozone-om-0 -- \
  ozone sh key list /s3v/warehouse

You'll see .parquet data files and a metadata/ directory with Iceberg JSON metadata — exactly what you'd see in S3 with a real cloud deployment.


Tear Down

When you're done:

k3d cluster delete lakehouse

This destroys everything — the Kubernetes cluster, all Helm releases, and all data. Since we used in-memory persistence for Polaris and ephemeral storage for Ozone, nothing leaks onto your filesystem.


Summary

In this tutorial we built a complete open data lakehouse locally using:

  1. k3d to create a throwaway Kubernetes cluster in Docker
  2. Apache Ozone as an S3-compatible object store (installed via Helm)
  3. Apache Polaris as the Iceberg REST catalog (installed via Helm)
  4. Trino as the SQL query engine (installed via Helm, configured to use Polaris + Ozone)

The entire stack runs on open standards (Iceberg REST API, S3 API) which means you can swap any layer for a compatible alternative without changing the others. That portability is the real value of this architecture.