DataHub¶
The Iceberg REST Catalog portion of the Datahub project.
Note: DataHub support is early. No example setup is provided at this time.
Configuration¶
FLOE_CATALOG_TYPE=DATAHUB
FLOE_CATALOG_NAME=demo
FLOE_CATALOG_DATAHUB_URI=http://datahub-gms:8080/iceberg/
FLOE_CATALOG_DATAHUB_TOKEN=<your-personal-access-token>
FLOE_CATALOG_WAREHOUSE=s3://warehouse/
# S3 storage
FLOE_CATALOG_S3_ENDPOINT=http://seaweedfs:8333
FLOE_CATALOG_S3_ACCESS_KEY_ID=admin
FLOE_CATALOG_S3_SECRET_ACCESS_KEY=password
FLOE_CATALOG_S3_REGION=us-east-1
Options¶
| Variable | Required | Description |
|---|---|---|
FLOE_CATALOG_DATAHUB_URI |
Yes | DataHub GMS Iceberg endpoint (e.g., http://datahub-gms:8080/iceberg/) |
FLOE_CATALOG_DATAHUB_TOKEN |
Yes | DataHub Personal Access Token |
FLOE_CATALOG_WAREHOUSE |
Yes | Warehouse location (e.g., s3://warehouse/) |
FLOE_CATALOG_S3_ENDPOINT |
Yes* | S3 endpoint (*required for custom S3-compatible storage) |
FLOE_CATALOG_S3_ACCESS_KEY_ID |
Yes | S3 access key |
FLOE_CATALOG_S3_SECRET_ACCESS_KEY |
Yes | S3 secret key |
FLOE_CATALOG_S3_REGION |
No | S3 region (default: us-east-1) |
Authentication¶
DataHub requires a Personal Access Token (PAT):
- Open DataHub UI
- Go to Settings > Access Tokens
- Generate a new token
- Set
FLOE_CATALOG_DATAHUB_TOKEN
Engine Compatibility¶
Both Spark and Trino support DataHub via the standard Iceberg REST catalog protocol.