Skip to content

Policies

Policies define automated maintenance for Iceberg tables via pattern matching. Each policy specifies which tables to maintain, what operations to run, and when to execute them.

Policy Structure

{
  "name": "policy-name",
  "tablePattern": "catalog.namespace.*",
  "enabled": true,
  "priority": 100,

  "rewriteDataFiles": { ... },
  "rewriteDataFilesSchedule": { ... },

  "expireSnapshots": { ... },
  "expireSnapshotsSchedule": { ... },

  "orphanCleanup": { ... },
  "orphanCleanupSchedule": { ... },

  "rewriteManifests": { ... },
  "rewriteManifestsSchedule": { ... },

  "healthThresholds": { ... },
  "triggerConditions": { ... }
}

Core Fields

Field Type Required Description
name string Yes Unique policy identifier
description string No Policy description
tablePattern string Yes Pattern to match tables (e.g., catalog.db.*, /regex/, *)
enabled boolean No Enable/disable (default: true)
priority integer No Higher wins when multiple policies match
tags object No Metadata key-value pairs

Table Patterns

Pattern Matches
* All tables
catalog.*.* All tables in catalog
catalog.namespace.* All tables in namespace
catalog.namespace.table Specific table
catalog.prod_*.* Glob patterns (* = wildcard, ? = single char)
/catalog\.prod_\d+\..*/ Regex (enclosed in /)

Pattern Resolution: - Priority: Higher priority value wins - Tiebreaker: More specific pattern wins (exact > glob > wildcard) - Effective priority = priority × 1000 + specificity_score


Operations

rewriteDataFiles (Compaction)

Compacts small files. Maps to RewriteDataFiles.

Field Type Default Description
strategy string BINPACK BINPACK, SORT, or ZORDER
sortOrder string[] - Sort columns (SORT strategy)
zOrderColumns string[] - Z-order columns (ZORDER strategy)
targetFileSizeBytes long table default Target file size (e.g., 134217728 = 128 MB)
maxFileGroupSizeBytes long 100 GB Max bytes per file group
maxConcurrentFileGroupRewrites integer 5 Parallelism level
partialProgressEnabled boolean false Commit after each file group
partialProgressMaxCommits integer 10 Max partial commits
partialProgressMaxFailedCommits integer - Max failures before abort
filter string - Iceberg expression filter
rewriteJobOrder string NONE NONE, BYTES_ASC, BYTES_DESC, FILES_ASC, FILES_DESC
useStartingSequenceNumber boolean true Use starting sequence numbers
removeDanglingDeletes boolean false Remove dangling deletes
outputSpecId integer current Output partition spec ID

expireSnapshots

Removes old snapshots. Maps to ExpireSnapshots.

Field Type Default Description
retainLast integer 5 Min snapshots to keep
maxSnapshotAge string P7D Max age (ISO-8601: P7D = 7 days)
cleanExpiredMetadata boolean false Clean unused specs/schemas
expireSnapshotId long - Expire specific snapshot (Spark only)

orphanCleanup

Removes unreferenced files. Maps to DeleteOrphanFiles.

Field Type Default Description
retentionPeriodInDays integer 3 Delete files older than N days
location string table root Scan location (Spark only)
prefixMismatchMode string ERROR ERROR, IGNORE, DELETE (Spark only)
equalSchemes object - URI scheme equivalence map (Spark only)
equalAuthorities object - URI authority equivalence map (Spark only)

rewriteManifests

Optimizes manifest files. Maps to RewriteManifests.

Field Type Default Description
specId integer current Partition spec ID
stagingLocation string metadata location Staging directory
sortBy string[] spec order Sort by partition fields
rewriteIf object all Manifest filter (see below)

rewriteIf Filter Fields

Field Type Description
path string Manifest file path
length long File size in bytes
specId integer Partition spec ID
content string DATA or DELETES
sequenceNumber long Commit sequence number
minSequenceNumber long Lowest sequence number
snapshotId long Snapshot ID
addedFilesCount integer Number of added files
existingFilesCount integer Number of existing files
deletedFilesCount integer Number of deleted files
addedRowsCount long Added rows
existingRowsCount long Existing rows
deletedRowsCount long Deleted rows
partitionSummaries array Partition summaries (containsNull, containsNan, lowerBound, upperBound)

Scheduling

Each operation has independent schedule configuration.

Field Type Description
cronExpression string Cron expression (e.g., 0 2 * * *)
interval integer Interval in days (e.g., 1 = daily, 7 = weekly)
windowStart string Start time HH:mm (e.g., 09:00)
windowEnd string End time HH:mm (e.g., 17:00)
allowedDays string Comma-separated days (e.g., MONDAY,FRIDAY)
timeout integer Timeout in hours (e.g., 2 = 2 hours)
priority integer Schedule priority
enabled boolean Enable/disable

Interval vs Cron: - Use interval for periodic execution (every N hours/days) - Use cronExpression for specific times (daily at 2am, weekly on Sunday)


triggerConditions (Optional)

Gates operations based on table health. If null, operations run whenever scheduled. If set, conditions must be met (OR/AND logic).

Field Type Description
smallFilePercentageAbove double Trigger when small file % exceeds value
smallFileCountAbove long Trigger when small file count exceeds
totalFileSizeAboveBytes long Trigger when total size exceeds
deleteFileCountAbove integer Trigger when delete files exceed
deleteFileRatioAbove double Trigger when delete ratio exceeds
snapshotCountAbove integer Trigger when snapshot count exceeds
snapshotAgeAboveDays integer Trigger when oldest snapshot age exceeds
partitionCountAbove integer Trigger when partition count exceeds
partitionSkewAbove double Trigger when partition skew exceeds
manifestSizeAboveBytes long Trigger when manifest size exceeds
minIntervalMinutes integer Min time between operations (global)
perOperationMinIntervalMinutes object Per-operation intervals (map: OperationType → minutes)
criticalPipeline boolean Force execution if max delay exceeded
criticalPipelineMaxDelayMinutes integer Max delay before forcing
triggerLogic string OR (any condition) or AND (all conditions)

healthThresholds (Optional)

Custom health assessment thresholds. If null, uses system defaults.

Field Type Default Description
smallFileSizeBytes long 104857600 Size defining "small" file (100 MB)
smallFilePercentWarning double 20.0 Warning threshold %
smallFilePercentCritical double 50.0 Critical threshold %
largeFileSizeBytes long 1073741824 Size defining "large" file (1 GB)
largeFilePercentWarning double 20.0 Warning threshold %
largeFilePercentCritical double 50.0 Critical threshold %
fileCountWarning int 10000 Warning file count
fileCountCritical int 50000 Critical file count
snapshotCountWarning int 100 Warning snapshot count
snapshotCountCritical int 500 Critical snapshot count
snapshotAgeWarningDays int 7 Warning age (days)
snapshotAgeCriticalDays int 30 Critical age (days)
deleteFileCountWarning int 100 Warning delete file count
deleteFileCountCritical int 500 Critical delete file count
deleteFileRatioWarning double 0.10 Warning ratio (deletes/data)
deleteFileRatioCritical double 0.25 Critical ratio
manifestCountWarning int 100 Warning manifest count
manifestCountCritical int 500 Critical manifest count
manifestSizeWarningBytes long 104857600 Warning size (100 MB)
manifestSizeCriticalBytes long 524288000 Critical size (500 MB)
partitionCountWarning int 5000 Warning partition count
partitionCountCritical int 10000 Critical partition count
partitionSkewWarning double 3.0 Warning skew (max/avg)
partitionSkewCritical double 10.0 Critical skew
staleMetadataWarningDays int 7 Warning staleness (days)
staleMetadataCriticalDays int 30 Critical staleness (days)

Examples

Basic Compaction

{
  "name": "events-compaction",
  "tablePattern": "prod.events.*",
  "priority": 100,
  "rewriteDataFiles": {
    "strategy": "BINPACK",
    "targetFileSizeBytes": 134217728
  },
  "rewriteDataFilesSchedule": {
    "interval": "PT6H",
    "enabled": true
  }
}

Snapshot Expiration with Triggers

{
  "name": "snapshot-cleanup",
  "tablePattern": "prod.*.*",
  "priority": 50,
  "expireSnapshots": {
    "retainLast": 10,
    "maxSnapshotAge": "P7D"
  },
  "expireSnapshotsSchedule": {
    "cronExpression": "0 2 * * *",
    "enabled": true
  },
  "triggerConditions": {
    "snapshotCountAbove": 50,
    "minIntervalMinutes": 1440
  }
}

Multi-Operation with Time Windows

{
  "name": "full-maintenance",
  "tablePattern": "prod.analytics.*",
  "priority": 100,
  "rewriteDataFiles": {
    "strategy": "BINPACK",
    "targetFileSizeBytes": 268435456,
    "maxConcurrentFileGroupRewrites": 10
  },
  "rewriteDataFilesSchedule": {
    "interval": "PT2H",
    "windowStart": "09:00",
    "windowEnd": "17:00",
    "allowedDays": "MONDAY,TUESDAY,WEDNESDAY,THURSDAY,FRIDAY",
    "enabled": true
  },
  "expireSnapshots": {
    "retainLast": 10,
    "maxSnapshotAge": "P7D"
  },
  "expireSnapshotsSchedule": {
    "cronExpression": "0 2 * * SUN",
    "enabled": true
  },
  "orphanCleanup": {
    "retentionPeriodInDays": 3
  },
  "orphanCleanupSchedule": {
    "interval": "P7D",
    "enabled": true
  }
}

Critical Pipeline (Guaranteed Execution)

{
  "name": "critical-orders",
  "tablePattern": "prod.orders.*",
  "priority": 200,
  "rewriteDataFiles": {
    "strategy": "BINPACK"
  },
  "rewriteDataFilesSchedule": {
    "interval": "PT1H",
    "enabled": true
  },
  "triggerConditions": {
    "smallFilePercentageAbove": 30,
    "criticalPipeline": true,
    "criticalPipelineMaxDelayMinutes": 360
  }
}

Engine Compatibility

Feature Spark Trino
rewriteDataFiles Full Partial (targetFileSizeBytes, filter only)
expireSnapshots Full Partial (no expireSnapshotId)
orphanCleanup Full Basic only (no location, prefixMismatchMode, etc.)
rewriteManifests Full No parameters

See Also