
Scheduler

The Floe scheduler automatically triggers maintenance operations based on policy schedules.

How It Works

Every minute, the scheduler:

  1. Acquires lock - Gets distributed lock (if enabled) to prevent duplicate runs
  2. Lists policies - Fetches all enabled policies
  3. Checks schedules - For each policy, checks each operation's schedule
  4. Finds tables - Lists tables matching the policy pattern
  5. Triggers maintenance - Runs due operations on matching tables
  6. Records execution - Stores completion time to calculate next run

The scheduler also applies:

  • Budget limits (tables per poll, operations per poll, bytes per hour)
  • Backoff after repeated failures
  • Throttling after repeated zero-change runs
  • Auto-mode prioritization based on maintenance debt score
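The poll cycle described above can be sketched in a few lines. This is an illustrative model only, not Floe's implementation: the `Policy` and `Scheduler` classes, the glob-style table pattern, the interval-only schedules, and the way the per-poll table budget is counted are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatch


@dataclass
class Policy:
    # Hypothetical shape: a glob pattern plus {operation: interval in days}.
    name: str
    table_pattern: str
    operations: dict
    enabled: bool = True


@dataclass
class Scheduler:
    policies: list
    tables: list
    max_tables_per_poll: int = 0  # 0 = unlimited, mirroring the config convention
    last_run: dict = field(default_factory=dict)
    triggered: list = field(default_factory=list)

    def poll(self, now, try_lock=lambda: True):
        """Run one poll cycle and return the number of tables touched."""
        if not try_lock():                                        # 1. acquire lock
            return 0
        touched = set()
        for policy in (p for p in self.policies if p.enabled):    # 2. list policies
            for op, interval_days in policy.operations.items():   # 3. check schedules
                for table in self.tables:                         # 4. find tables
                    if not fnmatch(table, policy.table_pattern):
                        continue
                    key = (policy.name, op, table)
                    last = self.last_run.get(key)
                    due = last is None or now >= last + timedelta(days=interval_days)
                    # Assumed budget rule: cap the number of distinct tables per poll.
                    over_budget = (self.max_tables_per_poll
                                   and table not in touched
                                   and len(touched) >= self.max_tables_per_poll)
                    if not due or over_budget:
                        continue
                    self.triggered.append(key)                    # 5. trigger maintenance
                    self.last_run[key] = now                      # 6. record execution
                    touched.add(table)
        return len(touched)
```

Calling `poll` twice within the same interval triggers nothing the second time, because the recorded completion time makes every operation "not due" until the interval has elapsed.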

Maintenance Debt Score

The maintenance debt score (0-1000) prioritizes tables in auto-mode when budgets are enforced. It combines:

  • Health issues: CRITICAL=100, WARNING=50, INFO=10 points each
  • Stale metadata: 0.5 points per day since last snapshot
  • Failure rate: Percentage of failed operations (0-100)
  • Consecutive failures: 20 points per failure in a row

⚠️ Note: The current weights are initial estimates for general use cases. Configurable weight parameters will be added in a future release to allow tuning based on your environment's priorities.

Example Scores

| Scenario | Score | Calculation |
| --- | --- | --- |
| Healthy table, recent data | ~0 | No issues |
| 1 WARNING, 10 days old | 55 | 50 + (10 × 0.5) |
| 1 CRITICAL, 3 consecutive failures | 160 | 100 + (3 × 20) |
| Multiple issues, 50% failure rate | 250+ | Cumulative |
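The arithmetic behind these example scores can be reproduced with a short sketch. It assumes the four components are simply summed, that the failure rate contributes one point per percentage point, and that the total is clamped to the 0-1000 range; the function name and parameters are illustrative and not part of Floe's API.

```python
# Points per health issue, taken from the weights listed above.
HEALTH_POINTS = {"CRITICAL": 100, "WARNING": 50, "INFO": 10}


def maintenance_debt_score(issues, days_since_snapshot=0.0,
                           failure_rate_pct=0.0, consecutive_failures=0):
    """Sum the four components and clamp to 0-1000 (clamping is an assumption)."""
    score = sum(HEALTH_POINTS[severity] for severity in issues)
    score += 0.5 * days_since_snapshot   # stale metadata
    score += failure_rate_pct            # assumed: 1 point per percentage point
    score += 20 * consecutive_failures   # consecutive failures
    return max(0.0, min(1000.0, score))


print(maintenance_debt_score(["WARNING"], days_since_snapshot=10))      # 55.0
print(maintenance_debt_score(["CRITICAL"], consecutive_failures=3))     # 160.0
```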

Configuration

# Enable/disable the scheduler
floe.scheduler.enabled=true

# Enable/disable condition-based triggering (default: true)
floe.scheduler.condition-based-triggering-enabled=true

# Enable distributed locking for multi-replica deployments (default: false)
floe.scheduler.distributed-lock-enabled=true

# Budget limits (0 = unlimited)
floe.scheduler.max-tables-per-poll=0
floe.scheduler.max-operations-per-poll=0
floe.scheduler.max-bytes-per-hour=0

# Backoff and throttling
floe.scheduler.failure-backoff-threshold=3
floe.scheduler.failure-backoff-hours=6
floe.scheduler.zero-change-threshold=5
floe.scheduler.zero-change-frequency-reduction-percent=50
floe.scheduler.zero-change-min-interval-hours=6
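The exact semantics of the backoff and throttling settings are not spelled out here, so the following is only one plausible reading, sketched for illustration. It assumes that an operation is skipped for `failure-backoff-hours` after reaching `failure-backoff-threshold` consecutive failures, and that after `zero-change-threshold` consecutive zero-change runs the interval is stretched so the run frequency drops by the configured percentage, with `zero-change-min-interval-hours` as a floor. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# Values copied from the sample configuration above.
FAILURE_BACKOFF_THRESHOLD = 3
FAILURE_BACKOFF_HOURS = 6
ZERO_CHANGE_THRESHOLD = 5
ZERO_CHANGE_REDUCTION_PERCENT = 50
ZERO_CHANGE_MIN_INTERVAL_HOURS = 6


def in_failure_backoff(consecutive_failures, last_failure, now):
    """Assumed rule: pause for N hours once the failure streak hits the threshold."""
    if consecutive_failures < FAILURE_BACKOFF_THRESHOLD:
        return False
    return now < last_failure + timedelta(hours=FAILURE_BACKOFF_HOURS)


def throttled_interval(base_interval, consecutive_zero_change_runs):
    """Assumed rule: cutting frequency by P% divides the interval by (1 - P/100).

    Only valid for reduction percentages below 100.
    """
    if consecutive_zero_change_runs < ZERO_CHANGE_THRESHOLD:
        return base_interval
    stretched = base_interval / (1 - ZERO_CHANGE_REDUCTION_PERCENT / 100)
    return max(stretched, timedelta(hours=ZERO_CHANGE_MIN_INTERVAL_HOURS))
```

Under these assumptions, a daily operation that has produced no changes five times in a row would run every two days instead.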

Schedule Configuration

Each operation in a policy can have its own schedule:

{
  "rewriteDataFilesSchedule": {
    "cronExpression": "0 2 * * *",
    "windowStart": "02:00",
    "windowEnd": "06:00",
    "allowedDays": "MONDAY,WEDNESDAY,FRIDAY",
    "timeout": 4,
    "enabled": true
  }
}

Fields

| Field | Type | Description |
| --- | --- | --- |
| cronExpression | string | Cron schedule (e.g., 0 2 * * * = 2 AM daily) |
| interval | int | Alternative: run every N days |
| windowStart | string | Earliest start time (HH:mm) |
| windowEnd | string | Latest start time (HH:mm) |
| allowedDays | string | Comma-separated days (e.g., MONDAY,FRIDAY) |
| timeout | int | Timeout in hours |
| enabled | boolean | Enable/disable this schedule |
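To make the window and day fields concrete, here is a minimal sketch of how a start-time check could combine `enabled`, `allowedDays`, `windowStart`, and `windowEnd`. The `may_start` helper is hypothetical. It assumes both window bounds are inclusive, that a window whose start is later than its end wraps past midnight, and that `now` is already expressed in whatever time zone the window refers to.

```python
from datetime import datetime, time

DAY_NAMES = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
             "FRIDAY", "SATURDAY", "SUNDAY"]


def parse_hhmm(value):
    """Parse an 'HH:mm' string into a datetime.time."""
    hours, minutes = value.split(":")
    return time(int(hours), int(minutes))


def may_start(schedule, now):
    """Return True if an operation may start at `now` under this schedule."""
    if not schedule.get("enabled", True):
        return False
    allowed = schedule.get("allowedDays")
    if allowed:
        days = {day.strip().upper() for day in allowed.split(",")}
        if DAY_NAMES[now.weekday()] not in days:
            return False
    start, end = schedule.get("windowStart"), schedule.get("windowEnd")
    if start and end:
        lo, hi, t = parse_hhmm(start), parse_hhmm(end), now.time()
        if lo <= hi:
            return lo <= t <= hi
        return t >= lo or t <= hi  # assumed: window wraps past midnight
    return True
```

With the example schedule above, a Monday 03:00 start is allowed, while Tuesday 03:00 (wrong day) and Monday 07:00 (outside the window) are not.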

Distributed Locking

For deployments with multiple Floe replicas, enable distributed locking to ensure only one replica runs the scheduler:

floe.scheduler.distributed-lock-enabled=true

How It Works

  1. Before each poll cycle, the scheduler attempts to acquire the floe-scheduler lock
  2. The lock is a PostgreSQL advisory lock (fast, no table overhead)
  3. If another replica holds the lock, the poll is skipped
  4. The lock auto-releases when the connection closes (crash-safe)
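For readers unfamiliar with advisory locks, the pattern looks roughly like the sketch below. It uses PostgreSQL's `pg_try_advisory_lock(bigint)`, which returns immediately with true or false and holds the lock until the session ends. How Floe maps the lock name to a numeric key is not documented here; the SHA-256 derivation and the helper names are assumptions. The `cursor` argument is any DB-API cursor that uses `%s` placeholders (as psycopg does).

```python
import hashlib
import struct


def advisory_lock_key(name):
    """Derive a signed 64-bit key from a lock name (assumed scheme, not Floe's)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return struct.unpack(">q", digest[:8])[0]


def try_acquire(cursor, name="floe-scheduler"):
    """Try to take a session-level advisory lock without blocking.

    Returns True if this session now holds the lock, False if another
    session already holds it (in which case the poll would be skipped).
    """
    cursor.execute("SELECT pg_try_advisory_lock(%s)", (advisory_lock_key(name),))
    return bool(cursor.fetchone()[0])
```

Because the lock belongs to the database session, a crashed replica releases it as soon as its connection drops, with no cleanup job required.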

Requirements

  • PostgreSQL storage backend (floe.store.type=POSTGRES)
  • Multiple replicas sharing the same database

Single-Replica Mode

When distributed-lock-enabled=false (default):

  • Uses a local AtomicBoolean to prevent overlapping polls within the JVM
  • Suitable for single-replica deployments
  • No external dependencies
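The in-process guard follows a simple "skip if already running" pattern. Floe uses a Java AtomicBoolean for this; the sketch below shows the same idea in Python with a non-blocking `threading.Lock`, purely as an analogy. The `LocalPollGuard` class is hypothetical.

```python
import threading


class LocalPollGuard:
    """Let at most one poll run at a time within this process."""

    def __init__(self):
        self._lock = threading.Lock()

    def run_if_idle(self, poll):
        """Run `poll` unless another poll is in progress; report whether it ran."""
        if not self._lock.acquire(blocking=False):
            return False  # a poll is already running, so skip this one
        try:
            poll()
            return True
        finally:
            self._lock.release()
```

A guard like this only protects a single process, which is why multi-replica deployments need the database-backed lock instead.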

Execution Tracking

The scheduler tracks when each operation last ran for each table to prevent duplicate executions.

Execution Record

Policy: daily-compaction
Operation: REWRITE_DATA_FILES
Table: warehouse.analytics.events
Last Run: 2024-01-15T02:00:00Z
Next Run: 2024-01-16T02:00:00Z

Calculation

  1. Check if operation is due: now >= nextRun
  2. If cron: parse expression to find next occurrence
  3. If interval: lastRun + interval
  4. Apply maintenance window constraints
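The interval branch of this calculation can be sketched as follows. How the scheduler applies window constraints is not specified here, so the sketch assumes a candidate time outside the window is moved forward to the next window opening, and that the window does not cross midnight. Cron parsing is omitted because it would need a third-party library. All function names are illustrative.

```python
from datetime import datetime, time, timedelta, timezone


def next_run_from_interval(last_run, interval_days):
    """Interval schedules: nextRun = lastRun + interval (in days)."""
    return last_run + timedelta(days=interval_days)


def apply_window(candidate, window_start, window_end):
    """Move `candidate` to the next window opening if it falls outside the window.

    Assumes a same-day window (window_start <= window_end).
    """
    t = candidate.time()
    if window_start <= t <= window_end:
        return candidate
    opening = candidate.replace(hour=window_start.hour, minute=window_start.minute,
                                second=0, microsecond=0)
    if t > window_end:
        opening += timedelta(days=1)  # today's window has passed; use tomorrow's
    return opening


def is_due(now, next_run):
    """An operation is due when it has never run or when now >= nextRun."""
    return next_run is None or now >= next_run


last_run = datetime(2024, 1, 15, 2, 0, tzinfo=timezone.utc)
next_run = apply_window(next_run_from_interval(last_run, 1), time(2, 0), time(6, 0))
print(next_run.isoformat())  # 2024-01-16T02:00:00+00:00
```

The printed value matches the Next Run field in the execution record shown earlier, given a one-day interval.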

Manual Trigger

The scheduler handles automatic execution. To trigger maintenance manually, use the API:

curl -X POST http://localhost:9091/api/v1/maintenance/trigger \
  -H "Content-Type: application/json" \
  -d '{
    "catalog": "warehouse",
    "namespace": "analytics",
    "table": "events"
  }'
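The same request can be issued from a script. The sketch below builds it with Python's standard library; the endpoint and payload are taken from the curl example above, while the helper name and the default base URL are illustrative. The response format is not documented here, so the sketch stops at constructing the request.

```python
import json
import urllib.request


def build_trigger_request(catalog, namespace, table,
                          base_url="http://localhost:9091"):
    """Build a POST request equivalent to the curl command above."""
    payload = {"catalog": catalog, "namespace": namespace, "table": table}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/maintenance/trigger",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_trigger_request("warehouse", "analytics", "events")
# To send it against a running Floe instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```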

Trigger Status

Check whether operations would trigger for a table:

curl http://localhost:9091/api/v1/tables/{namespace}/{table}/trigger-status

Returns per-operation status including conditions met/unmet and next eligible time.