Scheduler

The Floe scheduler automatically triggers maintenance operations based on policy schedules.

How It Works

Every minute, the scheduler:

1. Acquires lock - Gets the distributed lock (if enabled) to prevent duplicate runs
2. Lists policies - Fetches all enabled policies
3. Checks schedules - For each policy, checks each operation's schedule
4. Finds tables - Lists tables matching the policy pattern
5. Triggers maintenance - Runs due operations on matching tables
6. Records execution - Stores the completion time to calculate the next run

The scheduler also applies:

- Budget limits (tables per poll, operations per poll, bytes per hour)
- Backoff after repeated failures
- Throttling after repeated zero-change runs
- Auto-mode prioritization based on maintenance debt score
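
As an illustration of how auto-mode prioritization can interact with a table budget, here is a minimal Python sketch. It is not Floe's implementation: the `Candidate` type and the `select_tables` function are hypothetical names, and the sketch assumes tables are simply ranked by debt score and cut off at the `max-tables-per-poll` budget, with 0 meaning unlimited.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    table: str         # fully qualified table name
    debt_score: float  # maintenance debt score; higher means more urgent


def select_tables(candidates, max_tables_per_poll):
    """Return the tables to process in this poll, most indebted first.

    Assumption: a budget of 0 means "unlimited", as in the configuration.
    """
    ranked = sorted(candidates, key=lambda c: c.debt_score, reverse=True)
    if max_tables_per_poll == 0:
        return ranked
    return ranked[:max_tables_per_poll]
```

With a budget of 2, only the two tables with the highest scores would be processed in that poll; the rest wait for a later cycle.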

Maintenance Debt Score

The maintenance debt score (0-1000) prioritizes tables in auto-mode when budgets are enforced. It combines:

- Health issues: CRITICAL=100, WARNING=50, INFO=10 points each
- Stale metadata: 0.5 points per day since last snapshot
- Failure rate: Percentage of failed operations (0-100)
- Consecutive failures: 20 points per failure in a row

⚠️ Note: The current weights are initial estimates for general use cases. Configurable weight parameters will be added in a future release to allow tuning based on your environment's priorities.

Example Scores

| Scenario | Score | Calculation |
| --- | --- | --- |
| Healthy table, recent data | ~0 | No issues |
| 1 WARNING, 10 days old | 55 | 50 + (10 × 0.5) |
| 1 CRITICAL, 3 consecutive failures | 160 | 100 + (3 × 20) |
| Multiple issues, 50% failure rate | 250+ | Cumulative |
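
The example scores above can be reproduced with a short Python sketch. It assumes the components are simply summed, that the failure rate contributes its percentage value as points, and that the result is capped at 1000 to stay in the documented range. The function name and parameters are illustrative and not part of Floe.

```python
# Points per health issue, taken from the list of weights above.
HEALTH_POINTS = {"CRITICAL": 100, "WARNING": 50, "INFO": 10}


def debt_score(issues, days_since_snapshot=0, failure_rate_pct=0,
               consecutive_failures=0):
    """Combine the documented components into a single score.

    Assumptions: components are added together and the total is
    clamped to the 0-1000 range.
    """
    score = sum(HEALTH_POINTS[severity] for severity in issues)
    score += 0.5 * days_since_snapshot   # stale metadata
    score += failure_rate_pct            # failure rate, 0-100
    score += 20 * consecutive_failures   # failures in a row
    return min(score, 1000)
```

For instance, `debt_score(["WARNING"], days_since_snapshot=10)` gives 55 and `debt_score(["CRITICAL"], consecutive_failures=3)` gives 160, matching the second and third scenarios.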

Configuration

# Enable/disable the scheduler
floe.scheduler.enabled=true

# Enable/disable condition-based triggering (default: true)
floe.scheduler.condition-based-triggering-enabled=true

# Enable distributed locking for multi-replica deployments (default: false)
floe.scheduler.distributed-lock-enabled=true

# Budget limits (0 = unlimited)
floe.scheduler.max-tables-per-poll=0
floe.scheduler.max-operations-per-poll=0
floe.scheduler.max-bytes-per-hour=0

# Backoff and throttling
floe.scheduler.failure-backoff-threshold=3
floe.scheduler.failure-backoff-hours=6
floe.scheduler.zero-change-threshold=5
floe.scheduler.zero-change-frequency-reduction-percent=50
floe.scheduler.zero-change-min-interval-hours=6
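
One plausible reading of the failure backoff settings is sketched below in Python. The exact semantics are an assumption: the sketch treats an operation as "in backoff" once its consecutive failures reach `failure-backoff-threshold`, and keeps it there until `failure-backoff-hours` have passed since the last failure. The function name and arguments are hypothetical. The zero-change throttling settings are not covered here.

```python
from datetime import datetime, timedelta, timezone


def in_failure_backoff(consecutive_failures, last_failure_at, now,
                       threshold=3, backoff_hours=6):
    """Return True while an operation should be skipped after failures.

    Defaults mirror the sample configuration values above; the rule
    itself is an assumed interpretation, not confirmed behavior.
    """
    if consecutive_failures < threshold:
        return False
    return now < last_failure_at + timedelta(hours=backoff_hours)


# Usage: three failures, the last one two hours ago.
now = datetime(2024, 1, 15, 8, 0, tzinfo=timezone.utc)
skipped = in_failure_backoff(3, now - timedelta(hours=2), now)
```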

Schedule Configuration

Each operation in a policy can have its own schedule:

{
  "rewriteDataFilesSchedule": {
    "cronExpression": "0 2 * * *",
    "windowStart": "02:00",
    "windowEnd": "06:00",
    "allowedDays": "MONDAY,WEDNESDAY,FRIDAY",
    "timeout": 4,
    "enabled": true
  }
}

Fields

| Field | Type | Description |
| --- | --- | --- |
| `cronExpression` | string | Cron schedule (e.g., `0 2 * * *` = 2 AM daily) |
| `interval` | int | Alternative: run every N days |
| `windowStart` | string | Earliest start time (HH:mm) |
| `windowEnd` | string | Latest start time (HH:mm) |
| `allowedDays` | string | Comma-separated days (e.g., `MONDAY,FRIDAY`) |
| `timeout` | int | Timeout in hours |
| `enabled` | boolean | Enable/disable this schedule |
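
To make the window fields concrete, the following Python sketch checks whether an operation may start at a given moment. It is an illustration under stated assumptions rather than Floe's logic: `may_start` is a hypothetical helper, `now` is assumed to already be in the scheduler's time zone, both window bounds are treated as inclusive, and a window whose end is earlier than its start is assumed to wrap past midnight.

```python
from datetime import datetime

# Index matches datetime.weekday(): Monday is 0.
DAYS = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
        "FRIDAY", "SATURDAY", "SUNDAY"]


def parse_hhmm(value):
    """Parse an HH:mm string such as "02:00" into a time object."""
    return datetime.strptime(value, "%H:%M").time()


def may_start(now, window_start, window_end, allowed_days):
    """Return True if `now` falls on an allowed day inside the window."""
    allowed = {day.strip().upper() for day in allowed_days.split(",")}
    if DAYS[now.weekday()] not in allowed:
        return False
    start, end = parse_hhmm(window_start), parse_hhmm(window_end)
    current = now.time()
    if start <= end:
        return start <= current <= end
    # Assumed behavior for windows that cross midnight, e.g. 22:00-02:00.
    return current >= start or current <= end
```

With the sample schedule (`02:00` to `06:00` on Monday, Wednesday, and Friday), a start at 03:00 on a Monday is allowed, while 03:00 on a Tuesday or 07:00 on a Monday is not.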

Distributed Locking

For deployments with multiple Floe replicas, enable distributed locking to ensure only one replica runs the scheduler:

floe.scheduler.distributed-lock-enabled=true

How It Works

- Before each poll cycle, the scheduler attempts to acquire the `floe-scheduler` lock
- The lock uses PostgreSQL advisory locks (fast, no table overhead)
- If the lock is held by another replica, the poll is skipped
- The lock auto-releases when the connection closes (crash-safe)
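
The pattern described above can be sketched in Python against any DB-API connection to PostgreSQL (for example one created with psycopg2). `pg_try_advisory_lock` and `pg_advisory_unlock` are standard PostgreSQL functions that take a 64-bit key. How Floe maps the `floe-scheduler` name to that key is not documented here, so the hashing in `lock_key` is purely an assumption, as are the helper names.

```python
import hashlib


def lock_key(name):
    """Derive a signed 64-bit key from a lock name (assumed scheme)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def try_acquire(conn, name="floe-scheduler"):
    """Try to take a session-level advisory lock without blocking."""
    with conn.cursor() as cur:
        cur.execute("SELECT pg_try_advisory_lock(%s)", (lock_key(name),))
        return bool(cur.fetchone()[0])


def release(conn, name="floe-scheduler"):
    """Release the lock explicitly once the poll cycle is done."""
    with conn.cursor() as cur:
        cur.execute("SELECT pg_advisory_unlock(%s)", (lock_key(name),))
```

A replica would call `try_acquire` at the start of a poll and skip the cycle when it returns `False`. Because session-level advisory locks end with the database session, a crashed replica does not leave the lock held.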

Requirements

- PostgreSQL storage backend (`floe.store.type=POSTGRES`)
- Multiple replicas sharing the same database

Single-Replica Mode

When `floe.scheduler.distributed-lock-enabled=false` (the default):

- Uses a local `AtomicBoolean` to prevent concurrent polls within the JVM
- Suitable for single-replica deployments
- No external dependencies
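
The in-process guard can be illustrated with a rough Python analogue. Floe itself uses a Java `AtomicBoolean`; the sketch below swaps in a non-blocking `threading.Lock` to show the same "skip if a poll is already running" idea. The `PollGuard` class is hypothetical.

```python
import threading


class PollGuard:
    """Lets at most one poll run at a time within a single process."""

    def __init__(self):
        self._lock = threading.Lock()

    def run_if_idle(self, poll):
        """Run `poll` unless another poll is in progress.

        Returns True if the poll ran, False if it was skipped.
        """
        if not self._lock.acquire(blocking=False):
            return False
        try:
            poll()
            return True
        finally:
            self._lock.release()
```

The lock is released in a `finally` block so that a failing poll does not block later cycles.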

Execution Tracking

The scheduler tracks when each operation last ran for each table to prevent duplicate executions.

Execution Record

Policy: daily-compaction
Operation: REWRITE_DATA_FILES
Table: warehouse.analytics.events
Last Run: 2024-01-15T02:00:00Z
Next Run: 2024-01-16T02:00:00Z

Calculation

1. Check whether the operation is due: `now >= nextRun`
2. If a cron expression is set: parse it to find the next occurrence
3. If an interval is set: `nextRun = lastRun + interval` (interval in days)
4. Apply maintenance window constraints
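
For the interval case, the calculation reduces to simple date arithmetic. The Python sketch below reproduces the execution record shown earlier, assuming an interval of one day; the function names are illustrative. The cron case is omitted because it needs a cron parser.

```python
from datetime import datetime, timedelta, timezone


def next_run_from_interval(last_run, interval_days):
    """Next run for an interval schedule: lastRun + N days."""
    return last_run + timedelta(days=interval_days)


def is_due(now, next_run):
    """An operation is due once the current time reaches nextRun."""
    return now >= next_run


last_run = datetime(2024, 1, 15, 2, 0, tzinfo=timezone.utc)
next_run = next_run_from_interval(last_run, 1)  # 2024-01-16T02:00:00Z
```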

Manual Trigger

The scheduler handles automatic execution. To trigger maintenance manually, use the API:

curl -X POST http://localhost:9091/api/v1/maintenance/trigger \
  -H "Content-Type: application/json" \
  -d '{
    "catalog": "warehouse",
    "namespace": "analytics",
    "table": "events"
  }'

Trigger Status

Check whether operations would trigger for a table:

curl http://localhost:9091/api/v1/tables/{namespace}/{table}/trigger-status

The response reports per-operation status, including which conditions are met or unmet and the next eligible time.