Scheduler¶
The Floe scheduler automatically triggers maintenance operations based on policy schedules.
How It Works¶
Every minute, the scheduler:
- Acquires lock - Gets distributed lock (if enabled) to prevent duplicate runs
- Lists policies - Fetches all enabled policies
- Checks schedules - For each policy, checks each operation's schedule
- Finds tables - Lists tables matching the policy pattern
- Triggers maintenance - Runs due operations on matching tables
- Records execution - Stores completion time to calculate next run
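The cycle above can be sketched as a single function. This is an illustrative outline, not Floe's implementation: the `Policy` shape, the `poll_once` name, the callback parameters, and the use of glob-style matching for the policy pattern are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import Callable, Dict, List, Tuple


@dataclass
class Policy:
    # Hypothetical shape for illustration; a real policy carries more fields.
    name: str
    table_pattern: str  # assumed to be a glob such as "warehouse.analytics.*"
    enabled: bool = True
    # operation name -> predicate deciding whether it is due for a given table
    operations: Dict[str, Callable[[str], bool]] = field(default_factory=dict)


def poll_once(
    policies: List[Policy],
    tables: List[str],
    try_lock: Callable[[], bool],
    unlock: Callable[[], None],
    trigger: Callable[[str, str, str], None],
) -> List[Tuple[str, str, str]]:
    """Run one poll cycle; return the (policy, operation, table) triples triggered."""
    if not try_lock():  # Acquires lock: skip the cycle if someone else holds it
        return []
    triggered = []
    try:
        for policy in policies:
            if not policy.enabled:  # Lists policies: only enabled ones count
                continue
            # Finds tables: keep the tables that match the policy pattern
            matching = [t for t in tables if fnmatch(t, policy.table_pattern)]
            for operation, is_due in policy.operations.items():  # Checks schedules
                for table in matching:
                    if is_due(table):
                        trigger(policy.name, operation, table)  # Triggers maintenance
                        # Records execution: a real scheduler would persist the
                        # completion time here; the sketch only collects the triple.
                        triggered.append((policy.name, operation, table))
    finally:
        unlock()
    return triggered
```

Passing the lock and trigger behaviour in as callables keeps the sketch independent of any storage backend, so the same outline applies with or without distributed locking.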
Configuration¶
# Enable/disable the scheduler
floe.scheduler.enabled=true
# Enable distributed locking for multi-replica deployments for future use
floe.scheduler.distributed-lock.enabled=true
Schedule Configuration¶
Each operation in a policy can have its own schedule:
{
  "rewriteDataFilesSchedule": {
    "cronExpression": "0 2 * * *",
    "windowStart": "02:00",
    "windowEnd": "06:00",
    "allowedDays": "MONDAY,WEDNESDAY,FRIDAY",
    "timeout": 4,
    "enabled": true
  }
}
Fields¶
| Field | Type | Description |
|---|---|---|
| `cronExpression` | string | Cron schedule (e.g., `0 2 * * *` = 2 AM daily) |
| `interval` | int | Alternative: run every N days |
| `windowStart` | string | Earliest start time (HH:mm) |
| `windowEnd` | string | Latest start time (HH:mm) |
| `allowedDays` | string | Comma-separated days (e.g., `MONDAY,FRIDAY`) |
| `timeout` | int | Timeout in hours |
| `enabled` | boolean | Enable/disable this schedule |
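To make the interaction of `enabled`, `allowedDays`, `windowStart`, and `windowEnd` concrete, here is a minimal sketch of a start-time check. The function name `may_start` is hypothetical, and several behaviours are assumptions rather than documented Floe semantics: both window bounds are treated as inclusive, a window whose end is earlier than its start is treated as wrapping past midnight, a missing `enabled` field counts as enabled, and `now` is assumed to already be in the scheduler's time zone.

```python
from datetime import datetime, time

# datetime.weekday() returns 0 for Monday, so the index maps directly to a name.
DAY_NAMES = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "SUNDAY"]


def parse_hhmm(value: str) -> time:
    """Parse an HH:mm string such as "02:00"."""
    return datetime.strptime(value, "%H:%M").time()


def may_start(now: datetime, schedule: dict) -> bool:
    """Return True if an operation with this schedule may start at `now`."""
    if not schedule.get("enabled", True):
        return False

    allowed = schedule.get("allowedDays")
    if allowed:
        days = {d.strip().upper() for d in allowed.split(",")}
        if DAY_NAMES[now.weekday()] not in days:
            return False

    start, end = schedule.get("windowStart"), schedule.get("windowEnd")
    if start and end:
        s, e, t = parse_hhmm(start), parse_hhmm(end), now.time()
        if s <= e:
            return s <= t <= e
        return t >= s or t <= e  # assumed: window wraps past midnight
    return True
```

With the example schedule above, a Monday at 02:00 passes the check, while a Tuesday at 03:00 or a Monday at 07:00 does not.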
Distributed Locking¶
For deployments with multiple Floe replicas, enable distributed locking (floe.scheduler.distributed-lock.enabled=true) so that only one replica runs the scheduler at a time.
How It Works¶
- Before each poll cycle, the scheduler attempts to acquire the `floe-scheduler` lock
- Uses PostgreSQL advisory locks (fast, no table overhead)
- If the lock is held by another replica, the poll is skipped
- Lock auto-releases on connection close (crash-safe)
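The pattern described above can be sketched with PostgreSQL's session-level advisory lock functions, `pg_try_advisory_lock` and `pg_advisory_unlock`, which take a 64-bit integer key. How Floe maps the `floe-scheduler` name to that key is not documented here, so the hash-based `lock_key` below is an assumption, as are the helper names. The `conn` argument is assumed to be a DB-API connection whose cursors work as context managers and use `%s` placeholders (as psycopg2 does).

```python
import hashlib
import struct


def lock_key(name: str) -> int:
    """Map a lock name to a signed 64-bit integer (assumed derivation)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return struct.unpack(">q", digest[:8])[0]


def try_acquire(conn, name: str = "floe-scheduler") -> bool:
    """Try to take the lock without waiting; False means another session holds it."""
    with conn.cursor() as cur:
        cur.execute("SELECT pg_try_advisory_lock(%s)", (lock_key(name),))
        return bool(cur.fetchone()[0])


def release(conn, name: str = "floe-scheduler") -> None:
    """Release the lock explicitly; closing the session also releases it."""
    with conn.cursor() as cur:
        cur.execute("SELECT pg_advisory_unlock(%s)", (lock_key(name),))
```

Because the lock belongs to the database session, a replica that crashes mid-poll drops its connection and the lock with it, which is what makes the approach crash-safe.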
Requirements¶
- PostgreSQL storage backend (`floe.store.type=POSTGRES`)
- Multiple replicas sharing the same database
Single-Replica Mode¶
When `floe.scheduler.distributed-lock.enabled=false` (default):
- Uses a local `AtomicBoolean` for concurrency within the JVM
- Suitable for single-replica deployments
- No external dependencies
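For readers unfamiliar with the in-process guard pattern, the sketch below shows a Python analogue. It is not Floe's code: Floe runs on the JVM and uses an `AtomicBoolean`, and the assumption here is that the flag is used in a compare-and-set style so that an overlapping poll is skipped rather than queued. The `PollGuard` class and `run_if_idle` method are hypothetical names.

```python
import threading


class PollGuard:
    """In-process guard: at most one poll runs at a time within this process."""

    def __init__(self) -> None:
        self._running = threading.Lock()

    def run_if_idle(self, poll) -> bool:
        """Run `poll` unless one is already in progress; return whether it ran."""
        # A non-blocking acquire plays the role of compareAndSet(false, true).
        if not self._running.acquire(blocking=False):
            return False  # a poll is already running, so skip this tick
        try:
            poll()
            return True
        finally:
            self._running.release()
```

A guard like this only protects against overlap inside one process, which is why it suits single-replica deployments and needs no external service.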
Execution Tracking¶
The scheduler tracks when each operation last ran for each table to prevent duplicate executions.
Execution Record¶
Policy: daily-compaction
Operation: REWRITE_DATA_FILES
Table: warehouse.analytics.events
Last Run: 2024-01-15T02:00:00Z
Next Run: 2024-01-16T02:00:00Z
Calculation¶
- Check if operation is due: `now >= nextRun`
- If cron: parse expression to find next occurrence
- If interval: `lastRun + interval`
- Apply maintenance window constraints
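The interval branch of this calculation can be sketched as follows. The `ExecutionRecord` class and function names are hypothetical, and one behaviour is an assumption: an operation that has never run is treated as due immediately. The cron branch is left out because it needs a cron parser, which the Python standard library does not provide, and the maintenance window constraint is not applied here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ExecutionRecord:
    # Mirrors the fields of the execution record shown above.
    policy: str
    operation: str
    table: str
    last_run: Optional[datetime] = None


def next_run_from_interval(record: ExecutionRecord, interval_days: int) -> Optional[datetime]:
    """nextRun = lastRun + interval (in days); None if the operation never ran."""
    if record.last_run is None:
        return None
    return record.last_run + timedelta(days=interval_days)


def is_due(record: ExecutionRecord, interval_days: int, now: datetime) -> bool:
    """An operation is due when now >= nextRun (assumed due if it never ran)."""
    next_run = next_run_from_interval(record, interval_days)
    return next_run is None or now >= next_run


record = ExecutionRecord(
    policy="daily-compaction",
    operation="REWRITE_DATA_FILES",
    table="warehouse.analytics.events",
    last_run=datetime(2024, 1, 15, 2, 0, tzinfo=timezone.utc),
)
# With a one-day interval this reproduces the Next Run value in the record above.
print(next_run_from_interval(record, 1).isoformat())
```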
Manual Trigger¶
The scheduler handles automatic execution. To trigger maintenance manually, use the API: