
Apache Spark Engine

Floe runs Iceberg maintenance operations on Apache Spark, submitting jobs through the Apache Livy REST API.

Operation Support

Spark has full support for all Iceberg maintenance operations across all supported catalog types (REST, Hive, Nessie, Polaris, Lakekeeper, Gravitino).

Spark's SparkActions API provides the most comprehensive maintenance capabilities. Floe submits Spark jobs through Apache Livy, which provides a REST interface to Spark clusters.
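Floe's maintenance job drives these operations on the cluster through SparkActions. The equivalent operations can be sketched from PySpark using Iceberg's built-in Spark procedures (a hedged sketch: the catalog name "demo" and table "db.events" are placeholders, and Floe's actual job uses the SparkActions API rather than these SQL procedures):

```python
# Sketch of two common Iceberg maintenance operations as Spark SQL
# CALL statements. Catalog/table names are hypothetical examples.

def expire_snapshots_sql(catalog: str, table: str, older_than: str) -> str:
    """CALL statement for Iceberg's expire_snapshots procedure."""
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{older_than}')"
    )

def rewrite_data_files_sql(catalog: str, table: str) -> str:
    """CALL statement for Iceberg's rewrite_data_files (compaction) procedure."""
    return f"CALL {catalog}.system.rewrite_data_files(table => '{table}')"

if __name__ == "__main__":
    stmt = expire_snapshots_sql("demo", "db.events", "2024-01-01 00:00:00")
    print(stmt)
    # With a live SparkSession configured for an Iceberg catalog:
    # spark.sql(stmt)
```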

Configuration

Property notation:

floe.engine-type=SPARK
floe.livy.url=http://livy:8998

Environment variable notation:

FLOE_ENGINE_TYPE=SPARK
FLOE_LIVY_URL=http://livy:8998

Options

  • floe.livy.url / FLOE_LIVY_URL (required): Livy REST API URL
  • floe.livy.job.jar / FLOE_LIVY_JOB_JAR (default: local:///opt/floe/floe-maintenance-job.jar): Path to the maintenance job JAR
  • floe.livy.job.main-class / FLOE_LIVY_JOB_MAIN_CLASS (default: com.floe.spark.job.MaintenanceJob): Main class of the maintenance job
  • floe.livy.driver-memory / FLOE_LIVY_DRIVER_MEMORY (default: 2g): Spark driver memory
  • floe.livy.executor-memory / FLOE_LIVY_EXECUTOR_MEMORY (default: 2g): Spark executor memory
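The property-to-environment-variable mapping follows a simple rule, evident from the pairs above: uppercase the name and replace dots and hyphens with underscores. A small sketch of that rule:

```python
def property_to_env_var(prop: str) -> str:
    """Map a floe.* property name to its environment-variable form:
    uppercase, with '.' and '-' replaced by '_' (per the pairs above)."""
    return prop.upper().replace(".", "_").replace("-", "_")

# e.g. property_to_env_var("floe.livy.driver-memory")
#      -> "FLOE_LIVY_DRIVER_MEMORY"
```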

Job Lifecycle

  1. Floe submits a batch job to Livy
  2. Livy creates a Spark application
  3. The job executes the maintenance operation
  4. Floe polls Livy for completion
  5. Results are returned to Floe
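The lifecycle above maps onto Livy's batch REST endpoints: POST /batches to submit, GET /batches/{id}/state to poll. A minimal sketch, assuming the default option values; the payload field names are Livy's, while the JAR path and main class come from the defaults above (the "args" convention shown here is hypothetical):

```python
import json
from urllib import request

LIVY_URL = "http://livy:8998"  # floe.livy.url

def batch_payload(table: str) -> dict:
    """Build a Livy batch-submission body for one maintenance run.
    Field names are Livy's; passing the table via 'args' is an assumption."""
    return {
        "file": "local:///opt/floe/floe-maintenance-job.jar",
        "className": "com.floe.spark.job.MaintenanceJob",
        "driverMemory": "2g",
        "executorMemory": "2g",
        "args": [table],
    }

def submit_batch(payload: dict) -> int:
    """POST the batch to Livy and return its batch id (needs a live Livy)."""
    req = request.Request(
        f"{LIVY_URL}/batches",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["id"]

def batch_state(batch_id: int) -> str:
    """Poll GET /batches/{id}/state; terminal states include 'success' and 'dead'."""
    with request.urlopen(f"{LIVY_URL}/batches/{batch_id}/state") as resp:
        return json.load(resp)["state"]
```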

Monitoring

Livy UI

Open the Livy UI at http://livy:8998/ui to see:

  • Active sessions
  • Batch job history
  • Logs

Resources