// the find
kubeflow/spark-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
A Kubernetes operator that lets you declare Spark jobs as CRDs (`SparkApplication`, `ScheduledSparkApplication`) and have the operator handle submission, retries, and lifecycle. It sits under the Kubeflow umbrella and is the de facto standard approach for running Spark on Kubernetes without wrapping `spark-submit` in shell scripts. Aimed at data engineering teams already committed to K8s who want Spark workloads to behave like native K8s workloads.
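To make "declare Spark jobs as CRDs" concrete, here is a minimal sketch of a `SparkApplication` object, built as a plain Python dict and printed as JSON (which `kubectl apply -f -` accepts). The image, jar path, namespace, Spark version, and the `spark` service account name are placeholders rather than values from the project, and the field names should be checked against the `v1beta2` API reference for your operator version.

```python
import json


def spark_application(name: str, namespace: str, image: str, main_file: str) -> dict:
    """Build a SparkApplication manifest as a plain dict (illustrative values only)."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            "image": image,  # placeholder: use an image that matches sparkVersion
            "mainClass": "org.apache.spark.examples.SparkPi",
            "mainApplicationFile": main_file,
            "sparkVersion": "4.0.0",  # assumed version, adjust to your image
            # Retries are declared here instead of being scripted around spark-submit.
            "restartPolicy": {
                "type": "OnFailure",
                "onFailureRetries": 3,
                "onFailureRetryInterval": 10,  # seconds
            },
            # "spark" is a hypothetical service account that must exist in the namespace.
            "driver": {"cores": 1, "memory": "512m", "serviceAccount": "spark"},
            "executor": {"instances": 2, "cores": 1, "memory": "512m"},
        },
    }


if __name__ == "__main__":
    manifest = spark_application(
        name="spark-pi",
        namespace="default",
        image="example.registry/spark:4.0.0",  # hypothetical image name
        main_file="local:///opt/spark/examples/jars/spark-examples.jar",  # placeholder path
    )
    print(json.dumps(manifest, indent=2))
```

Piping the output into `kubectl apply -f -` hands the job to the operator, which then owns submission and retries.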
Native cron scheduling is baked into the CRD spec: no separate CronJob wrapping a Job wrapping `spark-submit`. The mutating admission webhook lets you attach volumes, set affinity, and do things Spark's own pod template support can't express. Prometheus metrics export is first-class, not an afterthought. Maintenance is active: v2.3.x targeting Spark 4.0 shipped recently, CI is green, and the ADOPTERS.md list includes real companies.
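As a rough illustration of the cron support, the sketch below wraps an application spec in a `ScheduledSparkApplication`. It assumes the `schedule`, `concurrencyPolicy`, history-limit, and `template` fields of the `v1beta2` CRD; the job name, namespace, image, and script path are hypothetical.

```python
import json


def scheduled_spark_application(name: str, namespace: str, schedule: str, app_spec: dict) -> dict:
    """Wrap a SparkApplication spec in a ScheduledSparkApplication manifest."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "ScheduledSparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "schedule": schedule,           # standard cron expression
            "concurrencyPolicy": "Forbid",  # skip a run if the previous one is still going
            "successfulRunHistoryLimit": 3,
            "failedRunHistoryLimit": 3,
            "template": app_spec,           # same shape as a SparkApplication spec
        },
    }


if __name__ == "__main__":
    # Illustrative application spec; every value here is a placeholder.
    app_spec = {
        "type": "Python",
        "mode": "cluster",
        "image": "example.registry/spark:4.0.0",
        "mainApplicationFile": "local:///opt/app/nightly_etl.py",
        "sparkVersion": "4.0.0",
        "restartPolicy": {"type": "Never"},
        "driver": {"cores": 1, "memory": "1g", "serviceAccount": "spark"},
        "executor": {"instances": 2, "cores": 1, "memory": "1g"},
    }
    manifest = scheduled_spark_application("nightly-etl", "data-jobs", "0 2 * * *", app_spec)
    print(json.dumps(manifest, indent=2))
```

The schedule lives on the same object as the job definition, so there is one resource to version and review instead of a CronJob plus a wrapper script.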
Still on the `v1beta2` API after years of production use, which is a yellow flag for anything you'd call stable. The Python API package in `api/python_api/` is just generated Kubernetes model stubs; there's no actual Python SDK for submitting or managing SparkApplications programmatically. RBAC setup is tedious and easy to misconfigure: you need the operator's service account permissions plus a separate Spark RBAC setup in each namespace where jobs run, and the docs don't make this obvious upfront. No gang scheduling out of the box: if your executors can't all fit on available nodes simultaneously, you get partial allocation deadlocks and need to integrate something like Volcano or YuniKorn yourself.
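The per-namespace half of that RBAC setup can be sketched as follows: a service account for the driver, a Role, and a RoleBinding, emitted as one JSON `List` for `kubectl apply -f -`. The `spark` account name, the object naming scheme, and the exact resource and verb lists are assumptions made for illustration; compare them with what the operator's Helm chart generates before relying on them. The operator's own permissions are a separate matter and are not covered here.

```python
import json


def spark_namespace_rbac(namespace: str, service_account: str = "spark") -> dict:
    """Build the job-side RBAC objects for one namespace (assumed rule set)."""
    role_name = f"{service_account}-role"  # hypothetical naming scheme
    sa = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": service_account, "namespace": namespace},
    }
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            {
                # The driver creates and cleans up executor pods and related objects.
                "apiGroups": [""],
                "resources": ["pods", "services", "configmaps", "persistentvolumeclaims"],
                "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"],
            }
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [sa, role, binding]}


if __name__ == "__main__":
    # Repeat for every namespace that will run Spark jobs.
    print(json.dumps(spark_namespace_rbac("data-jobs"), indent=2))
```

Generating these objects per namespace, rather than hand-editing them, is one way to keep the service account name referenced in each job's driver spec consistent with what actually exists.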