// the find
databricks/mlops-stacks
This repo provides a customizable stack for starting new ML projects on Databricks that follow production best-practices out of the box.
A Databricks-native project scaffold that generates the full MLOps skeleton (CI/CD pipelines, MLflow job definitions, feature store wiring, and model validation) from a single `databricks bundle init mlops-stacks` command. Aimed at ML teams on Databricks who want to skip the boilerplate and land directly on a structure that can go to production without a rewrite. Useless outside Databricks.
The separation of CI/CD-only vs project-only initialization is a real design win — MLEs can drop in CI/CD on an existing repo without clobbering the data scientist's work. Infrastructure as code via Databricks Asset Bundles means job config changes go through PRs, not ad-hoc UI clicks that nobody audited. The Unity Catalog integration handles the dev/staging/prod catalog namespace automatically so you don't wire that up yourself. Test scaffolding ships with the template — unit tests for each training step are generated, not bolted on later.
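To make the catalog-per-environment idea concrete, here is a minimal sketch of what that wiring amounts to: each deployment target maps to its own catalog, and the registered model name follows Unity Catalog's three-level `catalog.schema.model` form. The catalog names, the schema `taxi_fares`, the model `fare_model`, and the helper `registered_model_name` are all hypothetical and are not taken from the generated template.

```python
# Hypothetical mapping: the real template decides catalog names from the
# answers given at init time; these values are illustrative only.
TARGET_CATALOGS = {"dev": "dev", "staging": "staging", "prod": "prod"}


def registered_model_name(target: str, schema: str, model: str) -> str:
    """Build a three-level Unity Catalog name (catalog.schema.model) for a target."""
    try:
        catalog = TARGET_CATALOGS[target]
    except KeyError:
        raise ValueError(f"unknown target: {target!r}") from None
    return f"{catalog}.{schema}.{model}"


if __name__ == "__main__":
    # Print the name each environment would use for the same model.
    for target in TARGET_CATALOGS:
        print(registered_model_name(target, "taxi_fares", "fare_model"))
```

The point of the scaffold is that you never write this mapping by hand; the sketch only shows what would otherwise need wiring per environment.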
Still in public preview after what looks like years of existence. The Databricks bundle template system underneath it has shifted APIs at least twice, so a project initialized 18 months ago may carry generated YAML that no longer matches what the current CLI expects, and there is no migration path. The sample ML code uses a toy NYC taxi dataset; teams spend time ripping that out before they can wire in their actual data source. No streaming or real-time inference pipeline: only batch is scaffolded, so if you need real-time serving you're on your own from the start. Hard lock-in: every generated artifact assumes Databricks workspaces and the Databricks CLI; there is no path to run the same pipeline locally or on a non-Databricks platform without rewriting the resource configs.
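One way to catch the YAML drift described above before a deploy fails is to run `databricks bundle validate` against the project on a schedule or in CI. The wrapper below is a rough sketch, not part of the template: the function names are hypothetical, and the target name `dev` is an assumption about how the project was initialized.

```python
import shutil
import subprocess
from typing import List, Optional


def validate_command(target: str) -> List[str]:
    """Argument vector for `databricks bundle validate` against one target."""
    return ["databricks", "bundle", "validate", "--target", target]


def validate_bundle(project_dir: str, target: str = "dev") -> Optional[bool]:
    """Return True if validation passes, False if it fails,
    or None when the Databricks CLI is not installed."""
    if shutil.which("databricks") is None:
        return None
    result = subprocess.run(
        validate_command(target),
        cwd=project_dir,          # run from the bundle root
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Surface the CLI's own error text; it names the offending config.
        print(result.stderr)
    return result.returncode == 0
```

This only tells you that the config no longer validates; it does not migrate anything, so the fix is still manual.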