// the find

mosaicml/llm-foundry

★ 4,407 · Python · Apache-2.0 · updated Mar 2026

LLM training code for Databricks foundation models

LLM Foundry is MosaicML/Databricks's production training stack for large language models, built on top of their Composer training library. It handles the full lifecycle: data prep, pretraining, finetuning, evaluation, and inference conversion. This is what they actually used to train DBRX and the MPT series, so it's battle-tested at scale.

The YAML-based configuration system, paired with a proper component registry (entry points, decorators, direct registration), means you can swap out models, loggers, and callbacks without forking the codebase. The data pipeline defaults to MosaicML's StreamingDataset format, which handles multi-node training without the usual distributed data headaches. FP8 training via TransformerEngine on H100s is wired in and documented, not an afterthought. The eval harness ships with a large set of ICL benchmarks out of the box, covering commonsense reasoning, math, coding, and safety, which is useful for anyone who wants reproducible model comparisons.
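To make the registry idea concrete, here is a minimal, self-contained sketch of the general pattern: components are registered under a name (by decorator or by a direct call) and then built from a config mapping. This is not llm-foundry's actual API. The `Registry` class, its method names, the two toy loggers, and the `config` dict (standing in for a parsed YAML section) are all hypothetical and exist only for illustration.

```python
from typing import Any, Callable, Dict


class Registry:
    """A tiny name -> factory registry (illustrative; not llm-foundry's class)."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._factories: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, func: Callable[..., Any]) -> None:
        """Direct registration: bind a name to a class or factory function."""
        if name in self._factories:
            raise ValueError(f"{self.kind} '{name}' is already registered")
        self._factories[name] = func

    def register_class(self, name: str) -> Callable[[type], type]:
        """Decorator registration: register the decorated class under `name`."""
        def decorator(cls: type) -> type:
            self.register(name, cls)
            return cls
        return decorator

    def build(self, name: str, **kwargs: Any) -> Any:
        """Look up `name` and construct the component from config kwargs."""
        if name not in self._factories:
            known = sorted(self._factories)
            raise KeyError(f"unknown {self.kind} '{name}'; known: {known}")
        return self._factories[name](**kwargs)


# Hypothetical registry for logger components.
loggers = Registry("logger")


@loggers.register_class("stdout")
class StdoutLogger:
    def __init__(self, prefix: str = "") -> None:
        self.prefix = prefix

    def format(self, metric: str, value: float) -> str:
        return f"{self.prefix}{metric}={value}"


class NullLogger:
    def format(self, metric: str, value: float) -> str:
        return ""


# Direct registration, the non-decorator route.
loggers.register("null", func=NullLogger)

# Stand-in for a parsed YAML section: component name -> constructor kwargs.
config = {"loggers": {"stdout": {"prefix": "[train] "}, "null": {}}}

# Swapping components means editing the config, not the training code.
built = {
    name: loggers.build(name, **kwargs)
    for name, kwargs in config["loggers"].items()
}
print(built["stdout"].format("loss", 0.5))  # prints "[train] loss=0.5"
```

The point of the design is that the training script only ever calls `build` with names taken from the config, so adding a new logger or callback is a registration in your own module plus a config change. Entry-point registration, the third route mentioned above, is omitted here to keep the sketch short.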

The Composer dependency is load-bearing throughout, so you are buying into MosaicML's entire distributed training abstraction whether you want it or not; swapping to raw PyTorch DDP or FSDP without Composer is not a realistic option. AMD and Intel Gaudi support is explicitly beta/experimental, and the AMD setup instructions still include a manual downgrade of numpy to 1.23.5, which tells you how much polish those paths have received. The Docker image notes state that llm-foundry itself isn't preinstalled, only its dependencies; that is minor, but it is the kind of thing that trips people up during onboarding. Activity has slowed since Databricks absorbed MosaicML: the last push was March 2026 and the news section hasn't been updated in a while, so if you need active community support for edge cases, you may be on your own.
