// the find

loft-sh/vcluster

★ 11,169 · Go · Apache-2.0 · updated Jun 2026

vCluster - Create fully functional virtual Kubernetes clusters - Each vcluster runs inside a namespace of the underlying k8s cluster. It's cheaper than creating separate full-blown clusters and it offers better multi-tenancy and isolation than regular namespaces.

vCluster runs full virtual Kubernetes control planes (API server, controller manager, and a backing store such as etcd) inside namespaces of a host cluster, giving tenants admin-level cluster access without the cost of separate physical clusters. It targets platform teams doing multi-tenancy, CI environments, AI/GPU workload isolation, and ISVs who need per-customer k8s environments. The project has real production adoption (CoreWeave, Adobe) and is a CNCF-certified Kubernetes distribution.

- The resource syncing architecture is genuinely clever: resources created inside the vcluster are translated and synced to real resources on the host, so pods run on real nodes without tenants needing host cluster access. The sync engine rewrites resource names so that objects from different virtual namespaces and virtual clusters cannot collide on the host.

- Multiple deployment modes (shared nodes, dedicated nodes, private nodes, standalone) give a real spectrum from 'cheap dev environment' to 'full hardware isolation for compliance', which is more honest than most tools that pretend one mode fits all.

- Helm chart has proper unit tests using helm-unittest (chart/tests/), the values.yaml has a JSON schema, and conformance test results are committed going back to v1.19 - this is the kind of operational rigor that's often missing in OSS infra projects.

- The sleep/wake mode for idle clusters is a practical cost feature that actually works at the namespace level, not just a paper spec - and it's been in production long enough to have workload-level sleep annotations added in v0.33.

- Still on v0.x versioning despite being used in production at scale - this is a real API stability concern. Breaking changes in values.yaml between releases have historically caused pain (legacyconfig migration code in the repo is evidence of this).

- The resource syncing model has fundamental limits that bite in practice: in shared-nodes mode, host-level networking features, certain CSI drivers, and admission webhooks that inspect pod specs can behave unexpectedly or require awkward workarounds, because they see the translated host objects rather than what the tenant created.

- The most interesting features (private nodes, standalone mode, auto nodes with Karpenter, snapshots/restore, sleep mode) are gated behind the commercial platform or require significant configuration work that isn't well-documented in the OSS tier. The free tier ceiling of 64 CPUs / 32 GPUs clarifies where the business model kicks in.

- The .claude/ directory, committed to the repo with AI coding rules and workflow instructions, suggests they're using AI for code generation at scale. This isn't inherently bad, but the resulting e2e test suite can be hard for human contributors to navigate, as evidenced by the extensive old-to-new mapping docs needed to track the test migration.
