// the find
kserve/kserve
Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
KServe is a Kubernetes-native inference platform that puts a unified CRD-based API in front of vLLM, TensorFlow, PyTorch, XGBoost, scikit-learn, and others. It handles the Kubernetes plumbing (autoscaling, canary rollouts, traffic splitting) and ships model explainability as a built-in component, so you don't have to wire any of it up yourself. The target audience is ML platform teams running inference at scale on Kubernetes, not individual practitioners.
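To make "unified CRD-based API" concrete, here is a minimal sketch that builds an InferenceService manifest as a Python dict and prints it as JSON (which `kubectl apply -f -` accepts). The helper name `inference_service`, the service name, and the storage path are hypothetical; the field layout follows the `serving.kserve.io/v1beta1` schema as I understand it, so check it against the version you deploy.

```python
import json


def inference_service(name, model_format, storage_uri, canary_percent=None):
    """Build an InferenceService manifest as a plain dict (illustrative helper)."""
    predictor = {
        "model": {
            # The format name is what selects a serving runtime, e.g. "sklearn".
            "modelFormat": {"name": model_format},
            "storageUri": storage_uri,
        }
    }
    if canary_percent is not None:
        # Share of traffic routed to the newest revision during a canary rollout.
        predictor["canaryTrafficPercent"] = canary_percent
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {"predictor": predictor},
    }


manifest = inference_service(
    "sklearn-iris",
    "sklearn",
    "gs://example-bucket/models/sklearn/iris",  # hypothetical path
    canary_percent=10,
)
print(json.dumps(manifest, indent=2))
```

Swapping `"sklearn"` for another supported format is the only change needed to target a different framework, which is the point of the single API surface.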
First-class support for both predictive and generative workloads under one API surface is genuinely useful; most tools pick one lane. The InferenceGraph CRD for pipelines and ensembles is a differentiator: routing across InferenceServices (each of which can bundle predictor, transformer, and explainer components) is declarative and composable. The local model cache system (LocalModelCache CRDs, DaemonSet-based node agents) solves a concrete problem: model loading latency on cold nodes wrecks p99s. CNCF incubating status and an adopters list with named organizations mean this isn't vaporware, and the Go controller code and Helm charts are production-grade.
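As a rough illustration of what "declarative and composable" means for InferenceGraph, the sketch below builds a two-step sequential pipeline as a manifest dict. The helper `sequence_graph` and the service names are hypothetical, and the `serving.kserve.io/v1alpha1` field names (`routerType`, `steps`, `serviceName`, `data`) reflect my reading of the CRD rather than a verified schema.

```python
import json


def sequence_graph(name, service_names):
    """Build an InferenceGraph whose root node calls services in order."""
    steps = [{"serviceName": svc} for svc in service_names]
    # Assumption: every step after the first consumes the previous response.
    for step in steps[1:]:
        step["data"] = "$response"
    return {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": "InferenceGraph",
        "metadata": {"name": name},
        "spec": {"nodes": {"root": {"routerType": "Sequence", "steps": steps}}},
    }


# Hypothetical InferenceService names: a preprocessor feeding a classifier.
graph = sequence_graph("image-pipeline", ["preprocess", "classify"])
print(json.dumps(graph, indent=2))
```

Other router types (ensemble, splitter, switch) follow the same shape with a different `routerType` and per-step settings, so pipelines are composed by editing the manifest rather than writing routing code.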
The dependency graph is brutal: full serverless mode requires Knative, which in turn requires a networking layer such as Istio, which means you're suddenly maintaining three complex systems before serving a single model. The split between InferenceService (predictive) and LLMInferenceService (generative) is an API fracture that will cause confusion: two separate controllers, two separate CRDs, different lifecycle semantics. The local development story is poor; the quickstart guide basically asks you to run a full Kubernetes cluster with cert-manager, Istio, and Knative locally. The Python runtime servers (sklearn, xgboost, huggingface) live in the same repo but have separate Docker build workflows, which means version skew between the controller and runtimes is a real ops risk.
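One way to guard against the version-skew risk is a small pre-deploy check that compares image tags. This is a sketch, not part of KServe: the function names and the image references are hypothetical, and it assumes the controller and runtimes are tagged with matching `vMAJOR.MINOR.PATCH` versions.

```python
def image_tag(image_ref):
    """Return the tag of an image reference, or None if untagged or digest-pinned."""
    if "@" in image_ref:
        return None
    # Only look after the last "/" so a registry port is not mistaken for a tag.
    last_segment = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return None
    return last_segment.split(":", 1)[1]


def minor_version(tag):
    """Reduce 'v0.15.2' to (0, 15); return None for tags like 'latest'."""
    parts = tag.lstrip("v").split(".")
    try:
        return int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return None


def skewed_runtimes(controller_image, runtime_images):
    """List runtime images whose major.minor differs from the controller's."""
    expected = minor_version(image_tag(controller_image) or "")
    skewed = []
    for ref in runtime_images:
        tag = image_tag(ref)
        version = minor_version(tag) if tag else None
        # Unparseable or missing tags are flagged because they cannot be verified.
        if version is None or version != expected:
            skewed.append(ref)
    return skewed


# Illustrative image references only.
flagged = skewed_runtimes(
    "example/kserve-controller:v0.15.0",
    [
        "example/sklearnserver:v0.15.1",
        "example/xgbserver:v0.14.0",
        "example/huggingfaceserver:latest",
    ],
)
print(flagged)
```

Patch-level differences are tolerated here by design; tighten the comparison if your runtimes need an exact match.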