Keynote
CNCF Growth
The community has grown to almost 20M developers, and Europe makes up the largest group of contributors. Read more in this report: https://www.cncf.io/reports/state-of-cloud-native-development-q1-2026/.
NVIDIA joins the CNCF as a platinum member. Agones looks like a fun new incubating project.
Reference Architectures
https://architecture.cncf.io/ is a place to build and share reference architectures.
NVIDIA
The future of AI is open and community driven. Kubernetes started small, as a scheduler for containers. The community didn't just adopt it, they standardized on it. Cloud vendors built their platforms on top of it. Over time, Kubernetes became the de facto standard for running mission-critical workloads. That's why it keeps evolving: from apps to databases, and from schedulers to platforms.
NVIDIA has been part of open source for years; over the last few years they have been contributing directly to CNCF projects, like the KAI scheduler. NVIDIA is contributing the GPU driver directory to the SIG Node organization. As part of their commitment, NVIDIA is pledging $4M to CNCF projects over the next 3 years.
Inference
How do we scale inference? Two thirds of AI compute in 2026 is dedicated to inference; only one third goes to training. The projection is that inference will consume as much energy as all other workloads combined. This isn't the flex they think it is. Agents are superusers of inference, and power consumption in the agentic world grows exponentially.
According to the CNCF survey, over 80 percent of organizations already run Kubernetes. Organizations running AI workloads do so on top of Kubernetes. The cloud native era has to expand into the AI native era by extending the existing toolstack.
The open model ecosystem is growing rapidly, but this comes with some challenges. vLLM (https://docs.vllm.ai/en/latest/) optimizes serving on a single host; llm-d optimizes a cluster of vLLM instances. LLM serving breaks the model of traditional load balancers, because they were built for stateless workloads. llm-d's inference gateway inspects incoming prompts and sends them to the GPU that already has a populated KV cache.
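To make the single-host half concrete, here is a minimal sketch of running vLLM's OpenAI-compatible server on Kubernetes; the model name, replica count and resource sizes are placeholder assumptions, not from the keynote.

```yaml
# Minimal single-host vLLM deployment (illustrative; model and sizes are placeholders)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-server
  template:
    metadata:
      labels:
        app: vllm-server
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest   # vLLM's OpenAI-compatible server image
          args:
            - --model
            - Qwen/Qwen2.5-7B-Instruct     # placeholder model
          ports:
            - containerPort: 8000          # default API port
          resources:
            limits:
              nvidia.com/gpu: 1            # one GPU per replica
```

llm-d then fronts a fleet of deployments like this one and, instead of round-robin, routes each prompt to the replica whose KV cache is already warm.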
The Evolution of AI
Specialized models will post-train foundation models on private data. These models can be cheaper and more performant to run than the generalized models.
ML at Uber
AI and ML have been part of the Uber experience since long before they were buzzwords. Dispatch, pricing, matching and personalization have been in the application since the beginning. What's the challenge? Scale. Uber is live in 10K cities, doing 33M trips per day. The platform has to support over 30M predictions per second. Michelangelo is the ML platform by Uber that drives all of this (haha). The first iteration was classic AI; newer generations use the more recent flavors of agentic AI.
Uber has a Kubernetes-based control plane managing a data plane consisting of Spark, PyTorch, TensorFlow and others, so the only thing the ML developers have to worry about is the machine learning.
Inference Scheduling at Wayve
Wayve builds end-to-end AI for autonomous driving. Their vehicles collect thousands of hours of driving data per day. At peak, they handle around 100k concurrent workloads. Kueue is a Kubernetes queueing system with some advanced features that enhance the Kubernetes scheduler. By using Kueue, the wait time for workloads went down and the utilization of the CPU cluster went up, reducing idle time.
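For context on what adopting Kueue looks like: a cluster-level queue with quota, a namespaced queue pointing at it, and one label on the workload. This is a minimal sketch; all names and quotas are made up.

```yaml
# Illustrative Kueue setup: quota lives in the ClusterQueue,
# workloads enter through a LocalQueue in their namespace.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: research-queue              # hypothetical name
spec:
  namespaceSelector: {}             # accept workloads from all namespaces
  resourceGroups:
    - coveredResources: ["cpu", "memory"]
      flavors:
        - name: default-flavor
          resources:
            - name: cpu
              nominalQuota: 1000    # placeholder quota
            - name: memory
              nominalQuota: 4Ti
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-queue
  namespace: ml-workloads
spec:
  clusterQueue: research-queue
---
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job
  namespace: ml-workloads
  labels:
    kueue.x-k8s.io/queue-name: team-queue  # this label hands the Job to Kueue
spec:
  suspend: true                     # Kueue unsuspends the Job once it is admitted
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: train
          image: busybox            # placeholder workload
          command: ["sleep", "60"]
```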
EKS
Very few teams using Kubernetes have to reinvent the wheel. Everyone is solving the same problems and can take solutions out of the CNCF landscape. Karpenter is the node lifecycle manager for Kubernetes. Karpenter provisions the optimal instance from what's available.
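A Karpenter NodePool expresses the constraints and lets Karpenter pick the instance type. A rough sketch against the v1 API (the names, limits and the EC2NodeClass are assumptions):

```yaml
# Illustrative Karpenter NodePool: constraints in, instance choice out.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # prefer spot, fall back to on-demand
      nodeClassRef:
        group: karpenter.k8s.aws          # EKS-specific node class
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000                             # placeholder cap on total provisioned CPU
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```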
kro simplifies management of resources by automatically creating CRDs based on a set of resources you need. Give developers simple building blocks and abstract away the hard parts.
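A hedged sketch of what that looks like with kro's ResourceGraphDefinition; the fields and `${...}` references follow my reading of the kro docs, and everything app-specific is a placeholder:

```yaml
# Illustrative kro ResourceGraphDefinition: developers get a simple
# WebApp CRD, kro expands it into the underlying resources.
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: webapp
spec:
  schema:
    apiVersion: v1alpha1
    kind: WebApp                      # kro generates a CRD with this kind
    spec:
      name: string | required=true
      image: string | default="nginx"
  resources:
    - id: deployment
      template:
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: ${schema.spec.name}
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: ${schema.spec.name}
          template:
            metadata:
              labels:
                app: ${schema.spec.name}
            spec:
              containers:
                - name: app
                  image: ${schema.spec.image}
```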
Kubernetes RBAC is great at telling what something can do, but not to which resource. Cedar is a tool to build admission policies to say "actually... not to this resource".
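Something like this, as a sketch of a deny rule in Cedar; the Policy wrapper and the k8s entity names are my assumptions about the cedar-access-control-for-k8s project, so treat them as illustrative only:

```yaml
# Illustrative Cedar policy: RBAC may allow deletes in general,
# this adds "not in kube-system". Entity names are assumptions.
apiVersion: cedar.k8s.aws/v1alpha1
kind: Policy
metadata:
  name: protect-kube-system
spec:
  content: |
    forbid (
      principal,
      action == k8s::Action::"delete",
      resource is k8s::Resource
    ) when {
      resource.namespace == "kube-system"
    };
```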
Optimize Kubernetes for AI
NVIDIA AI Cluster Runtime provides a CLI and recipes to set up Kubernetes clusters optimized for AI workloads. The more inputs you provide to the recipe, the more optimized the cluster will be. Once you have the recipe for the desired state of the cluster, you can create a bundle which can be deployed through Helm or ArgoCD.
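The bundle then deploys like any other chart. As an example, an Argo CD Application pointing at it could look like this; the repo URL and chart name are placeholders, not NVIDIA's actual artifact names:

```yaml
# Illustrative Argo CD Application deploying a recipe bundle as a Helm chart.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ai-cluster-runtime        # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/charts   # placeholder chart repository
    chart: ai-cluster-bundle              # placeholder bundle built from the recipe
    targetRevision: 1.0.0
  destination:
    server: https://kubernetes.default.svc
    namespace: gpu-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```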