Kubernetes 1.36: Pod-Level Resource Managers (Alpha)

This blog post describes the Pod-Level Resource Managers, a new alpha feature introduced in Kubernetes v1.36. This enhancement extends the Kubelet's Topology, CPU, and Memory Managers to support pod-level resource specifications.

This feature evolves the resource managers from a strictly per-container allocation model to a pod-centric one. It enables them to use pod.spec.resources to perform NUMA alignment for the pod as a whole, and introduces a partitioning scheme to manage resources for containers within that pod-level grouping. The result is a more flexible and powerful resource management model, particularly for performance-sensitive workloads: you can define hybrid allocations in which some containers receive exclusive resources while others share what remains from a pod shared pool.

This blog post covers:

  1. Why do we need Pod-Level Resource Managers?
  2. Glossary
  3. How do Pod-Level Resource Managers work?
  4. Current limitations and caveats

Why do we need Pod-Level Resource Managers?

When working with performance-critical workloads (like AI/ML, High-Performance Computing, or others), you often need exclusive, NUMA-aligned resources for your primary application containers. However, modern Kubernetes pods frequently include sidecar containers (e.g., for logging, monitoring, or data ingestion).

Historically, you either had to allocate exclusive, NUMA-aligned resources to every container in a Guaranteed pod (which is wasteful for lightweight sidecars) or forfeit the pod-level Guaranteed QoS class entirely.

By enabling the PodLevelResourceManagers feature (which also requires the PodLevelResources feature gate), the kubelet can create hybrid resource allocation models, bringing flexibility and efficiency to high-performance workloads without sacrificing NUMA alignment.
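
As a sketch, both gates can be enabled together in the kubelet configuration file (the gate names come from this feature; everything else here is deployment-specific):

```yaml
# KubeletConfiguration fragment (illustrative): both feature gates must be
# enabled for pod-level resource managers to take effect.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  PodLevelResources: true        # pod-level pod.spec.resources support
  PodLevelResourceManagers: true # pod-aware Topology/CPU/Memory Managers
```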

Glossary

To fully understand this new feature, it helps to define a few key terms:

  • Pod Level Resources: The resource budget defined at the pod level in pod.spec.resources, which specifies the collective requests and limits for the entire pod.
  • Guaranteed Container: Within the context of this feature, a container is considered Guaranteed if it specifies resource requests equal to its limits for both CPU and memory; for exclusive CPU allocation, the CPU value must also be a positive integer. This status makes it eligible for exclusive resource allocation from the resource managers.
  • Pod Shared Pool: The subset of a pod's allocated resources that remains after all exclusive slices have been reserved. These resources are shared by all containers in the pod that do not receive an exclusive allocation. While containers in this pool share resources with each other, they are strictly isolated from the exclusive slices and the general node-wide shared pool.
  • Exclusive Slice: A dedicated portion of resources (e.g., specific CPUs or memory pages) allocated solely to a single container, ensuring isolation from other containers.

How do Pod-Level Resource Managers work?

The resource managers operate differently depending on the configured Topology Manager scope:

Pod Scope

When the Topology Manager scope is set to pod, the Kubelet performs a single NUMA alignment for the entire pod based on the resource budget defined in pod.spec.resources.
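
For reference, a kubelet configuration that selects pod scope might look like the sketch below; the single-numa-node topology policy is just one possible choice, shown here as an assumption:

```yaml
# KubeletConfiguration fragment (illustrative): pod-scope topology
# alignment with the static CPU and Memory Manager policies.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
memoryManagerPolicy: Static
topologyManagerPolicy: single-numa-node  # assumed policy choice
topologyManagerScope: pod
```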

The resulting NUMA-aligned resource pool is then partitioned:

  1. Exclusive Slices: Containers that specify Guaranteed resources are allocated exclusive slices from the pod's total allocation.
  2. Pod Shared Pool: The remaining resources form a pool shared among all other non-Guaranteed containers in the pod. While these containers share resources with each other, they are strictly isolated from the exclusive slices and from the general node-wide shared pool.

Note that when standard init containers run to completion, their resources are added to a per-pod reusable set, rather than being returned to the node's resource pool. Because they run sequentially, these resources are made reusable for subsequent app containers (either for their own exclusive slices or for the shared pool).

This allows you to co-locate containers that require exclusive resources with those that do not, all within a single NUMA-aligned pod.
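
A minimal sketch of such a hybrid pod follows; the names and images are placeholders. The main container is Guaranteed (integer CPU, requests equal to limits) and receives an exclusive slice, while the sidecar, which sets no container-level resources, runs in the pod shared pool:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: numa-aligned-pod            # hypothetical name
spec:
  resources:                        # pod-level budget (PodLevelResources)
    requests: { cpu: "4", memory: 4Gi }
    limits:   { cpu: "4", memory: 4Gi }
  containers:
  - name: main                      # Guaranteed: exclusive slice of 3 CPUs
    image: registry.example/app:latest        # placeholder image
    resources:
      requests: { cpu: "3", memory: 3Gi }
      limits:   { cpu: "3", memory: 3Gi }
  - name: sidecar                   # no container-level resources: uses the
    image: registry.example/logger:latest     # pod shared pool (1 CPU, 1Gi)
```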

Important Pod Scope considerations:

  • Empty Shared Pool Rejection: If the sum of all exclusive container requests exactly matches the pod's total budget, but another container still requires the shared pool, the pod is rejected at admission. For example, consider a pod with a pod-level budget of 4 CPUs, where container-1 requests an exclusive 1 CPU and container-2 requests an exclusive 3 CPUs. Because 0 CPUs remain in the shared pool for container-3, the pod is rejected.
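
As an illustration of that rejection case, a manifest like the following sketch would fail admission (names and images are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rejected-pod                # hypothetical name
spec:
  resources:
    requests: { cpu: "4", memory: 4Gi }
    limits:   { cpu: "4", memory: 4Gi }
  containers:
  - name: container-1               # exclusive 1 CPU
    image: registry.example/app:latest        # placeholder image
    resources:
      requests: { cpu: "1", memory: 1Gi }
      limits:   { cpu: "1", memory: 1Gi }
  - name: container-2               # exclusive 3 CPUs: CPU budget exhausted
    image: registry.example/app:latest
    resources:
      requests: { cpu: "3", memory: 2Gi }
      limits:   { cpu: "3", memory: 2Gi }
  - name: container-3               # needs the (now empty) shared pool
    image: registry.example/logger:latest
```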

Container Scope

When the Topology Manager scope is set to container, the Kubelet evaluates each container individually for exclusive allocation.

If the overall pod achieves a Guaranteed QoS class via pod.spec.resources, you can mix and match containers:

  • Containers with their own Guaranteed requests receive exclusive NUMA-aligned resources.
  • Other non-Guaranteed containers in the pod run in the node's general shared pool.
  • The collective resource consumption of all containers is still enforced by the pod's pod.spec.resources limits.

This scope is particularly useful when an infrastructure sidecar needs to be aligned to a specific NUMA node for device access, while the main workload can run in the general node shared pool.
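
A sketch of that pattern under container scope (placeholder names and images): the sidecar is Guaranteed and gets exclusive, NUMA-aligned CPUs, while the main workload omits container-level resources and runs in the node's general shared pool, still bounded by the pod-level limits:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: device-sidecar-pod          # hypothetical name
spec:
  resources:
    requests: { cpu: "6", memory: 8Gi }
    limits:   { cpu: "6", memory: 8Gi }
  containers:
  - name: nic-sidecar               # Guaranteed: exclusive NUMA-aligned CPUs
    image: registry.example/nic-agent:latest  # placeholder image
    resources:
      requests: { cpu: "2", memory: 2Gi }
      limits:   { cpu: "2", memory: 2Gi }
  - name: main-workload             # runs in the node's general shared pool
    image: registry.example/app:latest
```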

Under-the-hood: CPU Quotas (CFS)

When running mixed workloads within a pod, isolation is enforced differently depending on the allocation:

  • Exclusive Containers: Containers granted exclusive CPU slices have their CPU CFS quota enforcement disabled (ResourceIsolationContainer), allowing them to run without being throttled by the Linux scheduler.
  • Pod Shared Pool Containers: Containers falling into the pod shared pool have CPU CFS quotas enabled (ResourceIsolationPod), ensuring they do not consume more than the leftover pod budget.

Current limitations and caveats

  • The functionality is currently implemented only for the static CPU Manager policy and the Static Memory Manager policy.
  • This feature is only supported on Linux nodes. On Windows nodes, the resource managers will act as a no-op for pod-level allocations.
  • As a fundamental requirement of using pod.spec.resources, the sum of all container-level resource requests must not exceed the pod-level resource budget.
  • If you downgrade the Kubelet to a version that does not support this feature, the older Kubelet will fail to read the newer checkpoint files. This incompatibility occurs because the newer schema introduces new top-level fields to store pod-level allocations, which older Kubelet versions cannot parse.

Getting started and providing feedback

You can read the Assign Pod-level CPU and memory resources documentation to understand how to use the overall pod-level resources feature, and the Use Pod-level Resources with Resource Managers documentation to learn more about how to use this feature.

As this feature moves through Alpha, your feedback is invaluable. Please report any issues or share your experiences via the standard Kubernetes communication channels: