Kubernetes 1.36: Pod-Level Resource Managers (Alpha)
This blog post describes the Pod-Level Resource Managers, a new alpha feature introduced in Kubernetes v1.36. This enhancement extends the Kubelet's Topology, CPU, and Memory Managers to support pod-level resource specifications.
This feature evolves the resource managers from a strictly per-container
allocation model to a pod-centric one. It enables them to use
`pod.spec.resources` to perform NUMA alignment for the pod as a whole, and
introduces a partitioning scheme to manage resources for containers within that
pod-level grouping. This change introduces a more flexible and powerful resource
management model, particularly for performance-sensitive workloads, allowing you
to define hybrid allocation models where some containers receive exclusive
resources while others share the remaining resources from a pod shared pool.
This blog post covers:
- Why do we need Pod-Level Resource Managers?
- Glossary
- How do Pod-Level Resource Managers work?
- Current limitations and caveats
Why do we need Pod-Level Resource Managers?
When working with performance-critical workloads (like AI/ML, High-Performance Computing, or others), you often need exclusive, NUMA-aligned resources for your primary application containers. However, modern Kubernetes pods frequently include sidecar containers (e.g., for logging, monitoring, or data ingestion).
Historically, you either had to allocate exclusive, NUMA-aligned resources to every container in a Guaranteed pod (which is wasteful for lightweight sidecars) or forfeit the pod-level Guaranteed QoS class entirely.
By enabling the `PodLevelResourceManagers` feature gate (which also requires the
`PodLevelResources` feature gate), the kubelet can create hybrid resource
allocation models, bringing flexibility and efficiency to high-performance
workloads without sacrificing NUMA alignment.
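As a rough sketch, enabling this in the kubelet configuration might look like the following. The feature gate names come from this post; the policy fields are the standard `KubeletConfiguration` settings, and the `Static` Memory Manager policy additionally needs `reservedMemory` and reserved-resource settings appropriate to your nodes, omitted here:

```yaml
# Illustrative KubeletConfiguration fragment (not a complete config)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  PodLevelResources: true          # prerequisite feature gate
  PodLevelResourceManagers: true   # the alpha feature described here
cpuManagerPolicy: static           # required for exclusive CPU allocation
memoryManagerPolicy: Static        # required for exclusive memory allocation
topologyManagerScope: pod          # or "container"; both scopes are covered below
```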
Glossary
To fully understand this new feature, it helps to define a few key terms:
- Pod Level Resources: The resource budget defined at the pod level in `pod.spec.resources`, which specifies the collective requests and limits for the entire pod.
- Guaranteed Container: Within the context of this feature, a container is considered `Guaranteed` if it specifies resource requests equal to its limits for both CPU (exclusive CPU allocation requires a positive integer value) and memory. This status makes it eligible for exclusive resource allocation from the resource managers.
- Pod Shared Pool: The subset of a pod's allocated resources that remains after all exclusive slices have been reserved. These resources are shared by all containers in the pod that do not receive an exclusive allocation. While containers in this pool share resources with each other, they are strictly isolated from the exclusive slices and the general node-wide shared pool.
- Exclusive Slice: A dedicated portion of resources (e.g., specific CPUs or memory pages) allocated solely to a single container, ensuring isolation from other containers.
How do Pod-Level Resource Managers work?
The resource managers operate differently depending on the configured Topology Manager scope:
Pod Scope
When the Topology Manager scope is set to pod, the Kubelet performs a single
NUMA alignment for the entire pod based on the resource budget defined in
`pod.spec.resources`.
The resulting NUMA-aligned resource pool is then partitioned:
- Exclusive Slices: Containers that specify `Guaranteed` resources are allocated exclusive slices from the pod's total allocation.
- Pod Shared Pool: The remaining resources form a pool shared among all other non-Guaranteed containers in the pod. While containers in this pool share resources with each other, they are strictly isolated from the exclusive slices and the general node-wide shared pool.
Note that when standard init containers run to completion, their resources are added to a per-pod reusable set, rather than being returned to the node's resource pool. Because they run sequentially, these resources are made reusable for subsequent app containers (either for their own exclusive slices or for the shared pool).
This allows you to co-locate containers that require exclusive resources with those that do not, all within a single NUMA-aligned pod.
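To make this concrete, here is a hypothetical pod (names and images are placeholders) where the main container carves an exclusive slice out of the pod-level budget and a sidecar falls into the pod shared pool:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: numa-aligned-workload            # hypothetical name
spec:
  resources:                             # pod-level budget; in pod scope this is
    requests: { cpu: "4", memory: 8Gi }  # the unit of NUMA alignment
    limits:   { cpu: "4", memory: 8Gi }
  containers:
  - name: main
    image: example.com/app:latest        # placeholder image
    resources:                           # requests == limits, integer CPU:
      requests: { cpu: "3", memory: 6Gi }  # eligible for an exclusive slice
      limits:   { cpu: "3", memory: 6Gi }
  - name: log-sidecar
    image: example.com/logger:latest     # placeholder image
    # No container-level resources: this container shares the remaining
    # 1 CPU / 2Gi from the pod shared pool.
```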
Important Pod Scope considerations:
- Empty Shared Pool Rejection: If the sum of all exclusive container requests
  exactly matches the pod's total budget, but another container still needs the
  shared pool, the pod is rejected at admission. For example, consider a pod
  with a pod-level budget of 4 CPUs, where `container-1` requires an exclusive
  1 CPU and `container-2` requires an exclusive 3 CPUs. Because 0 CPUs are left
  in the shared pool for `container-3`, this pod is rejected.
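Written out as a manifest, that rejected pod might look like this (a hypothetical sketch; names, images, and memory values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: empty-shared-pool-example        # hypothetical name
spec:
  resources:
    requests: { cpu: "4", memory: 4Gi }  # pod-level budget: 4 CPUs
    limits:   { cpu: "4", memory: 4Gi }
  containers:
  - name: container-1
    image: example.com/app:latest        # placeholder image
    resources:
      requests: { cpu: "1", memory: 1Gi }  # exclusive 1 CPU
      limits:   { cpu: "1", memory: 1Gi }
  - name: container-2
    image: example.com/app:latest        # placeholder image
    resources:
      requests: { cpu: "3", memory: 1Gi }  # exclusive 3 CPUs
      limits:   { cpu: "3", memory: 1Gi }
  - name: container-3
    image: example.com/sidecar:latest    # placeholder image
    # Needs the shared pool, but 1 + 3 exclusive CPUs consumed the entire
    # 4-CPU budget, so this pod is rejected at admission.
```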
Container Scope
When the Topology Manager scope is set to container, the Kubelet evaluates
each container individually for exclusive allocation.
If the overall pod achieves a Guaranteed QoS class via `pod.spec.resources`,
you can mix and match containers:
- Containers with their own `Guaranteed` requests receive exclusive NUMA-aligned resources.
- Other non-Guaranteed containers in the pod run in the node's general shared pool.
- The collective resource consumption of all containers is still enforced by the pod's `pod.spec.resources` limits.
This scope is extremely useful when an infrastructure sidecar needs to be aligned to a specific NUMA node for device access, while the main workload can run in the general node shared pool.
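A sketch of that sidecar scenario, under the same assumptions as the earlier examples (placeholder names and images), could look like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: container-scope-mix              # hypothetical name
spec:
  resources:                             # pod is Guaranteed at the pod level
    requests: { cpu: "3", memory: 4Gi }
    limits:   { cpu: "3", memory: 4Gi }
  containers:
  - name: device-sidecar
    image: example.com/device-agent:latest  # placeholder image
    resources:                           # requests == limits, integer CPU:
      requests: { cpu: "1", memory: 1Gi }  # exclusive, NUMA-aligned on its own
      limits:   { cpu: "1", memory: 1Gi }
  - name: main
    image: example.com/app:latest        # placeholder image
    # No container-level resources: runs in the node's general shared pool,
    # still capped by the pod-level limits above.
```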
Under-the-hood: CPU Quotas (CFS)
When running mixed workloads within a pod, isolation is enforced differently depending on the allocation:
- Exclusive Containers: Containers granted exclusive CPU slices have their CPU CFS quota enforcement disabled (`ResourceIsolationContainer`), allowing them to run without being throttled by the Linux scheduler.
- Pod Shared Pool Containers: Containers falling into the pod shared pool have CPU CFS quotas enabled (`ResourceIsolationPod`), ensuring they do not consume more than the leftover pod budget.
Current limitations and caveats
- The functionality is currently implemented only for the `static` CPU Manager policy and the `Static` Memory Manager policy.
- This feature is only supported on Linux nodes. On Windows nodes, the resource managers act as a no-op for pod-level allocations.
- As a fundamental requirement of using `pod.spec.resources`, the sum of all container-level resource requests must not exceed the pod-level resource budget.
- If you downgrade the Kubelet to a version that does not support this feature, the older Kubelet will fail to read the newer checkpoint files. This incompatibility occurs because the newer schema introduces new top-level fields to store pod-level allocations, which older Kubelet versions cannot parse.
Getting started and providing feedback
You can read the Assign Pod-level CPU and memory resources documentation to understand how to use the overall pod-level resources feature, and the Use Pod-level Resources with Resource Managers documentation to learn more about how to use this feature!
As this feature moves through Alpha, your feedback is invaluable. Please report any issues or share your experiences via the standard Kubernetes communication channels: