Kubernetes v1.36: Enhancing Scheduling for Advanced Workloads
May 13, 2026
AI/ML workloads present intricate resource-scheduling challenges, especially on platforms like Kubernetes. The traditional Pod-by-Pod approach falls short: a distributed training job, for instance, needs all of its workers running at the same time, and placing them one at a time can leave some Pods holding resources while the rest wait. Kubernetes v1.35 began addressing this with its first workload-aware scheduling features: a foundational Workload API, gang scheduling, which lets a group of Pods be scheduled together, and opportunistic batching, which speeds up scheduling of identical Pods.
Kubernetes v1.36 takes a major architectural step further by separating API concerns. The Workload API now acts purely as a static blueprint, while the new PodGroup API manages runtime status. This split is about more than tidiness: `kube-scheduler` now has a dedicated PodGroup scheduling cycle that processes each group atomically, reducing the risk of scheduling deadlocks.
v1.36 also introduces topology-aware scheduling and workload-aware preemption, and it lets PodGroups use ResourceClaims, extending Dynamic Resource Allocation (DRA) to whole groups and simplifying device management. What does all this mean for developers and operators? The sections below look at each change in turn.
### The Reinvented Workload and PodGroup APIs
In v1.36, the Workload API remains a static template, while the PodGroup API becomes a first-class runtime object. The decoupling was necessary because v1.35 stored the runtime state of Pod groups inside the Workload resource itself. With a standalone PodGroup API, status updates are more efficient because they can be sharded per replica: each PodGroup instance carries its own status. The `kube-scheduler` logic is also simpler, since the scheduler reads the information it needs directly from the PodGroup instead of parsing the Workload object.
Workload controllers such as the Job controller now define Workload objects purely as templates for their Pod groups. Instantiating a template yields runtime PodGroup objects, which drive the actual scheduling while keeping a reference to their static blueprint. The result is a cleaner, more maintainable architecture and a foundation for the more advanced scheduling capabilities expected in future Kubernetes releases.
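To illustrate the template/instance split, here is a minimal Python sketch. `WorkloadTemplate`, `PodGroupInstance`, `instantiate`, and the `phase` field are hypothetical stand-ins, not the real Kubernetes API types or field names; the point is only that runtime status lives on each PodGroup while the Workload itself is never mutated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadTemplate:
    """Hypothetical stand-in for a Workload: a static, immutable blueprint."""
    name: str
    group_name: str
    pod_count: int


@dataclass
class PodGroupInstance:
    """Hypothetical stand-in for a runtime PodGroup that owns its own status."""
    name: str
    workload_ref: str       # points back at the static blueprint
    pod_count: int
    phase: str = "Pending"  # runtime status lives here, not on the Workload


def instantiate(workload: WorkloadTemplate, replicas: int) -> list:
    """Stamp out one PodGroup per replica so each replica's status updates independently."""
    return [
        PodGroupInstance(
            name=f"{workload.name}-{workload.group_name}-{i}",
            workload_ref=workload.name,
            pod_count=workload.pod_count,
        )
        for i in range(replicas)
    ]


template = WorkloadTemplate(name="train", group_name="workers", pod_count=4)
groups = instantiate(template, replicas=2)
groups[0].phase = "Scheduled"  # a status update touches one small object only
```

Making the template a frozen dataclass mirrors its role as a read-only blueprint: any attempt to modify it raises `dataclasses.FrozenInstanceError`, while the per-replica instances remain freely updatable.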
### The Role of the PodGroup in Scheduling Dynamics
With the new PodGroup scheduling cycle, `kube-scheduler` treats a workload as a single unit instead of scheduling each Pod independently. When a member of a PodGroup is processed from the scheduling queue, the scheduler evaluates the entire group together. This lets it allocate resources coherently and avoids the partial placements that lead to resource conflicts and deadlocks.
Each cycle starts from a snapshot of the cluster state, so concurrent changes cannot cause race conditions mid-decision. The scheduler then searches for Node placements for all Pods in the group at once. If it finds an acceptable placement for every Pod, the whole group moves to the binding phase together; otherwise none of the Pods are assigned, and the group goes back to the queue to be retried later.
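The all-or-nothing cycle can be sketched in Python as below. This is a toy model under stated assumptions: capacity is a single number of free GPUs per Node, and `find_placement` uses a simple greedy heuristic (largest request first, onto the Node with the most free capacity) that is our own choice, not the actual `kube-scheduler` algorithm. What it does mirror is the structure described above: decide against a snapshot, bind every Pod or none, and requeue on failure.

```python
from collections import deque


def find_placement(free: dict, requests: dict):
    """Try to place every Pod of a group on a snapshot of free capacity.

    Returns {pod: node} if all Pods fit, otherwise None (nothing is placed).
    Greedy heuristic (an assumption): largest request first, onto the Node
    with the most free capacity.
    """
    snapshot = dict(free)  # decide against a copy, never the live state
    placement = {}
    for pod, req in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
        fitting = [node for node, cap in snapshot.items() if cap >= req]
        if not fitting:
            return None  # one Pod does not fit, so the whole group fails
        node = max(fitting, key=lambda n: snapshot[n])
        snapshot[node] -= req
        placement[pod] = node
    return placement


def scheduling_cycle(queue: deque, free: dict):
    """Process one PodGroup: bind all of its Pods together, or requeue the group."""
    group, requests = queue.popleft()
    placement = find_placement(free, requests)
    if placement is None:
        queue.append((group, requests))  # retry later; no Pod was bound
        return group, None
    for pod, node in placement.items():  # bind the whole group en masse
        free[node] -= requests[pod]
    return group, placement


free_gpus = {"node-a": 4, "node-b": 4}
queue = deque([
    ("trainer", {"t0": 2, "t1": 2, "t2": 2}),
    ("eval", {"e0": 2, "e1": 2}),
])
print(scheduling_cycle(queue, free_gpus))  # trainer fits and is bound
print(scheduling_cycle(queue, free_gpus))  # eval does not fit and is requeued
```

Note that the failed `eval` group leaves `free_gpus` untouched: even though one of its Pods would have fit on `node-b`, nothing is bound unless the whole group fits.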
Despite these enhancements, Kubernetes v1.36 still has limitations. How well the scheduler performs depends on how homogeneous a Pod group is. For simple, homogeneous groups, it can usually find a valid placement. More complex arrangements, especially those involving inter-Pod dependencies, are harder and may end up unschedulable. Even so, these improvements lay the groundwork for the more advanced tooling that increasingly sophisticated workloads will need.
### Understanding Topology-Aware Scheduling
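The following Python sketch models the generate, filter, and score phases of the placement search covered in this section. Everything in it is an assumption for illustration: Nodes carry a rack label and a free-GPU count, a candidate is simply "all Nodes in one rack," filtering uses first-fit decreasing packing, and scoring prefers the rack with the least leftover capacity. The functions only mimic the roles of the `PlacementGenerate` and `PlacementScore` extension points; they are not their real interfaces.

```python
from collections import defaultdict


def generate_placements(nodes: dict) -> dict:
    """Generate phase: one candidate per rack, mapping node -> free GPUs."""
    racks = defaultdict(dict)
    for node, (rack, free) in nodes.items():
        racks[rack][node] = free
    return dict(racks)


def fits(candidate: dict, requests: list) -> bool:
    """Filter phase: can the whole group be packed into this candidate?"""
    free = dict(candidate)
    for req in sorted(requests, reverse=True):  # first-fit decreasing
        node = next((n for n, cap in free.items() if cap >= req), None)
        if node is None:
            return False
        free[node] -= req
    return True


def score(candidate: dict, requests: list) -> int:
    """Score phase (assumed policy): prefer the rack with the least leftover capacity."""
    return -(sum(candidate.values()) - sum(requests))


def choose_rack(nodes: dict, requests: list):
    candidates = generate_placements(nodes)
    viable = {rack: c for rack, c in candidates.items() if fits(c, requests)}
    if not viable:
        return None  # no single rack can hold the whole group
    return max(viable, key=lambda rack: score(viable[rack], requests))


nodes = {
    "n1": ("rack-a", 4), "n2": ("rack-a", 4),
    "n3": ("rack-b", 8), "n4": ("rack-b", 8),
    "n5": ("rack-c", 2),
}
print(choose_rack(nodes, [4, 4]))     # rack-a is the tightest fit
print(choose_rack(nodes, [4, 4, 4]))  # only rack-b can hold three 4-GPU Pods
```

Keeping the filter step separate from scoring means an unsuitable rack is discarded before it can win on score, which matches the "check that the whole PodGroup fits, then rank" ordering described here.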
Kubernetes v1.36 also changes how Pods are placed physically within a cluster. A **PodGroup** gives the scheduler a structured way to keep Pods that must work together within specific topological constraints, such as a single rack. `kube-scheduler` uses a new placement algorithm built around topology. First, it generates candidate placements that satisfy the rack constraint, using the new `PlacementGenerate` extension point to assemble groupings of eligible Nodes. Next, it checks every candidate to confirm that the entire PodGroup fits within it, so resources are not wasted on a placement that cannot hold the whole group. Finally, it scores the viable candidates with the new `PlacementScore` extension point to pick the most effective arrangement.

Topology-aware scheduling does not yet support preemption, that is, evicting running Pods to make room for higher-priority workloads. Future releases are expected to integrate it with workload-aware preemption so that priorities and spatial constraints are weighed together.

### Rethinking Preemption with Workload Awareness
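The following Python sketch is a toy model of the victim selection covered in this section. It assumes capacity is a single cluster-wide GPU count (it ignores which Node the freed GPUs land on), picks victims lowest-priority first as a simplifying policy, and uses `"Pod"` and `"PodGroup"` as hypothetical placeholder values rather than the real values of the disruptionMode field. It shows two ideas: victims can come from any Node, and a group marked all-or-nothing is evicted in full or not at all.

```python
from dataclasses import dataclass


@dataclass
class RunningGroup:
    name: str
    priority: int         # one priority for the whole group
    disruption_mode: str  # hypothetical values: "Pod" (individual) or "PodGroup" (all-or-nothing)
    pods: list            # (pod_name, node_name, gpus) tuples


def select_victims(groups: list, preemptor_priority: int, gpus_needed: int):
    """Return Pod names to evict, possibly spread across Nodes, or None if impossible."""
    victims, freed = [], 0
    for group in sorted(groups, key=lambda g: g.priority):  # lowest priority first
        if group.priority >= preemptor_priority or freed >= gpus_needed:
            break
        if group.disruption_mode == "PodGroup":
            # All-or-nothing: evicting any member means evicting every member.
            victims += [name for name, _, _ in group.pods]
            freed += sum(gpus for _, _, gpus in group.pods)
        else:
            for name, _, gpus in group.pods:
                if freed >= gpus_needed:
                    break
                victims.append(name)
                freed += gpus
    return victims if freed >= gpus_needed else None


running = [
    RunningGroup("batch", 10, "Pod", [("b0", "n1", 2), ("b1", "n2", 2), ("b2", "n3", 2)]),
    RunningGroup("serving", 50, "PodGroup", [("s0", "n1", 1), ("s1", "n2", 1)]),
    RunningGroup("critical", 100, "PodGroup", [("c0", "n3", 4)]),
]
print(select_victims(running, preemptor_priority=60, gpus_needed=4))  # two batch Pods on n1 and n2
print(select_victims(running, preemptor_priority=60, gpus_needed=7))  # all of batch, then all of serving
```

In the second call, one more GPU beyond the batch group would suffice, but because `serving` is all-or-nothing, both of its Pods are selected.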
**Workload-aware preemption** marks a pivotal shift in how Kubernetes resolves scheduling conflicts. Instead of evaluating Pods one at a time as preemption candidates, the scheduler treats an entire PodGroup as one unit. When a conflict arises, it can evict Pods from several Nodes at once rather than being confined to a single Node. To support this, the PodGroup API gains two fields: **priority** and **disruptionMode**. The priority field sets one priority for the whole PodGroup, overriding the priorities of individual Pods. The disruptionMode field specifies whether Pods in the group can be preempted individually or only together, as an all-or-nothing unit. For now, both fields are honored only during workload-aware preemption, but the Kubernetes team is working to extend them to other scheduling scenarios.

### Advancements in Resource Management with ResourceClaims
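The claim deduplication covered in this section can be sketched in a few lines of Python. The claim naming scheme (`<group>-<template>` for grouped Pods, `<pod>-<template>` otherwise) and the `Kind/name` consumer strings are hypothetical; the real controller chooses its own names and records consumers in the claim's status in its own format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pod:
    name: str
    claim_template: str              # ResourceClaimTemplate the Pod references
    pod_group: Optional[str] = None  # None for Pods outside any PodGroup


def plan_claims(pods: list) -> dict:
    """Map each generated claim name to the consumer references it would record."""
    claims = {}
    for pod in pods:
        if pod.pod_group is not None:
            claim = f"{pod.pod_group}-{pod.claim_template}"  # shared by the whole group
            consumer = f"PodGroup/{pod.pod_group}"           # one entry covers every member
        else:
            claim = f"{pod.name}-{pod.claim_template}"       # classic one-claim-per-Pod
            consumer = f"Pod/{pod.name}"
        consumers = claims.setdefault(claim, [])
        if consumer not in consumers:
            consumers.append(consumer)
    return claims


pods = [
    Pod("worker-0", "gpu", pod_group="train"),
    Pod("worker-1", "gpu", pod_group="train"),
    Pod("standalone", "gpu"),
]
print(plan_claims(pods))
```

However many workers join the `train` group, its claim still records a single `PodGroup/train` consumer, which is why the per-claim limit on individual Pods stops being the bottleneck.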
**Dynamic Resource Allocation (DRA)** has matured considerably since it reached general availability in v1.34. It lets Pods describe detailed requirements for specific devices, such as GPUs. v1.36 extends it to **PodGroups**, which can now use ResourceClaimTemplates: instead of generating one ResourceClaim per Pod, the system generates a single claim for the entire PodGroup. For example, if two Pods belong to the same PodGroup and reference the same ResourceClaimTemplate, only one ResourceClaim is created, and both Pods share it. This greatly simplifies resource management in large deployments and makes sharing devices within a group easier. It also removes the previous limit on how many individual Pods could be associated with a ResourceClaim: instead of listing every consuming Pod in the claim's status, the claim records a single PodGroup reference that covers all of the group's Pods.

### Seamless Integration with the Job Controller
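The Job controller behavior covered in this section can be modeled roughly as below. This is a hedged Python sketch: the dictionaries are not real manifests, the `-group` naming is hypothetical, and the eligibility check encodes our reading of the stated limits (parallelism greater than one, Indexed completion mode, and "fully parallel" interpreted as `completions == parallelism`). The `WorkloadWithJob` feature gate is modeled as a plain boolean.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    parallelism: int
    completions: int
    completion_mode: str = "NonIndexed"


def gang_eligible(job: Job) -> bool:
    # Assumption: "fully parallel" means every index runs at once (completions == parallelism).
    return (
        job.parallelism > 1
        and job.completion_mode == "Indexed"
        and job.completions == job.parallelism
    )


def reconcile(job: Job, workload_with_job_gate: bool):
    """Return (workload, pod_group, pods) for eligible Jobs, else None (Pod-by-Pod scheduling)."""
    if not (workload_with_job_gate and gang_eligible(job)):
        return None
    owner = {"kind": "Job", "name": job.name}  # owner reference enables garbage collection
    workload = {"kind": "Workload", "name": job.name, "owner": owner}
    pod_group = {"kind": "PodGroup", "name": f"{job.name}-group",
                 "workload": job.name, "owner": owner}
    pods = [
        {"name": f"{job.name}-{i}", "schedulingGroup": pod_group["name"], "owner": owner}
        for i in range(job.parallelism)
    ]
    return workload, pod_group, pods


workload, pod_group, pods = reconcile(
    Job("train", parallelism=4, completions=4, completion_mode="Indexed"), True
)
print(pod_group["name"], [p["schedulingGroup"] for p in pods])
```

Returning `None` for ineligible Jobs reflects the fallback described in this article: anything outside the supported shape keeps the traditional Pod-by-Pod path.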
In v1.36, the **Job controller** can also create and manage Workload and PodGroup objects automatically. Jobs for tightly coupled parallel applications, such as distributed training, can therefore be gang-scheduled with minimal manual setup; previously, developers had to create these references by hand. When the `WorkloadWithJob` feature gate is enabled, the Job controller creates a Workload and a corresponding PodGroup for each Job, sets the schedulingGroup on the Pods the Job creates, and makes the Job the owner of these objects so they are garbage-collected when the Job is deleted. This first implementation is deliberately limited to well-defined Jobs, such as those with parallelism greater than one and a strict completion mode. By automating these relationships, Kubernetes removes much of the friction in scheduling complex tasks and makes it easier for teams to deploy robust distributed applications.

### Final Thoughts on Gang Scheduling
Gang scheduling is a meaningful step forward in how Kubernetes processes workloads. Scheduling a Job's Pods as a group rather than one at a time can bring significant efficiency gains, especially in compute-intensive environments. The capability has limits, though: currently only indexed, fully parallel Jobs can use gang scheduling, and all other Jobs fall back to traditional Pod-by-Pod scheduling. This highlights a central challenge for the Kubernetes community: broadening compatibility while preserving system stability and performance.

As the ecosystem evolves, demand is growing for scheduling mechanisms flexible enough to handle dynamic workloads. Graduating the Workload and PodGroup APIs to Beta will be crucial. That work aims to solidify gang scheduling and to add features such as `minCount`, paving the way for elastic jobs. If you manage workloads, keep an eye on these developments: they may well determine how scalable and flexible your infrastructure can become.

The roadmap also includes multi-level workload hierarchies and unified controller integrations. These are not merely technical upgrades; they are essential for the complex use cases found in AI workloads and beyond. Given the demands of today's applications, the current constraints may seem restrictive, but they also point to clear avenues for innovation. Participating in this work is a good way to gain insight and help shape where the community goes next: try the new features in your test environments and share your feedback to help refine them.
Joining community channels, such as Slack or the regular community meetings, will also keep you at the forefront of Kubernetes development. There is still significant work ahead, and this is only the beginning. With each release, Kubernetes moves toward a more workload-aware future that could redefine deployment and resource management for developers and enterprises alike, and what comes next could transform how we approach cloud-native architectures.
Source: William Garcia, https://kubernetes.io/blog/2026/05/13/kubernetes-v1-36-advancing-workload-aware-scheduling/