Advanced scheduling for AI/ML with Ray and Kueue

March 22, 2024

3 min read

Ray is an open-source unified compute framework gaining popularity among developers for its ability to easily scale AI/ML and Python applications. KubeRay offers a solution to harness the power of Ray within Google Kubernetes Engine (GKE). It serves as an orchestrator for Ray clusters, leveraging Kubernetes APIs as the foundational layer for compute, networking, and storage. KubeRay can also integrate with Kueue, a powerful cloud-native queueing system, unlocking advanced scheduling capabilities for Ray applications on GKE.

In this blog, we’ll dive into how KubeRay and Kueue work together to orchestrate advanced scheduling for Ray applications. We’ll explore techniques for:

Priority scheduling: Prioritize AI/ML tasks to ensure production reliability and improve cost efficiency.
Gang scheduling: Orchestrate the simultaneous execution of tightly coupled AI/ML tasks to maximize resource usage and accelerate training.

Priority scheduling

Scenario

A company hosts a variety of batch workloads on a GKE cluster, including continuous integration (CI) tests and offline batch inference tasks using RayJob for production purposes. Given that the cluster’s resources are finite and must be shared among these applications, the company aims to ensure the swift completion of production workloads. Consequently, priority scheduling is employed, allowing production workloads to preempt resources from lower-priority tasks, such as CI tests.

https://techontheblog.com/wp-content/uploads/2024/03/localimages/image1_Px2ykDl.max-1700x1700.png

How do Kueue and KubeRay implement priority scheduling?

Kueue’s WorkloadPriorityClass API provides the control necessary to prioritize RayJob and RayCluster resources within your GKE environment. Workload priority influences two key aspects. First, it determines the order of workloads in the ClusterQueue, with higher-priority workloads being executed earlier. Second, when there is insufficient quota in a ClusterQueue or its cohort, an incoming workload can trigger the preemption of previously admitted workloads, based on policies for the ClusterQueue. For RayJob and RayCluster, preemption involves deleting all Ray Pods and transitioning the custom resource status to ‘Suspended.’ When the ClusterQueue has enough resources later on, Kueue resumes the suspended workloads.

Priority scheduling is crucial for scenarios where production and development workloads compete for limited resources. By assigning higher priorities to production-related tasks, you enable preemption of less time-sensitive jobs, ensuring production models are updated and deployed in a timely manner.

Here’s an example of two WorkloadPriorityClass resources to distinguish between workloads for production and development:

You can then assign priority classes to KubeRay resources using Kueue’s priority class label:

See the Priority Scheduling with RayJob and Kueue guide for a full walk-through.

Gang scheduling

Scenario

Kueue’s all-or-nothing approach to workload admission ensures that RayJobs and RayClusters are scheduled only when all required resources are available. This significantly improves resource efficiency by preventing partially provisioned clusters that are unable to execute tasks. This strategy, often termed “gang scheduling,” is particularly valuable for the resource-intensive nature of AI/ML workloads.

Gang scheduling is important for use cases like data parallelism in distributed model training. Data parallelism shards data across multiple Pods, each running the same model. All gradients are sent to a parameter server, which updates the hyperparameters and then redistributes them to all Pods for the next iteration. If the RayJob or RayCluster is partially provisioned, the parameter server can’t update the hyperparameters and will become stuck until the custom resource becomes fully provisioned. This results in a total waste of resources. Gang scheduling can effectively avoid this situation.

How do Kueue and KubeRay implement gang scheduling?

You can take advantage of Kueue’s dynamic resource provisioning and queueing to orchestrate gang scheduling with KubeRay. This is essential when working with limited hardware accelerators like GPUs and TPUs. Kueue ensures Ray workloads execute only when all required resources are available, preventing wasted GPU/TPU cycles and maximizing utilization.

Kueue achieves this efficient gang scheduling on GKE using the ProvisioningRequest API. This API signals that a Ray workload should wait until the necessary compute nodes can be provisioned simultaneously. GKE’s cluster autoscaler accepts the ProvisioningRequest, scaling up nodes in one step, if and only if all required resources are available. Ray cluster Pods are then scheduled together on the newly provisioned nodes. Refer to How ProvisioningRequest Works for more details.

For a step-by-step demonstration, see the Gang Scheduling with RayJob and Kueue guide.

Conclusion

KubeRay and Kueue offer powerful tools for managing and optimizing Ray applications within GKE. Priority scheduling helps you ensure your most important AI/ML tasks always get the resources they need. Gang scheduling helps you make the most of hardware accelerators, preventing wasted time and maximizing efficiency. Together, these techniques improve the performance and cost-effectiveness of your Ray applications on the cloud

Posted in

techontheblog.com

How Palo Alto Networks uses BigQuery ML to automate resource classification

Healthcare’s AI transformation is already underway

March 13, 2024

AI is making a powerful impact all around us, but few sectors stand to benefit as profoundly as healthcare.…

healthcare’s-ai-transformation-is-already-underway

5 min read

Powering generative AI with cloud storage innovations at Next ’24

April 20, 2024

Generative AI is changing the way we create, innovate, and interact with the world. From generating realistic…

powering-generative-ai-with-cloud-storage-innovations-at-next-’24

5 min read

Search engines made simple: A low-code approach with GKE and Vertex AI Agent Builder

July 24, 2024

Building a search engine for your website used to be a formidable task that required substantial resources and…

search-engines-made-simple:-a-low-code-approach-with-gke-and-vertex-ai-agent-builder

4 min read

The evolution of play: From live to living games

March 29, 2024

Over the last year, we have seen the live game model explode in popularity, with live service modalities now…

the-evolution-of-play:-from-live-to-living-games

4 min read

Hex-LLM: High-efficiency large language model serving on TPUs in Vertex AI Model Garden

Leverage enterprise data with Denodo and Vertex AI for generative AI applications

Introducing Partner Companion: An AI-powered advisor for enhanced customer engagement

Search engines made simple: A low-code approach with GKE and Vertex AI Agent Builder

Advanced scheduling for AI/ML with Ray and Kueue

Priority scheduling

Scenario

How do Kueue and KubeRay implement priority scheduling?

Gang scheduling

Scenario

How do Kueue and KubeRay implement gang scheduling?

Conclusion

How Palo Alto Networks uses BigQuery ML to automate resource classification

Automatic driver installation simplifies using NVIDIA GPUs in GKE

Leave a Reply Cancel reply

Hex-LLM: High-efficiency large language model serving on TPUs in Vertex AI Model Garden

Leverage enterprise data with Denodo and Vertex AI for generative AI applications

Introducing Partner Companion: An AI-powered advisor for enhanced customer engagement

Search engines made simple: A low-code approach with GKE and Vertex AI Agent Builder

Deckmatch powers insights for venture capitalists with Cloud SQL for PostgreSQL

Hex-LLM: High-efficiency large language model serving on TPUs in Vertex AI Model Garden

Leverage enterprise data with Denodo and Vertex AI for generative AI applications

Introducing Partner Companion: An AI-powered advisor for enhanced customer engagement

Search engines made simple: A low-code approach with GKE and Vertex AI Agent Builder

Deckmatch powers insights for venture capitalists with Cloud SQL for PostgreSQL

Advanced scheduling for AI/ML with Ray and Kueue

Priority scheduling

Scenario

How do Kueue and KubeRay implement priority scheduling?

Gang scheduling

Scenario

How do Kueue and KubeRay implement gang scheduling?

Conclusion

Share this article

How Palo Alto Networks uses BigQuery ML to automate resource classification

Automatic driver installation simplifies using NVIDIA GPUs in GKE

Leave a Reply Cancel reply

Read next