Implementing NVIDIA MIG in Red Hat OpenShift to optimize GPU resources in containerized environments
A comprehensive guide for implementing NVIDIA MIG in Red Hat OpenShift
In the rapidly evolving landscape of containerized applications and cloud-native technologies, efficient resource utilization is paramount. For organizations leveraging GPU-accelerated workloads, particularly in the realms of artificial intelligence (AI) and machine learning (ML), optimizing GPU usage can lead to significant performance improvements and cost savings.
This guide explores how to harness NVIDIA's Multi-Instance GPU (MIG) technology within Red Hat OpenShift to improve GPU utilization and get more value from your GPU hardware.
GPU optimization techniques
MIG provides two partitioning strategies for exposing GPU slices: the single strategy and the mixed strategy.
Single strategy
In the single strategy, all MIG devices on a GPU are created with the same size. For example, on an A100-SXM4-40GB GPU, you could create seven 1g.5gb slices or three 2g.10gb slices.
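A minimal sketch of requesting this layout through the GPU Operator's MIG manager, assuming its default MIG configuration ConfigMap (which ships an all-1g.5gb profile); the node name is a placeholder:
# Label the node so the MIG manager carves every GPU on it into
# seven 1g.5gb instances; the value must match a profile defined in
# the operator's MIG configuration ConfigMap.
oc label node <gpu-node-name> nvidia.com/mig.config=all-1g.5gb --overwrite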

Mixed strategy
In the mixed strategy, you create MIG devices of different sizes on the same GPU. For example, you could partition a GPU into two 1g.5gb instances, one 2g.10gb instance, and one 3g.20gb instance.
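For illustration, here is a sketch of creating this exact layout directly with nvidia-smi. In OpenShift, the GPU Operator's MIG manager normally applies the layout for you, and the profile IDs below are hardware-specific assumptions, so verify them first:
# MIG mode must already be enabled on the GPU (nvidia-smi -i 0 -mig 1).
# On an A100-40GB, profile IDs 19, 14, and 9 typically map to 1g.5gb,
# 2g.10gb, and 3g.20gb; confirm with: nvidia-smi mig -lgip
# -cgi creates the GPU instances, -C creates matching compute instances.
nvidia-smi mig -cgi 19,19,14,9 -C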

Key NVIDIA MIG components used in OpenShift
To fully leverage MIG in an OpenShift environment, several key components work in concert:
- NVIDIA GPU Operator: This Kubernetes operator automates the deployment and management of GPU software components within OpenShift clusters. It handles the installation of drivers, runtime libraries, and monitoring tools necessary for GPU operations.
- NVIDIA Device Plugin: This plugin exposes GPU resources to the Kubernetes scheduler, enabling it to make informed decisions about pod placement based on available GPU resources, including MIG instances.
- NVIDIA Container Toolkit: This toolkit enables containers to access and utilize GPU capabilities, ensuring that containerized applications can leverage the full power of NVIDIA GPUs.
- NVIDIA Driver: The driver provides the essential software interface between the operating system and the GPU hardware, enabling communication and control.
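After installation (covered in the next section), a quick sanity check is to list the pods these components run as; the nvidia-gpu-operator namespace is the usual default, though yours may differ:
# Expect to see the driver daemonset, device plugin, container toolkit,
# and DCGM exporter pods in a Running state.
oc get pods -n nvidia-gpu-operator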
Implementing MIG in Red Hat OpenShift
Implementing MIG in Red Hat OpenShift involves three main steps:
- Install the NVIDIA GPU operator.
- Configure MIG.
- Verify the MIG configuration.
Step 1. Install the NVIDIA GPU Operator
- Log into the OpenShift web console with cluster administrator privileges.
- Navigate to the Operator Hub and search for "NVIDIA GPU Operator".
- Click Install and follow the prompts to complete the installation process.
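If you prefer the CLI, you can create an OLM Subscription instead. The following is only a sketch: the channel, namespace, and package name are assumptions to verify against the OperatorHub entry in your cluster, and the target namespace needs an existing OperatorGroup:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: gpu-operator-certified
  namespace: nvidia-gpu-operator   # assumed namespace; must already exist with an OperatorGroup
spec:
  channel: stable                  # assumed channel; check OperatorHub for current channels
  name: gpu-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace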
Step 2. Configure MIG
Once the GPU Operator is installed, you can configure MIG by creating a ClusterPolicy custom resource. This resource defines the MIG strategy and configuration for your cluster.
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  mig:
    strategy: mixed
    config:
      - gpuIds: ["0", "1"]
        mig1g.5gb: 2
        mig2g.10gb: 1
      - gpuIds: ["2", "3"]
        mig3g.20gb: 2
Apply this configuration using the OpenShift CLI:
oc apply -f clusterpolicy.yaml
Step 3. Verify the MIG configuration
After applying the configuration, verify the status of MIG devices:
oc exec -it <nvidia-device-plugin-pod> -- nvidia-smi mig -lgi
This command will display the current MIG configuration on your GPU nodes.
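You can also confirm that the scheduler sees the new devices by inspecting a GPU node's allocatable resources:
# With the mixed strategy, slices surface as per-profile resources such
# as nvidia.com/mig-1g.5gb; with the single strategy they appear as
# nvidia.com/gpu.
oc describe node <gpu-node-name> | grep nvidia.com/mig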
Optimization strategies for MIGs in OpenShift
Multi-Instance GPU (MIG) technology offers powerful optimization strategies for GPU resource allocation in OpenShift Container Platform environments.
Fine-grained resource allocation
MIG allows for precise GPU resource allocation, enabling administrators to create profiles tailored to specific workload requirements. For instance, on an NVIDIA A100-40GB GPU, you could create layouts such as:
- 7 instances of 1g.5gb for lightweight inference tasks
- 3 instances of 2g.10gb for medium-sized training jobs
- 1 instance of 7g.40gb for large-scale deep learning models
This granular control ensures optimal resource utilization across diverse AI and ML workloads.
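To consume a slice, a workload requests the matching extended resource. Here is a minimal sketch assuming the mixed strategy; the pod name and image tag are placeholders, and with the single strategy you would request nvidia.com/gpu instead:
apiVersion: v1
kind: Pod
metadata:
  name: mig-inference-demo
spec:
  restartPolicy: Never
  containers:
    - name: inference
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubi8   # placeholder CUDA base image
      command: ["nvidia-smi", "-L"]                 # prints the MIG device visible to the pod
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1                  # request one 1g.5gb slice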
Dynamic reconfiguration
One of the key advantages of MIG in OpenShift is the ability to dynamically reconfigure MIG geometries. This flexibility allows administrators to adapt to changing workload demands in real time, without requiring system restarts. By monitoring workload patterns and GPU utilization, you can adjust MIG profiles to ensure optimal resource allocation at all times.
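With the GPU Operator's MIG manager, reconfiguration can be as simple as relabeling the node; a minimal sketch, assuming the label value exists in the operator's MIG configuration ConfigMap (all-balanced is one of the shipped defaults):
# Switch a node to a different predefined MIG layout. The MIG manager
# stops GPU clients on the node, applies the new geometry, and then
# restarts the GPU software stack.
oc label node <gpu-node-name> nvidia.com/mig.config=all-balanced --overwrite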
Workload-specific optimizations
Two workload-specific optimizations are encouraged:
- AI/ML pipeline optimizations
- Multi-tenant environment optimizations
AI/ML pipeline optimizations
For complex AI/ML pipelines, consider using a mix of MIG profiles to optimize each stage of the workflow:
- Use smaller instances (for example, 1g.5gb) for data preprocessing and feature engineering tasks
- Allocate medium-sized instances (for example, 2g.10gb) for model training phases
- Reserve larger instances (for example, 4g.20gb) for inference on production models
This approach maximizes GPU utilization across the entire ML lifecycle, ensuring that each stage has access to appropriate GPU resources.
Multi-tenant environment optimizations
In multi-user scenarios, MIG enables efficient resource sharing while maintaining performance isolation:
- Assign smaller MIG instances to data scientists for experimentation and development
- Provide larger instances for production workloads that require more computational power
- Use mixed configurations to support both interactive notebooks and batch jobs simultaneously
This strategy improves GPU accessibility for all users while ensuring that critical workloads have access to the necessary resources.
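One way to enforce this kind of sharing is a per-namespace ResourceQuota on the MIG extended resources; a sketch, assuming the mixed strategy and a hypothetical data-science-team namespace:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mig-quota
  namespace: data-science-team            # hypothetical tenant namespace
spec:
  hard:
    requests.nvidia.com/mig-1g.5gb: "4"   # small slices for experimentation
    requests.nvidia.com/mig-3g.20gb: "1"  # cap the larger production-sized slices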
Performance monitoring and tuning
With MIG, these performance monitoring and tuning tasks are possible:
- GPU telemetry
- Automated scaling
GPU telemetry
You can use the NVIDIA DCGM Exporter to collect detailed GPU metrics, including:
- GPU utilization
- Memory usage
- Power consumption
- Temperature
Use this data to identify bottlenecks and optimize MIG configurations for better performance. By analyzing these metrics over time, you can make informed decisions about resource allocation and identify opportunities for further optimization.
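To spot-check these metrics directly, you can port-forward the DCGM Exporter and filter its endpoint for the standard DCGM field names; the pod name and namespace are placeholders, and 9400 is the exporter's default port:
# Forward the exporter's metrics port, then sample the signals above.
oc port-forward -n nvidia-gpu-operator <dcgm-exporter-pod> 9400:9400 &
curl -s localhost:9400/metrics | \
  grep -E 'DCGM_FI_DEV_(GPU_UTIL|FB_USED|POWER_USAGE|GPU_TEMP)'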
Automated scaling
Implement auto-scaling policies based on GPU utilization metrics to ensure efficient resource allocation:
- Scale up MIG instances when utilization consistently exceeds 80%
- Scale down or reconfigure when utilization drops below 20%
This approach ensures that GPU resources are always optimally allocated, improving overall cluster efficiency and reducing costs.
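As a sketch of the scale-up half of such a policy, a HorizontalPodAutoscaler can grow a GPU-backed deployment when average utilization stays above 80%. Note that this scales the pods consuming MIG slices rather than the MIG geometry itself, and it assumes the DCGM utilization metric has been exposed to the autoscaler through a custom metrics adapter; the deployment name is hypothetical:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-gpu-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-server              # hypothetical GPU-backed deployment
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Pods
      pods:
        metric:
          name: DCGM_FI_DEV_GPU_UTIL    # assumes a custom metrics adapter exposes this
        target:
          type: AverageValue
          averageValue: "80"            # scale up above 80% average utilization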
Advanced MIG configurations
Finally, consider these advanced MIG configurations:
- Heterogeneous MIG setups
- Integration with OpenShift virtualization
Heterogeneous MIG setups
For maximum flexibility, consider using the 'mixed' MIG strategy, which allows for diverse MIG configurations across different GPUs:
spec:
  mig:
    strategy: mixed
    config:
      - gpuIds: ["0", "1"]
        mig1g.5gb: 2
        mig2g.10gb: 1
      - gpuIds: ["2", "3"]
        mig3g.20gb: 2
This configuration supports a wide range of workload types on a single node, providing the flexibility to run both small-scale and large-scale GPU workloads simultaneously.
Integration with OpenShift virtualization
For organizations leveraging virtualized environments, MIG can be used to provide GPU acceleration to virtual machines:
- Enable GPU passthrough in the OpenShift Virtualization operator
- Create MIG profiles suitable for VM workloads
- Assign MIG-based vGPUs to VMs for accelerated computing in virtualized environments
This integration allows for GPU acceleration in both containerized and virtualized workloads, providing a unified platform for all types of GPU-accelerated applications.
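A sketch of the final step, assigning a MIG-backed vGPU to a VM through the KubeVirt VirtualMachine API; the deviceName must match a mediated device resource actually advertised on your nodes, so treat the value below as a placeholder:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: gpu-accelerated-vm
spec:
  running: false
  template:
    spec:
      domain:
        memory:
          guest: 8Gi
        devices:
          # Boot disk and networking omitted for brevity.
          gpus:
            - name: vgpu1
              deviceName: nvidia.com/GRID_A100-2-10C   # placeholder vGPU profile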
Conclusion
The integration of NVIDIA MIG technology with Red Hat OpenShift provides a powerful solution for optimizing GPU resources in containerized environments. By enabling fine-grained GPU partitioning, organizations can significantly improve resource utilization, enhance workload isolation, and maximize the value of their GPU investments.
The strategies outlined in this guide, from fine-grained resource allocation and dynamic reconfiguration to workload-specific optimizations and advanced monitoring, provide a comprehensive approach to GPU optimization in OpenShift environments. By implementing these techniques, organizations can create a highly efficient, flexible, and cost-effective platform for GPU-accelerated computing.
As AI and machine learning workloads continue to grow in importance and complexity, the combination of MIG and OpenShift offers a scalable, efficient solution that can adapt to evolving computational demands. By mastering these optimization techniques, organizations can stay at the forefront of GPU-accelerated computing, driving innovation and maintaining a competitive edge in the rapidly evolving landscape of AI and ML technologies.
Acknowledgements
This article was produced as part of an IBM Open Innovation Community Initiative.