Pod Placement Techniques to Avoid Job Failures Due to Low GPU Memory in a Kubernetes Environment with Shared GPUs

Jihun Kang (1), Hwamin Lee (2), Daewon Lee (3)
(1) Department of Computer Science, Korea National Open University, 86, Daehak-ro, Jongno-gu, Seoul, Republic of Korea
(2) Department of Biomedical Informatics, Korea University College of Medicine, 46, Gaeunsa 2-gil, Seongbuk-gu, Seoul, Republic of Korea
(3) Department of Electronics Computer Engineering, Seokyeong University, 124, Seogyeong-ro, Seongbuk-gu, Seoul, Republic of Korea
How to cite (IJASEIT):
Kang, Jihun, et al. “Pod Placement Techniques to Avoid Job Failures Due to Low GPU Memory in a Kubernetes Environment With Shared GPUs”. International Journal on Advanced Science, Engineering and Information Technology, vol. 14, no. 5, Oct. 2024, pp. 1626-32, doi:10.18517/ijaseit.14.5.11589.
In a container-based cloud environment, GPUs provide high-performance computation to multiple users, and GPU sharing allows more GPU containers to be accommodated than there are physical GPUs, which increases resource utilization and minimizes idle time. However, the extended resources used to share GPUs in Kubernetes neither partition GPU resources nor limit their usage; they only logically increase the number of GPUs the scheduler can recognize. Usage limits and equal use of GPU resources therefore cannot be guaranteed among pods sharing a GPU. In addition, GPU memory generally cannot be overcommitted, so when a pod with high GPU memory usage runs a GPU task, nothing restricts its GPU memory consumption or frees up memory for other pods. Because a GPU task must first load its data into GPU memory, a pod that cannot do so due to insufficient GPU memory cannot start the task and fails to execute. This paper proposes a pod placement technique that avoids GPU memory shortages when pods share GPUs in a Kubernetes environment. The proposed technique monitors the GPU memory usage and usage frequency of each worker node in the cluster and, based on this monitoring information, places the pod on the worker node with the most available GPU memory.
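
To make the placement idea concrete, the sketch below is an illustration based only on the abstract, not the authors' implementation: a node-side monitor reports free GPU memory through NVML (via the pynvml bindings), and a placement step selects the worker node with the most available GPU memory. The NodeGpuReport structure, the usage-frequency tie-break, and the worker names are assumptions of this sketch.

# Illustrative sketch only: a per-node GPU memory monitor and a placement rule
# that picks the worker node with the most available GPU memory.
from dataclasses import dataclass
from typing import List

import pynvml  # NVML bindings from the nvidia-ml-py package (assumed dependency)


@dataclass
class NodeGpuReport:                 # hypothetical report sent by each worker node
    node_name: str
    free_memory_bytes: int           # free GPU memory summed over the node's GPUs
    usage_frequency: float           # fraction of recent samples with the GPU busy


def sample_local_free_gpu_memory() -> int:
    """Runs on a worker node: total free GPU memory in bytes, read via NVML."""
    pynvml.nvmlInit()
    try:
        return sum(
            pynvml.nvmlDeviceGetMemoryInfo(
                pynvml.nvmlDeviceGetHandleByIndex(i)
            ).free
            for i in range(pynvml.nvmlDeviceGetCount())
        )
    finally:
        pynvml.nvmlShutdown()


def pick_node(reports: List[NodeGpuReport], requested_bytes: int) -> str:
    """Placement rule from the abstract: the node with the most available GPU
    memory wins; breaking ties by lower usage frequency is an assumption here."""
    candidates = [r for r in reports if r.free_memory_bytes >= requested_bytes]
    if not candidates:
        raise RuntimeError("no worker node has enough free GPU memory; defer the pod")
    candidates.sort(key=lambda r: (-r.free_memory_bytes, r.usage_frequency))
    return candidates[0].node_name


if __name__ == "__main__":
    # Hypothetical monitoring snapshot gathered from three worker nodes.
    snapshot = [
        NodeGpuReport("worker-1", free_memory_bytes=2 * 1024**3, usage_frequency=0.8),
        NodeGpuReport("worker-2", free_memory_bytes=10 * 1024**3, usage_frequency=0.3),
        NodeGpuReport("worker-3", free_memory_bytes=10 * 1024**3, usage_frequency=0.1),
    ]
    print(pick_node(snapshot, requested_bytes=4 * 1024**3))   # -> worker-3

In a real cluster, the monitoring reports would be collected from every worker node and the selected node name handed to the Kubernetes scheduler; those integration details are outside what the abstract specifies.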

This work is licensed under a Creative Commons Attribution 4.0 International License.
