LLM training involves a large number of GPUs. How do we integrate, manage, and allocate GPU resources on OpenShift? If anyone has experience with this, please share it here so we can discuss.
Hello @Tengfei!
Could you please elaborate a little on this issue? What is this LLM training on OpenShift?
It is AI Large Language Model (LLM) training, which is currently one of the most active research areas in AI. It requires large amounts of GPU memory and GPU compute. So how can we manage these resources to meet the compute requirements of LLM training, and improve GPU efficiency both within and across AI compute servers?
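As a concrete starting point, here is a minimal sketch of how a single training pod can request dedicated GPUs on OpenShift. This assumes the NVIDIA GPU Operator (which deploys the device plugin) is installed on the cluster; the pod name, container image, and GPU count below are hypothetical placeholders, not a definitive setup.

```yaml
# Minimal sketch: a pod requesting 4 GPUs on one node.
# Assumes the NVIDIA GPU Operator is installed; image and names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: llm-train-worker
spec:
  restartPolicy: Never
  containers:
  - name: trainer
    image: quay.io/example/llm-trainer:latest  # hypothetical training image
    command: ["python", "train.py"]
    resources:
      limits:
        nvidia.com/gpu: 4  # the scheduler places the pod on a node with 4 free GPUs
```

For multi-node training, operators such as the Kubeflow Training Operator or the MPI Operator can schedule and coordinate groups of such pods, and GPU-sharing features like MIG partitioning or time-slicing can be configured through the GPU Operator to improve utilization.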