TonyD1 (Cadet)

vLLM deployment on CPU with KServe

I want to deploy an LLM (from Hugging Face) that is already in an S3 bucket. It is a small model, about 1B parameters. Has anyone managed to deploy an LLM on CPU in RHOAI using KServe? There is a vLLM CPU runtime, but I keep getting different errors. I don't have a GPU in my environment.
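For context, this is roughly the shape of the InferenceService I am trying to create, written with the KServe Python SDK. It is only a minimal sketch: the runtime name "vllm-cpu-runtime", the service account, the namespace, the S3 path, and the resource sizes are placeholders for my actual setup, not values from a working deployment.

```python
# Sketch of a CPU-only vLLM InferenceService via the KServe Python SDK.
# All names below (namespace, runtime, service account, S3 path) are placeholders.
from kubernetes import client
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1ModelSpec,
    V1beta1ModelFormat,
)

isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata=client.V1ObjectMeta(
        name="my-1b-llm",
        namespace="my-data-science-project",  # placeholder RHOAI project
    ),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            # Service account bound to the S3 data connection (placeholder name)
            service_account_name="my-s3-service-account",
            model=V1beta1ModelSpec(
                model_format=V1beta1ModelFormat(name="vLLM"),
                runtime="vllm-cpu-runtime",                # placeholder ServingRuntime name
                storage_uri="s3://my-bucket/my-1b-model",  # model already stored in S3
                # CPU-only sizing; no GPU requested
                resources=client.V1ResourceRequirements(
                    requests={"cpu": "4", "memory": "16Gi"},
                    limits={"cpu": "8", "memory": "24Gi"},
                ),
            ),
        )
    ),
)

KServeClient().create(isvc)
```

If someone has a working CPU-only setup like this (or the equivalent YAML), I would appreciate knowing which runtime image and resource settings worked for them.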
3 Replies
TudorRaduta (Community Manager)

Chetan_Tiwary_ (Community Manager)