Recommended model parameters


CosmicAC recommends the serving parameters, environment variables, and hardware below for each supported Managed Inference model. Each parameter is defined in the Job configuration reference. To apply them, see Create a Managed Inference Job.

The runtime image is part of the model master but does not yet affect serving.

Qwen3-VL-235B-A22B-Thinking-FP8

Model ID: Qwen/Qwen3-VL-235B-A22B-Thinking-FP8

Serving parameters

Parameter	Value
Runtime image	vLLM 0.11.2 + CUDA 12.9
Data type	Auto
Quantisation	None
Tensor parallel	8
GPU memory utilization	0.9
Max model length	27000
Max concurrent sequences	256
Reasoning parser	No parser
Video & image input	Yes
Root disk size	500 GB

Environment variables

TRUST_REMOTE_CODE=true
ENABLE_EXPERT_PARALLEL=true
ENFORCE_EAGER=true
SWAP_SPACE=0
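
For readers who know vLLM, the settings above resemble options that the vLLM server accepts on its command line. The sketch below assembles a comparable `vllm serve` invocation from the table values. It assumes, without confirmation from CosmicAC, that each serving parameter and environment variable corresponds to the vLLM flag of the similar name. The `build_vllm_command` helper and the dictionary keys are hypothetical and are not part of CosmicAC.

```python
import shlex

# Values copied from the tables above. The key names are illustrative only
# and are assumed to mirror vLLM's command-line flags.
QWEN3_VL = {
    "model": "Qwen/Qwen3-VL-235B-A22B-Thinking-FP8",
    "dtype": "auto",
    "tensor_parallel_size": 8,
    "gpu_memory_utilization": 0.9,
    "max_model_len": 27000,
    "max_num_seqs": 256,
    "swap_space": 0,
    # Boolean switches, assumed to correspond to the environment variables.
    "trust_remote_code": True,
    "enable_expert_parallel": True,
    "enforce_eager": True,
}


def build_vllm_command(cfg):
    """Return a `vllm serve` argument list for the given settings."""
    args = ["vllm", "serve", cfg["model"]]
    for key, value in cfg.items():
        if key == "model":
            continue
        flag = "--" + key.replace("_", "-")
        if isinstance(value, bool):
            # Boolean options are bare switches; emit them only when enabled.
            if value:
                args.append(flag)
        else:
            args += [flag, str(value)]
    return args


print(shlex.join(build_vllm_command(QWEN3_VL)))
```

This is only a way to read the recommendation in familiar terms. On CosmicAC, the values are set through the Job configuration rather than a hand-written command.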

Hardware

Resource	Value
GPUs	8 × H100 80 GB
CPU cores per GPU	16
RAM per GPU	150 GB
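
The CPU and RAM figures are given per GPU, so the node totals follow by multiplication. The arithmetic sketch below works them out for this model. It assumes that GPU memory utilization has its usual vLLM meaning, the fraction of each GPU's memory the server may claim. The `node_totals` function is a hypothetical helper.

```python
def node_totals(gpus, gpu_mem_gb, cores_per_gpu, ram_per_gpu_gb, gpu_mem_util):
    """Scale the per-GPU recommendations up to whole-node totals."""
    return {
        "cpu_cores": gpus * cores_per_gpu,
        "ram_gb": gpus * ram_per_gpu_gb,
        "gpu_mem_gb": gpus * gpu_mem_gb,
        # Memory the server may claim across all GPUs (assumed meaning).
        "serving_budget_gb": round(gpus * gpu_mem_gb * gpu_mem_util, 1),
    }


# 8 GPUs of 80 GB, 16 cores and 150 GB RAM per GPU, utilization 0.9.
totals = node_totals(8, 80, 16, 150, 0.9)
for name, value in totals.items():
    print(f"{name}: {value}")
```

Under these assumptions the job needs 128 CPU cores and 1200 GB of RAM in total, with about 576 GB of the 640 GB of GPU memory available to the server.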

MiniMax M2.5

Model ID: MiniMaxAI/MiniMax-M2.5

Serving parameters

Parameter	Value
Runtime image	vLLM 0.11.2 + CUDA 12.9
Data type	Auto
Quantisation	None
Tensor parallel	4
GPU memory utilization	0.85
Max model length	27000
Max concurrent sequences	256
Reasoning parser	No parser
Video & image input	Yes
Root disk size	500 GB

Environment variables

TRUST_REMOTE_CODE=true
ENABLE_EXPERT_PARALLEL=true
ENFORCE_EAGER=true
SWAP_SPACE=0

Hardware

Resource	Value
GPUs	4 × H100 80 GB
CPU cores per GPU	16
RAM per GPU	150 GB
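
The two recommendations on this page share most values. The sketch below compares them and reports only the settings that differ. The dictionaries are transcribed from the tables on this page, and `diff_settings` is a hypothetical helper.

```python
# Settings transcribed from this page; shared keys use the table labels.
COMMON = {
    "Runtime image": "vLLM 0.11.2 + CUDA 12.9",
    "Data type": "Auto",
    "Quantisation": "None",
    "Max model length": 27000,
    "Max concurrent sequences": 256,
    "Reasoning parser": "No parser",
    "Video & image input": "Yes",
    "Root disk size": "500 GB",
    "GPU type": "H100 80 GB",
    "CPU cores per GPU": 16,
    "RAM per GPU": "150 GB",
}
QWEN3_VL = {**COMMON, "Tensor parallel": 8, "GPU memory utilization": 0.9, "GPUs": 8}
MINIMAX_M25 = {**COMMON, "Tensor parallel": 4, "GPU memory utilization": 0.85, "GPUs": 4}


def diff_settings(a, b):
    """Return {setting: (value_in_a, value_in_b)} for settings that differ."""
    return {key: (a[key], b[key]) for key in a if a[key] != b.get(key)}


for setting, (qwen, minimax) in diff_settings(QWEN3_VL, MINIMAX_M25).items():
    print(f"{setting}: {qwen} vs {minimax}")
```

Only three values differ: tensor parallel (8 vs 4), GPU memory utilization (0.9 vs 0.85), and the GPU count (8 vs 4). In both recommendations, the tensor parallel value equals the number of GPUs.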
