
Recommended model parameters


CosmicAC recommends these serving parameters, environment variables, and hardware for the following models. The Job configuration reference defines each parameter. For how to apply them, see Create a Managed Inference Job.

The runtime image is part of the model master but does not yet affect serving.

Qwen3-VL-235B-A22B-Thinking-FP8

Model ID: Qwen/Qwen3-VL-235B-A22B-Thinking-FP8

Serving parameters

| Parameter | Value |
| --- | --- |
| Runtime image | vLLM 0.11.2 + CUDA 12.9 |
| Data type | Auto |
| Quantisation | None |
| Tensor parallel | 8 |
| GPU memory utilization | 0.9 |
| Max model length | 27000 |
| Max concurrent sequences | 256 |
| Reasoning parser | No parser |
| Video & image input | Yes |
| Root disk size | 500 GB |

Environment variables

TRUST_REMOTE_CODE=true
ENABLE_EXPERT_PARALLEL=true
ENFORCE_EAGER=true
SWAP_SPACE=0
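
The settings above resemble vLLM engine arguments, so it can help to see them as a single launch command. The sketch below is illustrative only: this page does not document how CosmicAC translates Job settings into a vLLM invocation, so the `build_vllm_command` helper, the dictionary keys, and the variable-to-flag mapping are all assumptions. The flag names themselves are vLLM's own.

```python
import shlex

# Assumed mapping from the environment variables on this page to vLLM flags.
# CosmicAC's real translation is not documented here.
BOOLEAN_FLAGS = {
    "TRUST_REMOTE_CODE": "--trust-remote-code",
    "ENABLE_EXPERT_PARALLEL": "--enable-expert-parallel",
    "ENFORCE_EAGER": "--enforce-eager",
}


def build_vllm_command(model_id, params, env):
    """Hypothetical helper: build a `vllm serve` argument list."""
    command = [
        "vllm", "serve", model_id,
        "--dtype", params["data_type"],
        "--tensor-parallel-size", str(params["tensor_parallel"]),
        "--gpu-memory-utilization", str(params["gpu_memory_utilization"]),
        "--max-model-len", str(params["max_model_length"]),
        "--max-num-seqs", str(params["max_concurrent_sequences"]),
        "--swap-space", env["SWAP_SPACE"],
    ]
    # Append a boolean flag only when its variable is set to "true".
    for name, flag in BOOLEAN_FLAGS.items():
        if env.get(name, "false").lower() == "true":
            command.append(flag)
    return command


# Values copied from the Qwen3-VL tables above. Quantisation is "None",
# so no quantization flag is passed.
qwen_params = {
    "data_type": "auto",
    "tensor_parallel": 8,
    "gpu_memory_utilization": 0.9,
    "max_model_length": 27000,
    "max_concurrent_sequences": 256,
}
qwen_env = {
    "TRUST_REMOTE_CODE": "true",
    "ENABLE_EXPERT_PARALLEL": "true",
    "ENFORCE_EAGER": "true",
    "SWAP_SPACE": "0",
}

qwen_command = build_vllm_command(
    "Qwen/Qwen3-VL-235B-A22B-Thinking-FP8", qwen_params, qwen_env
)
print(shlex.join(qwen_command))
```

Treat the printed command as a reading aid for the parameters, not as the command the platform runs.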

Hardware

| Resource | Value |
| --- | --- |
| GPUs | 8 H100 80 GB |
| CPU cores per GPU | 16 |
| RAM per GPU | 150 GB |
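
Because CPU cores and RAM are listed per GPU, the total footprint of the Job has to be multiplied out. A minimal sketch of that arithmetic follows; the `job_totals` function name and its dictionary keys are hypothetical and used only for illustration.

```python
def job_totals(gpus, gpu_memory_gb, cpu_cores_per_gpu, ram_gb_per_gpu):
    """Multiply the per-GPU figures by the GPU count (illustrative helper)."""
    return {
        "gpu_memory_gb": gpus * gpu_memory_gb,
        "cpu_cores": gpus * cpu_cores_per_gpu,
        "ram_gb": gpus * ram_gb_per_gpu,
    }


# Values from the Hardware table above: 8 H100 80 GB, 16 cores and 150 GB RAM per GPU.
qwen_totals = job_totals(
    gpus=8, gpu_memory_gb=80, cpu_cores_per_gpu=16, ram_gb_per_gpu=150
)
print(qwen_totals)
# {'gpu_memory_gb': 640, 'cpu_cores': 128, 'ram_gb': 1200}
```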

MiniMax M2.5

Model ID: MiniMaxAI/MiniMax-M2.5

Serving parameters

| Parameter | Value |
| --- | --- |
| Runtime image | vLLM 0.11.2 + CUDA 12.9 |
| Data type | Auto |
| Quantisation | None |
| Tensor parallel | 4 |
| GPU memory utilization | 0.85 |
| Max model length | 27000 |
| Max concurrent sequences | 256 |
| Reasoning parser | No parser |
| Video & image input | Yes |
| Root disk size | 500 GB |

Environment variables

TRUST_REMOTE_CODE=true
ENABLE_EXPERT_PARALLEL=true
ENFORCE_EAGER=true
SWAP_SPACE=0
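
Most of the MiniMax M2.5 settings match the Qwen3-VL settings earlier on this page. The sketch below compares the two sets of recommended values and reports only the entries that differ. The dictionary keys and the `differing_settings` helper are hypothetical names chosen for this example; the values are copied from this page.

```python
# Serving parameters and environment variables shared by both models on this page.
COMMON = {
    "runtime_image": "vLLM 0.11.2 + CUDA 12.9",
    "data_type": "Auto",
    "quantisation": "None",
    "max_model_length": 27000,
    "max_concurrent_sequences": 256,
    "reasoning_parser": "No parser",
    "video_image_input": True,
    "root_disk_size_gb": 500,
    "TRUST_REMOTE_CODE": "true",
    "ENABLE_EXPERT_PARALLEL": "true",
    "ENFORCE_EAGER": "true",
    "SWAP_SPACE": "0",
}

QWEN3_VL = {**COMMON, "tensor_parallel": 8, "gpu_memory_utilization": 0.9}
MINIMAX_M2_5 = {**COMMON, "tensor_parallel": 4, "gpu_memory_utilization": 0.85}


def differing_settings(first, second):
    """Return {key: (first_value, second_value)} for keys whose values differ."""
    keys = sorted(first.keys() | second.keys())
    return {
        key: (first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    }


differences = differing_settings(QWEN3_VL, MINIMAX_M2_5)
print(differences)
# {'gpu_memory_utilization': (0.9, 0.85), 'tensor_parallel': (8, 4)}
```

Only the tensor parallel degree and the GPU memory utilization differ between the two models; the environment variables are identical.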

Hardware

| Resource | Value |
| --- | --- |
| GPUs | 4 H100 80 GB |
| CPU cores per GPU | 16 |
| RAM per GPU | 150 GB |
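
The GPU memory utilization value and the GPU count together bound how much GPU memory the server may claim. The sketch below works through that arithmetic, assuming the setting behaves like vLLM's `gpu_memory_utilization` (the fraction of each GPU's memory the engine may use) and taking the nominal 80 GB per H100. The `gpu_memory_budget_gb` helper is a hypothetical name.

```python
def gpu_memory_budget_gb(gpus, gpu_memory_gb, gpu_memory_utilization):
    """Return (per-GPU budget, total budget) in GB, rounded to one decimal."""
    per_gpu = gpu_memory_gb * gpu_memory_utilization
    return round(per_gpu, 1), round(per_gpu * gpus, 1)


# MiniMax M2.5: 4 GPUs at 0.85 utilization.
minimax_budget = gpu_memory_budget_gb(
    gpus=4, gpu_memory_gb=80, gpu_memory_utilization=0.85
)
# Qwen3-VL, for comparison: 8 GPUs at 0.9 utilization.
qwen_budget = gpu_memory_budget_gb(
    gpus=8, gpu_memory_gb=80, gpu_memory_utilization=0.9
)

print(minimax_budget)  # (68.0, 272.0)
print(qwen_budget)     # (72.0, 576.0)
```

The budget covers model weights, activations, and KV cache together, so it is an upper bound on what the engine may allocate and not a measure of memory left over for requests.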
