Recommended model parameters
Recommended serving parameters, environment variables, and hardware for supported Managed Inference models.
CosmicAC recommends the following serving parameters, environment variables, and hardware for each supported model. Each parameter is defined in the Job configuration reference. To apply these settings, see Create a Managed Inference Job.
The runtime image is part of the model master but does not yet affect serving.
Qwen3-VL-235B-A22B-Thinking-FP8
Model ID: `Qwen/Qwen3-VL-235B-A22B-Thinking-FP8`
Serving parameters
| Parameter | Value |
|---|---|
| Runtime image | vLLM 0.11.2 + CUDA 12.9 |
| Data type | Auto |
| Quantisation | None |
| Tensor parallel | 8 |
| GPU memory utilization | 0.9 |
| Max model length | 27000 tokens |
| Max concurrent sequences | 256 |
| Reasoning parser | No parser |
| Video & image input | Yes |
| Root disk size | 500 GB |
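The runtime image is vLLM, so most rows in this table correspond to a standard `vllm serve` option. The following is a minimal sketch, assuming CosmicAC maps each table row to the vLLM flag with the same meaning. The helper `build_vllm_args` and the dictionary keys are hypothetical and illustrative, not part of CosmicAC. The vLLM flags themselves (`--dtype`, `--tensor-parallel-size`, `--gpu-memory-utilization`, `--max-model-len`, `--max-num-seqs`, `--quantization`, `--reasoning-parser`) are real vLLM engine arguments. "Video & image input" and "Root disk size" are left unmapped because they have no single equivalent flag.

```python
import shlex


def build_vllm_args(model_id: str, params: dict) -> list[str]:
    """Hypothetical translation of the table rows into `vllm serve` arguments.

    Assumption: CosmicAC may perform this translation differently internally.
    """
    args = ["vllm", "serve", model_id]
    args += ["--dtype", params["dtype"]]
    # "Quantisation: None" in the table means the flag is simply omitted.
    if params.get("quantization"):
        args += ["--quantization", params["quantization"]]
    # Shard each layer's weights across this many GPUs.
    args += ["--tensor-parallel-size", str(params["tensor_parallel"])]
    # Fraction of each GPU's memory that vLLM may claim.
    args += ["--gpu-memory-utilization", str(params["gpu_memory_utilization"])]
    # Context window (prompt + output) in tokens.
    args += ["--max-model-len", str(params["max_model_len"])]
    # Upper bound on sequences batched together at once.
    args += ["--max-num-seqs", str(params["max_num_seqs"])]
    # "No parser" in the table means the flag is omitted.
    if params.get("reasoning_parser"):
        args += ["--reasoning-parser", params["reasoning_parser"]]
    return args


qwen_params = {
    "dtype": "auto",
    "quantization": None,
    "tensor_parallel": 8,
    "gpu_memory_utilization": 0.9,
    "max_model_len": 27000,
    "max_num_seqs": 256,
    "reasoning_parser": None,
}

cmd = build_vllm_args("Qwen/Qwen3-VL-235B-A22B-Thinking-FP8", qwen_params)
print(shlex.join(cmd))
```

Omitting a flag rather than passing an explicit "none" value leaves vLLM's own default in place, which is how the "None" and "No parser" entries read.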
Environment variables
```
TRUST_REMOTE_CODE=true
ENABLE_EXPERT_PARALLEL=true
ENFORCE_EAGER=true
SWAP_SPACE=0
```
Hardware
| Resource | Value |
|---|---|
| GPUs | 8 × NVIDIA H100 (80 GB) |
| CPU cores per GPU | 16 |
| RAM per GPU | 150 GB |
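The CPU and RAM figures are given per GPU, so the totals for a Job scale with the GPU count. The arithmetic below is a small sketch for this model's hardware. The `HardwareSpec` class is a hypothetical helper, and the per-GPU vLLM budget assumes vLLM limits its own preallocation (weights, activations, KV cache) to roughly the "GPU memory utilization" fraction of each 80 GB card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareSpec:
    """Hypothetical container for the values in a Hardware table."""

    gpu_count: int
    gpu_memory_gb: int
    cpu_cores_per_gpu: int
    ram_per_gpu_gb: int

    def total_cpu_cores(self) -> int:
        return self.gpu_count * self.cpu_cores_per_gpu

    def total_ram_gb(self) -> int:
        return self.gpu_count * self.ram_per_gpu_gb

    def vllm_budget_per_gpu_gb(self, utilization: float) -> float:
        # Approximate memory vLLM may claim on each GPU.
        return self.gpu_memory_gb * utilization


qwen_hw = HardwareSpec(gpu_count=8, gpu_memory_gb=80, cpu_cores_per_gpu=16, ram_per_gpu_gb=150)
print(qwen_hw.total_cpu_cores())  # 128
print(qwen_hw.total_ram_gb())  # 1200
print(qwen_hw.vllm_budget_per_gpu_gb(0.9))
```

With a utilization of 0.9, about 72 GB of each GPU is available to vLLM, and the remaining ~8 GB is left as headroom.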
MiniMax M2.5
Model ID: `MiniMaxAI/MiniMax-M2.5`
Serving parameters
| Parameter | Value |
|---|---|
| Runtime image | vLLM 0.11.2 + CUDA 12.9 |
| Data type | Auto |
| Quantisation | None |
| Tensor parallel | 4 |
| GPU memory utilization | 0.85 |
| Max model length | 27000 tokens |
| Max concurrent sequences | 256 |
| Reasoning parser | No parser |
| Video & image input | Yes |
| Root disk size | 500 GB |
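In both recommendations, the tensor parallel size matches the number of GPUs in the Hardware table (4 for this model). If you change either value, a quick consistency check can catch mismatches before you create the Job. The sketch below makes several assumptions: the Job runs a single replica with no pipeline or data parallelism, so tensor parallel should equal the GPU count. The function `validate_serving_config` and its dictionary keys are hypothetical.

```python
def validate_serving_config(params: dict, gpu_count: int) -> list[str]:
    """Return a list of problems; an empty list means every check passed.

    Assumption: one replica per Job, tensor parallelism only.
    """
    problems = []
    tp = params["tensor_parallel"]
    if tp != gpu_count:
        problems.append(f"tensor parallel {tp} does not match {gpu_count} GPUs")
    util = params["gpu_memory_utilization"]
    if not 0 < util <= 1:
        problems.append(f"GPU memory utilization {util} must be in (0, 1]")
    if params["max_num_seqs"] < 1:
        problems.append("max concurrent sequences must be at least 1")
    return problems


minimax_params = {"tensor_parallel": 4, "gpu_memory_utilization": 0.85, "max_num_seqs": 256}
print(validate_serving_config(minimax_params, gpu_count=4))  # []
print(validate_serving_config(minimax_params, gpu_count=8))
```

The second call reports a single mismatch, because four-way tensor parallelism on eight GPUs would leave half of them idle under the single-replica assumption.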
Environment variables
```
TRUST_REMOTE_CODE=true
ENABLE_EXPERT_PARALLEL=true
ENFORCE_EAGER=true
SWAP_SPACE=0
```
Hardware
| Resource | Value |
|---|---|
| GPUs | 4 × NVIDIA H100 (80 GB) |
| CPU cores per GPU | 16 |
| RAM per GPU | 150 GB |
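Both models use the same four `KEY=VALUE` environment variables. Their names suggest vLLM options with the same meaning: `--trust-remote-code`, `--enable-expert-parallel`, `--enforce-eager`, and `--swap-space` (CPU swap space in GiB per GPU). The following is a minimal sketch, assuming that correspondence. It parses the block and produces the equivalent flags. `ENV_TO_FLAG`, `parse_env_block`, and `env_to_vllm_flags` are hypothetical names. The mapping reflects an assumption, not documented CosmicAC behavior.

```python
# Assumed correspondence between CosmicAC environment variables and vLLM flags.
ENV_TO_FLAG = {
    "TRUST_REMOTE_CODE": "--trust-remote-code",
    "ENABLE_EXPERT_PARALLEL": "--enable-expert-parallel",
    "ENFORCE_EAGER": "--enforce-eager",
    "SWAP_SPACE": "--swap-space",
}
# Variables that act as on/off switches rather than carrying a value.
BOOLEAN_VARS = {"TRUST_REMOTE_CODE", "ENABLE_EXPERT_PARALLEL", "ENFORCE_EAGER"}


def parse_env_block(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {line!r}")
        env[key.strip()] = value.strip()
    return env


def env_to_vllm_flags(env: dict[str, str]) -> list[str]:
    flags = []
    for key, value in env.items():
        flag = ENV_TO_FLAG.get(key)
        if flag is None:
            continue  # Variables not in the assumed mapping are ignored.
        if key in BOOLEAN_VARS:
            if value.lower() == "true":
                flags.append(flag)  # Boolean switches take no argument.
        else:
            flags += [flag, value]
    return flags


ENV_BLOCK = """
TRUST_REMOTE_CODE=true
ENABLE_EXPERT_PARALLEL=true
ENFORCE_EAGER=true
SWAP_SPACE=0
"""
print(env_to_vllm_flags(parse_env_block(ENV_BLOCK)))
```

Under this mapping, the block enables remote model code, expert parallelism for the mixture-of-experts layers, and eager execution (no CUDA graph capture), and it sets CPU swap space to zero.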