Connect to a Managed Inference endpoint (vLLM)
Send a chat-completion request to a Managed Inference endpoint from the CLI or an OpenAI-compatible client.
Send a chat-completion request to a Managed Inference endpoint from the CLI or any OpenAI-compatible client.
Prerequisites
You need the following before you start:
- A running Managed Inference Job with a serving endpoint. See Create a Managed Inference Job.
- Your Managed Inference API key, if the endpoint requires an authorization header. See Create an API key.
- The CosmicAC CLI installed and configured, for the CLI method. See Install the CLI.
Steps
Find your endpoint name
List your endpoints:
cosmicac models healthcheckEach endpoint appears as Endpoint: <endpoint-name>. Copy the one you want to call.
Send a request
Use the CLI or any OpenAI-compatible client. Replace <endpoint-name> with the endpoint name from the previous step, and <api-key> with the key you created.
You need the API key only if you enabled Require Authorization header when you created the job. Otherwise, omit --api-key and the Authorization header.
cosmicac inference chat \
--endpoint-id <endpoint-name> \
--api-key <api-key> \
--message "Hello"The CLI reads the inference URL from your config. Omit --message for an interactive session, or add --stream for streaming output.