Connect to a Managed Inference endpoint (vLLM)

Send a chat-completion request to a Managed Inference endpoint from the CLI or an OpenAI-compatible client.

Send a chat-completion request to a Managed Inference endpoint from the CLI or any OpenAI-compatible client.

Prerequisites

You need the following before you start:

A running Managed Inference Job with a serving endpoint. See Create a Managed Inference Job.
Your Managed Inference API key, if the endpoint requires an authorization header. See Create an API key.
The CosmicAC CLI installed and configured, for the CLI method. See Install the CLI.

Steps

Find your endpoint name

List your endpoints:

cosmicac models healthcheck

Each endpoint appears as Endpoint: <endpoint-name>. Copy the one you want to call.

Send a request

Use the CLI or any OpenAI-compatible client. Replace <endpoint-name> with the endpoint name from the previous step, and <api-key> with the key you created.

You need the API key only if you enabled Require Authorization header when you created the job. Otherwise, omit --api-key and the Authorization header.

cosmicac inference chat \
  --endpoint-id <endpoint-name> \
  --api-key <api-key> \
  --message "Hello"

The CLI reads the inference URL from your config. Omit --message for an interactive session, or add --stream for streaming output.

Connect to a Managed Inference endpoint (vLLM)

Prerequisites

Steps

Find your endpoint name

Send a request

Next steps

On this page