Vertex AI
Vertex AI hosts Gemini, Anthropic Claude, Meta Llama, and Mistral models in your own GCP project. Use it when you want Google-managed inference billed against your GCP account, regional control, or models that aren't on AI Studio.
| Type | Models |
|---|---|
| vertex-google | Gemini models |
| vertex-anthropic | Claude models |
| vertex-meta | Llama models |
| vertex-mistralai | Mistral models |
All four use the same auth and URL pattern. Pick the one that matches the model family you want.
Prerequisites
- A GCP project with billing enabled.
- The Vertex AI API enabled on that project:
  gcloud services enable aiplatform.googleapis.com --project YOUR_PROJECT_ID
- A region, e.g. us-central1 or europe-west4. Each model is only available in some regions; check the Vertex AI model availability matrix.
The backend URL always follows this shape:
https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{REGION}
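The URL shape above can be assembled from its two inputs. This is a minimal sketch; the function name `vertex_backend_url` is hypothetical and not part of Contenox or gcloud.

```shell
# Hypothetical helper: build the Vertex AI backend URL from a region and a project ID.
vertex_backend_url() {
  # $1 = region (e.g. us-central1), $2 = GCP project ID
  printf 'https://%s-aiplatform.googleapis.com/v1/projects/%s/locations/%s\n' "$1" "$2" "$1"
}

vertex_backend_url europe-west4 my-project
# prints https://europe-west4-aiplatform.googleapis.com/v1/projects/my-project/locations/europe-west4
```

The region appears twice in the URL (host name and path), so deriving both from one argument avoids a mismatch between them. The result is the value you pass to --url.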
Auth method 1 — Service account JSON (recommended for servers)
Create a service account with the roles/aiplatform.user role, download a JSON key, and load it through an env var:
export VERTEX_SA_JSON="$(cat /path/to/service-account.json)"
contenox backend add vertex --type vertex-google \
--url "https://us-central1-aiplatform.googleapis.com/v1/projects/$GOOGLE_CLOUD_PROJECT/locations/us-central1" \
--api-key-env VERTEX_SA_JSON
contenox config set default-model gemini-2.5-flash
contenox config set default-provider vertex-google
Contenox reads the JSON from the named env var at request time, so the key never lands in the config file on disk.
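Because the key is only read at request time, a malformed value in the env var shows up late. A rough pre-flight check could look like the sketch below. The function name `looks_like_sa_key` is hypothetical, and the check is plain pattern matching on field names that Google service account key files contain; it does not parse the JSON or validate the key with Google.

```shell
# Hypothetical pre-flight check: does the given string look like a service account key?
# Pattern matching only; it does not parse JSON or contact Google.
looks_like_sa_key() {
  # $1 = the JSON text, e.g. "$VERTEX_SA_JSON"
  for field in '"type"' '"service_account"' '"private_key"' '"client_email"'; do
    case "$1" in
      *"$field"*) ;;
      *) echo "service account key check: $field not found" >&2; return 1 ;;
    esac
  done
}

# Placeholder values for illustration; a real key file has more fields.
sample='{"type": "service_account", "client_email": "PLACEHOLDER", "private_key": "PLACEHOLDER"}'
looks_like_sa_key "$sample" && echo "looks like a service account key"
# prints looks like a service account key
```

In practice you would call it as `looks_like_sa_key "$VERTEX_SA_JSON"` before registering the backend.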
Auth method 2 — Application Default Credentials (CLI / dev only)
ADC reuses your gcloud login. It's the fastest way to try Vertex, but it stops working when the refresh token is revoked or ages out (see renewal below).
gcloud config set project YOUR_PROJECT_ID
gcloud services enable aiplatform.googleapis.com
gcloud auth application-default login
gcloud auth application-default set-quota-project YOUR_PROJECT_ID
contenox backend add vertex --type vertex-google \
--url "https://us-central1-aiplatform.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/us-central1"
contenox config set default-model gemini-2.5-flash
contenox config set default-provider vertex-google
Omit --api-key-env and Contenox falls back to ADC.
Important
set-quota-project is required. Without it, every Vertex AI call returns 403 SERVICE_DISABLED — even if you already ran gcloud config set project.
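To see whether a quota project has been recorded before you hit the 403, you can inspect the ADC file. This is a sketch under two assumptions: the ADC file lives at the default Linux/macOS path, and gcloud stores the quota project under a `quota_project_id` field. The function name `adc_has_quota_project` is hypothetical.

```shell
# Hypothetical check: has a quota project been recorded in the ADC file?
# Assumes gcloud writes a "quota_project_id" field into the file.
adc_has_quota_project() {
  # $1 = path to the ADC file (optional); defaults to the standard Linux/macOS location
  adc_file="${1:-$HOME/.config/gcloud/application_default_credentials.json}"
  [ -f "$adc_file" ] && grep -q '"quota_project_id"' "$adc_file"
}

if adc_has_quota_project; then
  echo "quota project is set"
else
  echo "run: gcloud auth application-default set-quota-project YOUR_PROJECT_ID"
fi
```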
Renewing credentials
Vertex tokens are short-lived (~1 hour). Contenox refreshes them automatically on every request, so day-to-day you never deal with the access token itself — what you renew is the credential the token is minted from.
Symptom: vertex AI token refresh: oauth2: "invalid_grant"
This means ADC tried to mint a fresh access token from its refresh token and Google rejected the refresh token. Causes, most common first:
- You haven't run gcloud auth application-default login in a long time and the refresh token aged out (Google rotates them).
- You changed your Google account password.
- The account's session was revoked (admin action, security policy, or you ran gcloud auth application-default revoke).
- The ADC file at ~/.config/gcloud/application_default_credentials.json was deleted or replaced by a different account.
Fix:
gcloud auth application-default login
gcloud auth application-default set-quota-project YOUR_PROJECT_ID
No contenox restart needed — the next request picks up the new credentials. If you removed the ADC file by hand, also re-run set-quota-project.
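To confirm the fix took effect without waiting for the next Contenox request, you can ask gcloud to mint a token from ADC directly. A minimal sketch; the function name `adc_token_ok` is hypothetical, and it only wraps the gcloud subcommand `auth application-default print-access-token`, discarding the token itself.

```shell
# Hypothetical check: can ADC still mint an access token?
# Succeeds (exit 0) only if gcloud can print a token; the token is discarded.
adc_token_ok() {
  gcloud auth application-default print-access-token >/dev/null 2>&1
}
```

A call such as `adc_token_ok || echo "ADC still cannot mint a token"` fails for the same reasons listed above, and also when gcloud is not installed.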
Symptom: vertex AI service account token: ...
The service account JSON in VERTEX_SA_JSON is invalid, the key was disabled in GCP, or the service account was deleted. Generate a new key:
gcloud iam service-accounts keys create new-key.json \
--iam-account=vertex-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com
export VERTEX_SA_JSON="$(cat new-key.json)"
Then restart the shell / process that holds the env var so Contenox sees the new value.
Rotating a service account key on a schedule
Service account keys don't expire on their own — rotate them yourself. Typical pattern:
# Create a new key
gcloud iam service-accounts keys create new-key.json \
--iam-account=vertex-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com
# Swap the env var (in your secret store / systemd unit / k8s secret)
export VERTEX_SA_JSON="$(cat new-key.json)"
# Once requests are flowing on the new key, list the keys to find the old key's ID, then delete it
gcloud iam service-accounts keys list \
--iam-account=vertex-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com
gcloud iam service-accounts keys delete OLD_KEY_ID \
--iam-account=vertex-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com
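When deleting the old key, it helps to know which ID belongs to the key you just created so you don't remove the wrong one. The sketch below reads the `private_key_id` field from a key file, which corresponds to the key ID shown by the keys list command. The function name `sa_key_id` is hypothetical, and the sed expression is a simple text match rather than a JSON parser.

```shell
# Hypothetical helper: print the private_key_id stored in a service account key file.
# Text matching with sed; not a full JSON parser.
sa_key_id() {
  # $1 = path to a key file such as new-key.json
  sed -n 's/.*"private_key_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}
```

Running `sa_key_id new-key.json` prints the ID to keep; any other user-managed key in the list is a candidate for OLD_KEY_ID.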
Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| oauth2: "invalid_grant" | ADC refresh token revoked / aged out | gcloud auth application-default login |
| 403 SERVICE_DISABLED | Quota project not set, or Vertex AI API not enabled | gcloud services enable aiplatform.googleapis.com and gcloud auth application-default set-quota-project ... |
| 403 PERMISSION_DENIED on a service account | Missing roles/aiplatform.user | Grant the role on the project |
| 404 on the model | Model not available in your region | Pick a region from the model availability matrix and recreate the backend with the right URL |
| unreachable: vertex-google list models: ... in contenox model list | Same as invalid_grant above; the catalog fetch refreshes tokens too | Renew per the section above; the backend stays registered |
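The table above can be turned into a small triage helper for scripts or log watchers. This is only a sketch: the function name `vertex_fix_hint` is hypothetical, the matching is loose substring matching on the error text (the `404` pattern in particular will match any message containing those digits), and the hints simply restate the fixes from the table.

```shell
# Hypothetical triage helper: map an error message to the fix from the troubleshooting table.
# Loose substring matching; the first matching pattern wins.
vertex_fix_hint() {
  # $1 = error message text
  case "$1" in
    *invalid_grant*)
      echo "gcloud auth application-default login" ;;
    *"list models"*)
      echo "renew credentials; the backend stays registered" ;;
    *SERVICE_DISABLED*)
      echo "enable aiplatform.googleapis.com and set the ADC quota project" ;;
    *PERMISSION_DENIED*)
      echo "grant roles/aiplatform.user on the project" ;;
    *404*)
      echo "pick a supported region and recreate the backend with the matching URL" ;;
    *)
      echo "no known fix; see the troubleshooting table" >&2
      return 1 ;;
  esac
}

vertex_fix_hint 'vertex AI token refresh: oauth2: "invalid_grant"'
# prints gcloud auth application-default login
```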
See also
- Google Gemini (AI Studio) — simpler API-key flow if you don't need GCP
- Configuration reference
- CLI reference: backend add