Azure AI

Introduction

Mistral AI's open and commercial models can be deployed on the Microsoft Azure AI cloud platform in two ways:

  • Pay-as-you-go managed services: Using Model-as-a-Service (MaaS) serverless API deployments billed on endpoint usage. No GPU capacity quota is required for deployment.

  • Real-time endpoints: With quota-based billing tied to the underlying GPU infrastructure you choose to deploy.

This page focuses on the MaaS offering, where the following models are available:

  • Mistral Large
  • Mistral Small
  • Mistral NeMo

For more details, visit the models page.

Getting started

The following sections outline the steps to deploy and query a Mistral model on the Azure AI MaaS platform.

Deploying the model

Follow the instructions on the Azure documentation to create a new deployment for the model of your choice. Once deployed, take note of its corresponding URL and secret key.

Querying the model

Deployed endpoints expose a REST API that you can query using Mistral's SDKs or plain HTTP calls.

To run the examples below, set the following environment variables:

  • AZUREAI_ENDPOINT: Your endpoint base URL, of the form https://your-endpoint.inference.ai.azure.com (the examples below append the /v1/chat/completions path to it).
  • AZUREAI_API_KEY: Your secret key.
You can then send a chat completion request with curl:

curl --location $AZUREAI_ENDPOINT/v1/chat/completions \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer $AZUREAI_API_KEY" \
  --data '{
    "model": "azureai",
    "messages": [
      {
        "role": "user",
        "content": "Who is the best French painter? Answer in one short sentence."
      }
    ]
  }'
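The same request can be made from Python. The sketch below uses only the standard library and reads the two environment variables above; the `build_chat_request` helper is a name introduced here for illustration, not part of any SDK.

```python
import json
import os
import urllib.request


def build_chat_request(endpoint: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completion POST request for an Azure AI MaaS endpoint."""
    payload = {
        "model": "azureai",  # MaaS endpoints serve a single model under this name
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{endpoint}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Only send the request when credentials are actually configured.
endpoint = os.environ.get("AZUREAI_ENDPOINT")
api_key = os.environ.get("AZUREAI_API_KEY")
if endpoint and api_key:
    req = build_chat_request(
        endpoint,
        api_key,
        "Who is the best French painter? Answer in one short sentence.",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
        print(body["choices"][0]["message"]["content"])
```

The response follows the OpenAI-style chat completion schema, so the assistant's reply sits under `choices[0].message.content`.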

Going further

For more details and examples, refer to the following resources: