Next generation inference playground
Choose a plan that fits your needs. Scale as you grow.

- Basic: Perfect for testing and personal projects
- Pro: Ideal for startups and growing businesses
- Mega: Enterprise-grade for unlimited scale
- Enterprise: Maximum power with premium AI models
- Custom: Tailored solutions for your scale
Zephyr API is a RESTful interface for accessing a wide range of state-of-the-art language models. Designed to deliver Pro-level reasoning at Flash speeds and low cost, it is well suited to agentic workflows, multi-turn chat, and coding assistance.
All API requests should be made to the following base URL:
```
https://zephyr-api.matiasvergara.workers.dev/
```
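For reference, the endpoint paths documented below are appended to this base URL. A quick sketch in Python (the endpoint names come from the sections that follow):

```python
from urllib.parse import urljoin

BASE_URL = "https://zephyr-api.matiasvergara.workers.dev/"

# Endpoint URLs are formed by joining the documented path onto the base URL.
generate_url = urljoin(BASE_URL, "generate")  # POST /generate
health_url = urljoin(BASE_URL, "health")      # GET /health
```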
Zephyr API offers a wide range of state-of-the-art models tailored to different performance and cost needs.
Available with regular API usage limits based on your plan.
| Model ID | Description | Plan |
|---|---|---|
| gemini-3-flash | Gemini 3.0 Flash (Flagship) | Basic+ |
| gemini-2.5-flash-lite | Gemini 2.5 Flash Lite | Basic+ |
| gpt-5-nano | GPT-5 Nano | Basic+ |
| xai-grok-4-fast | xAI Grok 4 Fast | Pro+ |
| minimax-m2.1 | MiniMax M2.1 | Pro+ |
| glm-4.7 | GLM 4.7 | Mega+ |
| gpt-oss-120b | GPT-OSS 120B | Mega+ |
| deepseek-v3.2-thinking | DeepSeek V3.2 with Thinking | Mega |
Enterprise Exclusive — These state-of-the-art models are only available with the Enterprise plan.
Daily Limit: Premium models have a limit of 100 requests/day. Usage resets at midnight UTC.
| Model ID | Description | Plan |
|---|---|---|
| gpt-5.2 | OpenAI GPT-5.2 | Enterprise |
| gemini-3-pro | Gemini 3 Pro | Enterprise |
| opus-4.5 | Claude Opus 4.5 | Enterprise |
| sonnet-4.5 | Claude Sonnet 4.5 | Enterprise |
| haiku-4.5 | Claude Haiku 4.5 | Enterprise |
| opus-4.5-reasoning | Claude Opus 4.5 with Reasoning | Enterprise |
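Since premium-model quotas reset at midnight UTC, it can be useful to compute how long remains until the next reset. A minimal sketch (the helper name is illustrative, not part of the API):

```python
from datetime import datetime, timedelta, timezone

def seconds_until_reset(now: datetime) -> float:
    """Seconds remaining until the next midnight UTC, when daily quotas reset."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()

remaining = seconds_until_reset(datetime.now(timezone.utc))
```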
Check which features are available in your plan.
Plan-dependent features include:

- Visual Input (Images)
- Function Calling
- URL Context & Web Search (Gemini models)
The Zephyr API uses API Keys to authenticate requests and track usage against your plan limits.
Include your API key in the `Authorization` header of every request:

```
Authorization: Bearer YOUR_API_KEY
```
```bash
curl -X POST https://zephyr-api.matiasvergara.workers.dev/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"text": "Hello, world!"}'
```
Your API key is linked to your subscription plan, whose usage limits include your per-minute rate limit, daily token quota, and daily request count. Usage resets at midnight UTC. Check the Console for your current usage statistics.
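In client code, a small helper can attach the required header to every call. A sketch (the `build_headers` helper is illustrative, not part of any SDK; `YOUR_API_KEY` is a placeholder):

```python
def build_headers(api_key: str) -> dict:
    """Headers every authenticated Zephyr API request should carry."""
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }

headers = build_headers("YOUR_API_KEY")
# Then pass the headers with each call, e.g. using the requests library:
# requests.post("https://zephyr-api.matiasvergara.workers.dev/generate",
#               headers=headers, json={"text": "Hello, world!"})
```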
Endpoint: POST /generate
Generates text using the selected language model.
```
POST /generate
Content-Type: application/json

{
  "text": "Your prompt here",
  "temperature": 0.7,
  "system": "You are a helpful assistant"
}
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `text` | string | Yes | - | Your question or prompt |
| `temperature` | number | No | 0.7 | Controls creativity (0.0-1.0) |
| `system` | string | No | null | System prompt |
| `image` | string | No | null | Image URL for analysis |
| `url` | string | No | null | Web page URL for context |
| `stream` | boolean | No | false | Enable streaming |
| `history` | array | No | [] | Conversation history |
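To make the defaults above concrete, here is a hypothetical helper that assembles a `/generate` request body. It mirrors the parameter table exactly but is not part of any official SDK:

```python
def build_generate_payload(text, temperature=0.7, system=None, image=None,
                           url=None, stream=False, history=None):
    """Assemble a /generate request body using the documented defaults."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be within 0.0-1.0")
    payload = {
        "text": text,
        "temperature": temperature,
        "stream": stream,
        "history": history if history is not None else [],
    }
    # Optional parameters default to null and are simply omitted when unset.
    for key, value in (("system", system), ("image", image), ("url", url)):
        if value is not None:
            payload[key] = value
    return payload

payload = build_generate_payload("Hello", system="Be concise")
```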
Endpoint: GET /health
Checks the API status.
```
GET /health
```
Success response (`/generate`):

```json
{
  "success": true,
  "model": "gemini-3-flash",
  "message": "AI response here...",
  "latency": 1.45,
  "finish_reason": "STOP",
  "usage": {
    "prompt_tokens": 38,
    "completion_tokens": 175,
    "total_tokens": 213
  },
  "metadata": {
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "timestamp": "2026-01-12T15:30:00.000Z",
    "duration_ms": 1450,
    "temperature_used": 0.7
  }
}
```
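Once the JSON is decoded, the nested fields can be read directly. A small sketch built on the example response above (the `summarize` helper is illustrative):

```python
def summarize(resp: dict) -> str:
    """One-line summary of a successful /generate response."""
    usage = resp["usage"]
    meta = resp["metadata"]
    return (f"{resp['model']}: {usage['total_tokens']} tokens "
            f"in {meta['duration_ms']} ms (finish: {resp['finish_reason']})")

example = {
    "success": True,
    "model": "gemini-3-flash",
    "message": "AI response here...",
    "finish_reason": "STOP",
    "usage": {"prompt_tokens": 38, "completion_tokens": 175, "total_tokens": 213},
    "metadata": {"request_id": "550e8400-e29b-41d4-a716-446655440000",
                 "duration_ms": 1450},
}
print(summarize(example))  # gemini-3-flash: 213 tokens in 1450 ms (finish: STOP)
```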
Error response:

```json
{
  "error": "API Error",
  "details": "Error message here",
  "status": 500,
  "request_id": "550e8400-e29b-41d4-a716-446655440000"
}
```
| Code | Description |
|---|---|
| 200 | Success |
| 400 | Invalid parameters |
| 500 | Server error |
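These codes can be branched on explicitly in client code. A sketch (the function name is illustrative) that raises on the documented error codes:

```python
def handle_response(status: int, body: dict) -> str:
    """Map the documented status codes to a result or an exception."""
    if status == 200:
        return body["message"]
    if status == 400:
        raise ValueError(f"Invalid parameters: {body.get('details')}")
    if status == 500:
        raise RuntimeError(f"Server error (request {body.get('request_id')})")
    raise RuntimeError(f"Unexpected status {status}")
```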
```python
import requests

url = "https://zephyr-api.matiasvergara.workers.dev/generate"

# Basic request
response = requests.post(url, json={
    "text": "What is artificial intelligence?",
    "temperature": 0.7
})
print(response.json()["message"])

# With a system prompt
response = requests.post(url, json={
    "text": "Explain quantum physics",
    "system": "You are a physics professor. Use simple language.",
    "temperature": 0.5
})
print(response.json()["message"])

# Image analysis
response = requests.post(url, json={
    "text": "Describe this image",
    "image": "https://example.com/photo.jpg"
})
print(response.json()["message"])

# Streaming
response = requests.post(url, json={
    "text": "Write a short story",
    "stream": True
}, stream=True)
for line in response.iter_lines():
    if line:
        print(line.decode("utf-8"))
```
```javascript
// Basic request
const response = await fetch("https://zephyr-api.matiasvergara.workers.dev/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    text: "What is artificial intelligence?",
    temperature: 0.7
  })
});
const data = await response.json();
console.log(data.message);
```

```javascript
// With a system prompt
const response = await fetch("https://zephyr-api.matiasvergara.workers.dev/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    text: "Explain quantum physics",
    system: "You are a physics professor.",
    temperature: 0.5
  })
});
console.log((await response.json()).message);
```

```javascript
// Streaming
const url = "https://zephyr-api.matiasvergara.workers.dev/generate";
const response = await fetch(url, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text: "Write a story", stream: true })
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value));
}
```
```bash
# Basic request
curl -X POST https://zephyr-api.matiasvergara.workers.dev/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "What is artificial intelligence?",
    "temperature": 0.7
  }'

# With a system prompt
curl -X POST https://zephyr-api.matiasvergara.workers.dev/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Explain quantum physics",
    "system": "You are a physics professor.",
    "temperature": 0.5
  }'
```
```bash
# Health check
curl https://zephyr-api.matiasvergara.workers.dev/health
```
- Simple factual queries and information retrieval.
- Stories, poems, and creative content generation.
- Analyze and explain code snippets.
- Describe and analyze images via URL.
- Summarize web pages and articles.
- Conversational AI with history context.
Controls randomness in responses:
| Range | Behavior |
|---|---|
| 0.0 - 0.3 | Deterministic, factual, consistent |
| 0.4 - 0.7 | Balanced (default: 0.7) |
| 0.8 - 1.0 | Creative, varied, unpredictable |
Customize AI behavior with different personas:
```python
# Professional tone
system = "You are a university professor"

# Casual tone
system = "You are talking to a 10-year-old"

# Technical tone
system = "You are a technical documentation writer"
```
- Temperature: use 0.3 for facts, 0.7 for balanced output, 0.9 for creative tasks.
- Be specific: "Provide 5 facts about cats focusing on behavior" beats "Tell me about cats".
- System prompts: define the AI's role and tone for consistent responses.
- Error handling: always wrap API calls in try/catch with proper timeout handling.
- Streaming: enable streaming for stories, articles, and long content.
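The error-handling advice can be sketched as a small wrapper that retries a callable with backoff. It is client-agnostic, so any HTTP call (e.g. `requests.post(url, json=payload, timeout=30)`) can be passed in; the wrapper itself is illustrative, not part of the API:

```python
import time

def call_with_retry(fn, retries=3, backoff=1.0):
    """Run fn(), retrying on failure with simple exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise          # out of attempts: surface the error
            time.sleep(backoff * 2 ** attempt)

# Usage (requests shown as one possible client; the timeout guards hung calls):
# result = call_with_retry(lambda: requests.post(url, json=payload, timeout=30))
```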