Next generation inference playground
Choose a plan that fits your needs. Scale as you grow.

- Basic: Perfect for testing and personal projects
- Pro: Ideal for startups and growing businesses
- Mega: Enterprise-grade for unlimited scale
- Enterprise: Maximum power with premium AI models
- Custom: Tailored solutions for your scale
Zephyr API is a RESTful interface for accessing a wide range of state-of-the-art language models. Designed to deliver Pro-level reasoning at Flash speeds and low cost, it is well suited to agentic workflows, multi-turn chat, and coding assistance.
All API requests should be made to the following base URL:
```
https://zephyr-api.matiasvergara.workers.dev/
```
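For reference, the endpoint paths documented below are appended to this base URL. A quick sketch in Python (the endpoint names come from the sections that follow):

```python
from urllib.parse import urljoin

BASE_URL = "https://zephyr-api.matiasvergara.workers.dev/"

# Endpoint URLs are formed by joining the documented path onto the base URL.
generate_url = urljoin(BASE_URL, "generate")  # POST /generate
health_url = urljoin(BASE_URL, "health")      # GET /health
```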
Zephyr API offers a wide range of state-of-the-art models tailored to different performance and cost needs.
Available with regular API usage limits based on your plan.
| Model ID | Description | Plan |
|---|---|---|
| gemini-3-flash | Gemini 3.0 Flash (Flagship) | Basic+ |
| gemini-2.5-flash-lite | Gemini 2.5 Flash Lite | Basic+ |
| gpt-5-nano | GPT-5 Nano | Basic+ |
| xai-grok-4-fast | xAI Grok 4 Fast | Pro+ |
| minimax-m2.1 | MiniMax M2.1 | Pro+ |
| glm-4.7 | GLM 4.7 | Mega+ |
| gpt-oss-120b | GPT-OSS 120B | Mega+ |
| deepseek-v3.2-thinking | DeepSeek V3.2 with Thinking | Mega |
Enterprise Exclusive — These state-of-the-art models are only available with the Enterprise plan.
Daily Limit: Premium models have a limit of 100 requests/day. Usage resets at midnight UTC.
| Model ID | Description | Plan |
|---|---|---|
| gpt-5.2 | OpenAI GPT-5.2 | Enterprise |
| gemini-3-pro | Gemini 3 Pro | Enterprise |
| opus-4.5 | Claude Opus 4.5 | Enterprise |
| sonnet-4.5 | Claude Sonnet 4.5 | Enterprise |
| haiku-4.5 | Claude Haiku 4.5 | Enterprise |
| opus-4.5-reasoning | Claude Opus 4.5 with Reasoning | Enterprise |
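Since premium-model quotas reset at midnight UTC, it can be useful to compute how long remains until the next reset. A minimal sketch (the helper name is illustrative, not part of the API):

```python
from datetime import datetime, timedelta, timezone

def seconds_until_reset(now: datetime) -> float:
    """Seconds remaining until the next midnight UTC, when daily quotas reset."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()

remaining = seconds_until_reset(datetime.now(timezone.utc))
```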
Check which features are available in your plan.
Plan-dependent features include:

- Visual Input (Images)
- Function Calling
- URL Context & Web Search (Gemini models)
The Zephyr API uses API Keys to authenticate requests and track usage against your plan limits.
Include your API key in the `Authorization` header of every request:

```
Authorization: Bearer YOUR_API_KEY
```
```bash
curl -X POST https://zephyr-api.matiasvergara.workers.dev/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"text": "Hello, world!"}'
```
Your API key is linked to your subscription plan, whose usage limits include your per-minute rate limit, daily token quota, and daily request count. Usage resets at midnight UTC. Check the Console for your current usage statistics.
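In client code, a small helper can attach the required header to every call. A sketch (the `build_headers` helper is illustrative, not part of any SDK; `YOUR_API_KEY` is a placeholder):

```python
def build_headers(api_key: str) -> dict:
    """Headers every authenticated Zephyr API request should carry."""
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }

headers = build_headers("YOUR_API_KEY")
# Then pass the headers with each call, e.g. using the requests library:
# requests.post("https://zephyr-api.matiasvergara.workers.dev/generate",
#               headers=headers, json={"text": "Hello, world!"})
```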
Endpoint: POST /generate
Generates text using the selected language model.
```
POST /generate
Content-Type: application/json

{
  "text": "Your prompt here",
  "temperature": 0.7,
  "system": "You are a helpful assistant"
}
```
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `text` | string | Yes | - | Your question or prompt |
| `temperature` | number | No | 0.7 | Controls creativity (0.0-1.0) |
| `system` | string | No | null | System prompt |
| `image` | string | No | null | Image URL for analysis |
| `url` | string | No | null | Web page URL for context |
| `stream` | boolean | No | false | Enable streaming |
| `history` | array | No | [] | Conversation history |
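To make the defaults above concrete, here is a hypothetical helper that assembles a `/generate` request body. It mirrors the parameter table exactly but is not part of any official SDK:

```python
def build_generate_payload(text, temperature=0.7, system=None, image=None,
                           url=None, stream=False, history=None):
    """Assemble a /generate request body using the documented defaults."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be within 0.0-1.0")
    payload = {
        "text": text,
        "temperature": temperature,
        "stream": stream,
        "history": history if history is not None else [],
    }
    # Optional parameters default to null and are simply omitted when unset.
    for key, value in (("system", system), ("image", image), ("url", url)):
        if value is not None:
            payload[key] = value
    return payload

payload = build_generate_payload("Hello", system="Be concise")
```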
Endpoint: GET /health
Checks the API status.
```
GET /health
```
Success response (`/generate`):

```json
{
  "success": true,
  "model": "gemini-3-flash",
  "message": "AI response here...",
  "latency": 1.45,
  "finish_reason": "STOP",
  "usage": {
    "prompt_tokens": 38,
    "completion_tokens": 175,
    "total_tokens": 213
  },
  "metadata": {
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "timestamp": "2026-01-12T15:30:00.000Z",
    "duration_ms": 1450,
    "temperature_used": 0.7
  }
}
```
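Once the JSON is decoded, the nested fields can be read directly. A small sketch built on the example response above (the `summarize` helper is illustrative):

```python
def summarize(resp: dict) -> str:
    """One-line summary of a successful /generate response."""
    usage = resp["usage"]
    meta = resp["metadata"]
    return (f"{resp['model']}: {usage['total_tokens']} tokens "
            f"in {meta['duration_ms']} ms (finish: {resp['finish_reason']})")

example = {
    "success": True,
    "model": "gemini-3-flash",
    "message": "AI response here...",
    "finish_reason": "STOP",
    "usage": {"prompt_tokens": 38, "completion_tokens": 175, "total_tokens": 213},
    "metadata": {"request_id": "550e8400-e29b-41d4-a716-446655440000",
                 "duration_ms": 1450},
}
print(summarize(example))  # gemini-3-flash: 213 tokens in 1450 ms (finish: STOP)
```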
Error response:

```json
{
  "error": "API Error",
  "details": "Error message here",
  "status": 500,
  "request_id": "550e8400-e29b-41d4-a716-446655440000"
}
```
| Code | Description |
|---|---|
| 200 | Success |
| 400 | Invalid parameters |
| 500 | Server error |
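These codes can be branched on explicitly in client code. A sketch (the function name is illustrative) that raises on the documented error codes:

```python
def handle_response(status: int, body: dict) -> str:
    """Map the documented status codes to a result or an exception."""
    if status == 200:
        return body["message"]
    if status == 400:
        raise ValueError(f"Invalid parameters: {body.get('details')}")
    if status == 500:
        raise RuntimeError(f"Server error (request {body.get('request_id')})")
    raise RuntimeError(f"Unexpected status {status}")
```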
```python
import requests

url = "https://zephyr-api.matiasvergara.workers.dev/generate"

# Basic request
response = requests.post(url, json={
    "text": "What is artificial intelligence?",
    "temperature": 0.7
})
print(response.json()["message"])

# With a system prompt
response = requests.post(url, json={
    "text": "Explain quantum physics",
    "system": "You are a physics professor. Use simple language.",
    "temperature": 0.5
})
print(response.json()["message"])

# Image analysis
response = requests.post(url, json={
    "text": "Describe this image",
    "image": "https://example.com/photo.jpg"
})
print(response.json()["message"])

# Streaming
response = requests.post(url, json={
    "text": "Write a short story",
    "stream": True
}, stream=True)
for line in response.iter_lines():
    if line:
        print(line.decode("utf-8"))
```
```javascript
// Basic request
const response = await fetch("https://zephyr-api.matiasvergara.workers.dev/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    text: "What is artificial intelligence?",
    temperature: 0.7
  })
});
const data = await response.json();
console.log(data.message);
```

```javascript
// With a system prompt
const response = await fetch("https://zephyr-api.matiasvergara.workers.dev/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    text: "Explain quantum physics",
    system: "You are a physics professor.",
    temperature: 0.5
  })
});
console.log((await response.json()).message);
```

```javascript
// Streaming
const url = "https://zephyr-api.matiasvergara.workers.dev/generate";
const response = await fetch(url, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text: "Write a story", stream: true })
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value));
}
```
```bash
# Basic request
curl -X POST https://zephyr-api.matiasvergara.workers.dev/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "What is artificial intelligence?",
    "temperature": 0.7
  }'

# With a system prompt
curl -X POST https://zephyr-api.matiasvergara.workers.dev/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Explain quantum physics",
    "system": "You are a physics professor.",
    "temperature": 0.5
  }'
```
```bash
# Health check
curl https://zephyr-api.matiasvergara.workers.dev/health
```
- Simple factual queries and information retrieval.
- Stories, poems, and creative content generation.
- Analyze and explain code snippets.
- Describe and analyze images via URL.
- Summarize web pages and articles.
- Conversational AI with history context.
Controls randomness in responses:
| Range | Behavior |
|---|---|
| 0.0 - 0.3 | Deterministic, factual, consistent |
| 0.4 - 0.7 | Balanced (default: 0.7) |
| 0.8 - 1.0 | Creative, varied, unpredictable |
Customize AI behavior with different personas:
```python
# Professional tone
system = "You are a university professor"

# Casual tone
system = "You are talking to a 10-year-old"

# Technical tone
system = "You are a technical documentation writer"
```
- Temperature: use 0.3 for facts, 0.7 for balanced output, 0.9 for creative tasks.
- Be specific: "Provide 5 facts about cats focusing on behavior" beats "Tell me about cats".
- System prompts: define the AI's role and tone for consistent responses.
- Error handling: always wrap API calls in try/catch with proper timeout handling.
- Streaming: enable streaming for stories, articles, and long content.
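The error-handling advice can be sketched as a small wrapper that retries a callable with backoff. It is client-agnostic, so any HTTP call (e.g. `requests.post(url, json=payload, timeout=30)`) can be passed in; the wrapper itself is illustrative, not part of the API:

```python
import time

def call_with_retry(fn, retries=3, backoff=1.0):
    """Run fn(), retrying on failure with simple exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise          # out of attempts: surface the error
            time.sleep(backoff * 2 ** attempt)

# Usage (requests shown as one possible client; the timeout guards hung calls):
# result = call_with_retry(lambda: requests.post(url, json=payload, timeout=30))
```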