Speech Generation API
Beta Access: The Speech API is currently in beta and available only to whitelisted users. Please contact support to request access.
The Demeterics Speech API provides a unified Text-to-Speech (TTS) interface across multiple providers. Convert text to natural-sounding audio through a single API that automatically tracks usage and costs and stores the generated audio for analysis.
Overview
Base URL: https://api.demeterics.com/tts/v1
Features:
- Unified API: Single endpoint for OpenAI, ElevenLabs, Google Cloud TTS, and Murf.ai
- Auto-tracking: Every request logged to BigQuery with full observability
- Audio Storage: Generated audio stored in GCS with 15-minute signed URLs
- BYOK Support: Use your own provider API keys with dual-key authentication
- Cost Control: Automatic credit billing with a 15% service fee for managed keys or 10% for BYOK
Authentication
Managed Keys (Default)
Use only your Demeterics API key:
curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{...}'
Bring Your Own Key (BYOK)
Use the dual-key format to provide your own provider API key:
curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key;sk-your_openai_key" \
  -H "Content-Type: application/json" \
  -d '{...}'
The format is: [demeterics_api_key];[provider_api_key]
BYOK Benefits:
- 10% service fee instead of 15%
- Use your own rate limits and quotas
- Provider costs are billed directly to your provider account
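The two authentication modes differ only in the bearer token, so a small helper can build the header for both. The sketch below is illustrative: the function name `build_auth_header` is hypothetical, and rejecting keys that contain `;` is an assumption based on `;` being the separator in the dual-key format.

```python
from typing import Dict, Optional


def build_auth_header(demeterics_key: str, provider_key: Optional[str] = None) -> Dict[str, str]:
    """Return the Authorization header for managed (one key) or BYOK (two keys) requests."""
    for key in (demeterics_key, provider_key):
        # Assumption: ';' separates the two keys, so it cannot appear inside either one.
        if key is not None and (not key or ";" in key):
            raise ValueError("API keys must be non-empty and must not contain ';'")
    # Managed: "dmt_..."; BYOK: "dmt_...;provider_key"
    token = demeterics_key if provider_key is None else f"{demeterics_key};{provider_key}"
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    print(build_auth_header("dmt_your_api_key"))
    print(build_auth_header("dmt_your_api_key", "sk-your_openai_key"))
```

Pass the returned dictionary as the request headers, adding `Content-Type: application/json` for POST requests.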
Endpoints
Generate Speech
POST /tts/v1/generate
Convert text to speech audio.
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Target provider: openai, elevenlabs, google, murf |
| model | string | No | TTS model (provider-specific) |
| voice | string | No | Voice identifier |
| input | string | Yes | Text to convert (max varies by provider) |
| format | string | No | Output format: mp3, wav, opus, flac |
| speed | float | No | Playback speed: 0.25-4.0 (default: 1.0) |
| language | string | No | Language code (ISO 639-1) |
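The request fields above can be assembled and checked on the client before sending. This is a minimal sketch, not an official SDK: `build_generate_payload` is a hypothetical helper, and the checks only mirror what the table states (known providers, required `input`, `speed` between 0.25 and 4.0). The server may apply further validation.

```python
from typing import Any, Dict, Optional

PROVIDERS = {"openai", "elevenlabs", "google", "murf"}


def build_generate_payload(
    provider: str,
    text: str,
    *,
    model: Optional[str] = None,
    voice: Optional[str] = None,
    fmt: Optional[str] = None,
    speed: Optional[float] = None,
    language: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /tts/v1/generate, omitting unset optional fields."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if not text:
        raise ValueError("input text is required")
    if speed is not None and not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    optional = {"model": model, "voice": voice, "format": fmt, "speed": speed, "language": language}
    payload: Dict[str, Any] = {"provider": provider, "input": text}
    # Only send optional fields the caller actually set, so server defaults apply otherwise.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


if __name__ == "__main__":
    print(build_generate_payload("openai", "Hello, welcome to Demeterics!", voice="alloy", fmt="mp3"))
```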
Example Request:
curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "tts-1",
    "voice": "alloy",
    "input": "Hello, welcome to Demeterics!",
    "format": "mp3"
  }'
Response:
{
  "id": "01JARV4HZ6XPQMWVCS9N1GKEFD",
  "provider": "openai",
  "model": "tts-1",
  "voice": "alloy",
  "audio_url": "https://storage.googleapis.com/demeterics-data/tts/...",
  "duration_seconds": 2.3,
  "cost_usd": 0.00023,
  "usage": {
    "input_chars": 31
  },
  "metadata": {
    "format": "mp3",
    "sample_rate": 24000,
    "channels": 1,
    "generation_ms": 450
  }
}
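Because the signed `audio_url` expires after 15 minutes, a client that needs the audio later should download it promptly. The sketch below shows one way to do that with the standard library. The helper names are hypothetical, and it assumes the signed URL can be fetched with a plain unauthenticated GET.

```python
import urllib.request
from pathlib import Path
from typing import Any, Dict


def audio_filename(result: Dict[str, Any]) -> str:
    """Derive a local file name from the response's id and metadata.format fields."""
    fmt = result.get("metadata", {}).get("format", "mp3")
    return f"{result['id']}.{fmt}"


def save_audio(result: Dict[str, Any], directory: str = ".") -> Path:
    """Download the generated audio before the signed URL expires."""
    target = Path(directory) / audio_filename(result)
    # Assumption: no extra headers are needed to read the signed URL.
    with urllib.request.urlopen(result["audio_url"], timeout=30) as resp:
        target.write_bytes(resp.read())
    return target
```

Calling `save_audio(response.json())` right after a successful generate request keeps a local copy that outlives the signed URL.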
List Voices
GET /tts/v1/voices?provider={provider}
List available voices for a provider.
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Provider: openai, elevenlabs, google, murf |
Example Request:
curl -X GET "https://api.demeterics.com/tts/v1/voices?provider=openai" \
  -H "Authorization: Bearer dmt_your_api_key"
Response:
{
  "voices": [
    {
      "id": "alloy",
      "name": "Alloy",
      "description": "Neutral and balanced",
      "gender": "neutral"
    },
    {
      "id": "echo",
      "name": "Echo",
      "description": "Clear and articulate",
      "gender": "male"
    }
  ]
}
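A client will often want to narrow the voice list before choosing one. The following sketch filters the response shown above by its `gender` field. `voice_ids` is a hypothetical helper, and it assumes every voice entry carries an `id`, as in the example response.

```python
from typing import Any, Dict, List, Optional


def voice_ids(voices_response: Dict[str, Any], gender: Optional[str] = None) -> List[str]:
    """Return voice ids from a /tts/v1/voices response, optionally filtered by gender."""
    voices = voices_response.get("voices", [])
    return [v["id"] for v in voices if gender is None or v.get("gender") == gender]


if __name__ == "__main__":
    sample = {
        "voices": [
            {"id": "alloy", "name": "Alloy", "gender": "neutral"},
            {"id": "echo", "name": "Echo", "gender": "male"},
        ]
    }
    print(voice_ids(sample, gender="male"))
```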
Providers
OpenAI
Models:
- gpt-4o-mini-tts: Latest model with better steerability (~85% cheaper than ElevenLabs)
- tts-1: Fast and efficient (legacy)
- tts-1-hd: Higher quality (legacy)
Voices:
- alloy: Neutral and balanced
- ash: Warm and conversational
- ballad: Soft and melodic
- coral: Friendly and approachable
- echo: Clear and articulate
- fable: Expressive and dynamic
- onyx: Deep and authoritative
- nova: Friendly and warm
- sage: Calm and measured
- shimmer: Bright and optimistic
- verse: Dynamic and engaging
Supported Formats: mp3, opus, aac, flac, wav, pcm
Max Characters: 4,096
ElevenLabs
Models:
- eleven_multilingual_v2: Best quality, 29 languages
- eleven_turbo_v2_5: Fast, English-optimized
- eleven_turbo_v2: Previous fast model
- eleven_monolingual_v1: English only
Voices: Over 100 pre-made voices plus custom voice cloning
Supported Formats: mp3, pcm, ulaw
Max Characters: 5,000
Google Cloud TTS
Models:
- standard: Basic quality
- neural2: Neural network based
- wavenet: High quality WaveNet
- journey: Conversational style
- studio: Professional quality
Voices: 220+ voices across 40+ languages
Supported Formats: mp3, wav, ogg
Max Characters: 5,000
Murf.ai
Models:
- GEN2: Latest generation, highest quality
- FALCON: Fast streaming model
Voices: 120+ voices across 20+ languages including:
- en-US-natalie: Natalie (US English, female)
- en-US-miles: Miles (US English, male)
- en-US-julia: Julia (US English, female)
- en-UK-iris: Iris (UK English, female)
- es-ES-elena: Elena (Spanish, female)
- fr-FR-claire: Claire (French, female)
- de-DE-anna: Anna (German, female)
Supported Formats: mp3, wav, flac, ogg, pcm, alaw, ulaw
Max Characters: 10,000
Features:
- Voice styles (conversational, newscast, etc.)
- Speed and pitch control
- Multi-language support with native locales
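Each provider above lists a different Max Characters limit, and input beyond it is rejected with text_too_long. One way to handle long text is to split it into chunks that fit the target provider and send one request per chunk. The sketch below is an assumption-laden illustration: `split_for_provider` is a hypothetical helper, it splits on whitespace (collapsing runs of whitespace into single spaces), and it assumes the API counts characters the same way Python's `len` does.

```python
from typing import Dict, List

# Limits copied from the provider sections above.
MAX_CHARS: Dict[str, int] = {"openai": 4096, "elevenlabs": 5000, "google": 5000, "murf": 10000}


def split_for_provider(text: str, provider: str) -> List[str]:
    """Split text into whitespace-delimited chunks no longer than the provider's limit."""
    limit = MAX_CHARS[provider]
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit has to be cut mid-word.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    parts = split_for_provider("word " * 3000, "openai")
    print(len(parts), [len(p) for p in parts])
```

Splitting on sentence boundaries instead of arbitrary whitespace would likely give more natural prosody at chunk joins; whitespace is used here only to keep the sketch short.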
Pricing
Managed Keys
Character-based pricing with 15% service fee:
| Provider | Model | Cost per 1M chars |
|---|---|---|
| OpenAI | gpt-4o-mini-tts | $0.69 |
| OpenAI | tts-1 | $17.25 |
| OpenAI | tts-1-hd | $34.50 |
| ElevenLabs | eleven_multilingual_v2 | $345.00 |
| ElevenLabs | eleven_turbo_v2_5 | $86.25 |
| Google | wavenet | $18.40 |
| Google | neural2 | $18.40 |
| Google | standard | $4.60 |
| Murf | GEN2 | $27.60 |
| Murf | FALCON | $23.00 |
BYOK
Demeterics charges a 10% service fee on top of provider costs. The provider costs themselves are billed directly to your provider account.
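The pricing rules above reduce to simple arithmetic, sketched below for budgeting purposes. The rates are copied from the managed-keys table, with the wavenet, neural2, and standard models filed under google. The function names are hypothetical, and the sketch assumes cost scales linearly with character count with no rounding or minimum charge.

```python
from typing import Dict, Tuple

# USD per 1M characters for managed keys (15% service fee included), from the table above.
MANAGED_USD_PER_1M_CHARS: Dict[Tuple[str, str], float] = {
    ("openai", "gpt-4o-mini-tts"): 0.69,
    ("openai", "tts-1"): 17.25,
    ("openai", "tts-1-hd"): 34.50,
    ("elevenlabs", "eleven_multilingual_v2"): 345.00,
    ("elevenlabs", "eleven_turbo_v2_5"): 86.25,
    ("google", "wavenet"): 18.40,
    ("google", "neural2"): 18.40,
    ("google", "standard"): 4.60,
    ("murf", "GEN2"): 27.60,
    ("murf", "FALCON"): 23.00,
}
BYOK_FEE_RATE = 0.10


def managed_cost_usd(provider: str, model: str, num_chars: int) -> float:
    """Estimate the managed-key charge for num_chars characters (assumes linear pricing)."""
    return MANAGED_USD_PER_1M_CHARS[(provider, model)] * num_chars / 1_000_000


def byok_service_fee_usd(provider_cost_usd: float) -> float:
    """Estimate the Demeterics fee under BYOK; the provider bills its own cost separately."""
    return BYOK_FEE_RATE * provider_cost_usd


if __name__ == "__main__":
    print(managed_cost_usd("google", "standard", 500_000))
    print(byok_service_fee_usd(15.0))
```

For example, 500,000 characters on Google standard with managed keys comes to 0.5 × $4.60 = $2.30.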
Error Handling
Error Response Format:
{
  "error": {
    "type": "invalid_request",
    "message": "Input text exceeds maximum length",
    "code": "text_too_long"
  }
}
Common Error Codes:
| Code | HTTP Status | Description |
|---|---|---|
| invalid_provider | 400 | Unknown provider specified |
| invalid_voice | 400 | Voice not available for provider |
| text_too_long | 400 | Input exceeds provider limit |
| insufficient_credits | 402 | Not enough credits |
| provider_error | 502 | Provider API failed |
| rate_limited | 429 | Too many requests |
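The error body and codes above map naturally onto a client-side exception. The sketch below is one possible shape: `SpeechAPIError` and `parse_error` are hypothetical names, and treating rate_limited and provider_error as the only retryable codes is an assumption, not documented behavior.

```python
from typing import Any, Dict

# Assumption: only transient failures are worth retrying.
RETRYABLE_CODES = {"rate_limited", "provider_error"}


class SpeechAPIError(Exception):
    """Client-side representation of the documented error response."""

    def __init__(self, status: int, error_type: str, code: str, message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.error_type = error_type
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def parse_error(status: int, body: Dict[str, Any]) -> SpeechAPIError:
    """Convert an HTTP status and a decoded JSON error body into a SpeechAPIError."""
    err = body.get("error", {})
    return SpeechAPIError(
        status,
        err.get("type", "unknown"),
        err.get("code", "unknown"),
        err.get("message", ""),
    )
```

A caller can raise the result of `parse_error(response.status_code, response.json())` for any non-2xx response and branch on `retryable`.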
Data Tracking
Every speech generation is automatically tracked in BigQuery with:
- Transaction ID (ULID)
- User and API key identifiers
- Provider, model, and voice used
- Input character count and text hash (privacy-safe)
- Audio duration and format
- GCS storage path
- Cost breakdown (provider cost, service fee, total)
- Latency metrics
- Error information (if failed)
Query your speech generations:
SELECT
  transaction_id,
  provider,
  model,
  tts.voice,
  tts.input_chars,
  tts.duration_sec,
  total_cost
FROM `demeterics.demeterics.interactions`
WHERE interaction_type = 'tts'
  AND user_id = @user_id
  AND timing.question_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY timing.question_time DESC
SDK Support
Python
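import requests

# Request speech from the OpenAI provider through the Demeterics endpoint.
response = requests.post(
    "https://api.demeterics.com/tts/v1/generate",
    headers={"Authorization": "Bearer dmt_your_api_key"},
    json={
        "provider": "openai",
        "voice": "alloy",
        "input": "Hello, world!",
        "format": "mp3",
    },
    timeout=30,
)
response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body.

# The signed URL is valid for 15 minutes, so download the audio promptly.
audio_url = response.json()["audio_url"]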
Node.js
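// Uses the built-in fetch (Node.js 18+); run inside an ES module or an async function.
const response = await fetch("https://api.demeterics.com/tts/v1/generate", {
  method: "POST",
  headers: {
    "Authorization": "Bearer dmt_your_api_key",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    provider: "openai",
    voice: "alloy",
    input: "Hello, world!",
    format: "mp3"
  })
});
if (!response.ok) {
  // fetch does not reject on HTTP errors, so check the status explicitly.
  throw new Error(`Speech API request failed with status ${response.status}`);
}
const { audio_url } = await response.json();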
Best Practices
- Choose the right provider: OpenAI for speed, ElevenLabs for quality, Google for language coverage
- Cache audio: Store frequently-used audio locally to reduce API calls
- Use appropriate formats: MP3 for web, WAV for editing, Opus for streaming
- Monitor costs: Track usage in your Demeterics dashboard
- Handle errors gracefully: Implement retry logic with exponential backoff
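The retry advice in the last item can be sketched as a small wrapper. This is a minimal illustration under stated assumptions: `backoff_delays` and `call_with_retry` are hypothetical helpers, the base delay, cap, and retry count are arbitrary starting points, and the random jitter is an addition not mentioned above that helps avoid synchronized retries.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Upper bounds for each wait: base, 2*base, 4*base, ... limited to cap seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_retry(
    func: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    retries: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying retryable failures with exponential backoff and jitter."""
    delays = backoff_delays(retries)
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception as exc:
            # Give up on the last attempt or on errors that will not succeed on retry.
            if attempt == retries or not is_retryable(exc):
                raise
            # Wait a random time up to the exponential bound for this attempt.
            sleep(random.uniform(0, delays[attempt]))
    raise RuntimeError("unreachable")  # Loop always returns or raises.
```

In practice `is_retryable` would return true only for transient failures such as rate_limited (429) or provider_error (502); retrying a 400 or 402 response would repeat the same failure.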