Speech Generation API

Beta Access: The Speech API is currently in beta and available only to whitelisted users. Please contact support to request access.

The Demeterics Speech API provides a unified Text-to-Speech (TTS) interface across multiple providers. Convert text to natural-sounding audio through a single API that automatically tracks usage and costs and stores the generated audio for analysis.

Overview

Base URL: https://api.demeterics.com/tts/v1

Features:

Unified API: Single endpoint for OpenAI, ElevenLabs, Google Cloud TTS, and Murf.ai
Auto-tracking: Every request logged to BigQuery with full observability
Audio Storage: Generated audio is stored in GCS and returned as a signed URL that expires after 15 minutes
BYOK Support: Use your own provider API keys with dual-key authentication
Cost Control: Automatic credit billing with a 15% service fee on managed keys or a 10% service fee with BYOK

Authentication

Managed Keys (Default)

Use only your Demeterics API key:

curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{...}'

Bring Your Own Key (BYOK)

Use the dual-key format to provide your own provider API key:

curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key;sk-your_openai_key" \
  -H "Content-Type: application/json" \
  -d '{...}'

The format is: [demeterics_api_key];[provider_api_key]
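
As a minimal sketch of the dual-key format, the helper below assembles the Authorization header for either mode. The function name `build_auth_header` is illustrative only and is not part of any Demeterics SDK.

```python
from typing import Dict, Optional


def build_auth_header(demeterics_key: str, provider_key: Optional[str] = None) -> Dict[str, str]:
    """Return the Authorization header for managed or BYOK requests."""
    if ";" in demeterics_key:
        # ';' is the separator between the two keys, so it cannot appear in the first one.
        raise ValueError("the Demeterics key must not contain ';'")
    # Managed mode sends only the Demeterics key; BYOK appends the provider key after ';'.
    token = demeterics_key if provider_key is None else f"{demeterics_key};{provider_key}"
    return {"Authorization": f"Bearer {token}"}
```

Calling `build_auth_header("dmt_your_api_key", "sk-your_openai_key")` yields the same header value as the BYOK curl example above.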

BYOK Benefits:

10% service fee instead of 15%
Use your own rate limits and quotas
Provider costs billed directly to your account

Endpoints

Generate Speech

POST /tts/v1/generate

Convert text to speech audio.

Request Body:

Field	Type	Required	Description
`provider`	string	Yes	Target provider: `openai`, `elevenlabs`, `google`, `murf`
`model`	string	No	TTS model (provider-specific)
`voice`	string	No	Voice identifier
`input`	string	Yes	Text to convert (maximum length varies by provider; see Providers)
`format`	string	No	Output format, e.g. `mp3`, `wav`, `opus`, `flac` (supported values vary by provider)
`speed`	float	No	Playback speed: 0.25-4.0 (default: 1.0)
`language`	string	No	Language code (ISO 639-1)
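
The field table above can be turned into a small client-side check before a request is sent. The following is a sketch under stated assumptions: `build_generate_payload` is a hypothetical helper, and the checks mirror only the constraints listed in the table (the server may enforce more).

```python
from typing import Any, Dict, Optional

# Providers listed in the request-body table.
PROVIDERS = {"openai", "elevenlabs", "google", "murf"}


def build_generate_payload(
    provider: str,
    input_text: str,
    model: Optional[str] = None,
    voice: Optional[str] = None,
    audio_format: Optional[str] = None,
    speed: Optional[float] = None,
    language: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the documented constraints and drop unset optional fields."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if not input_text:
        raise ValueError("input must be a non-empty string")
    if speed is not None and not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    optional = {
        "model": model,
        "voice": voice,
        "format": audio_format,
        "speed": speed,
        "language": language,
    }
    payload: Dict[str, Any] = {"provider": provider, "input": input_text}
    # Only send optional fields the caller actually set.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

Omitting unset fields lets the API apply its own defaults rather than having the client guess them.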

Example Request:

curl -X POST https://api.demeterics.com/tts/v1/generate \
  -H "Authorization: Bearer dmt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "tts-1",
    "voice": "alloy",
    "input": "Hello, welcome to Demeterics!",
    "format": "mp3"
  }'

Response:

{
  "id": "01JARV4HZ6XPQMWVCS9N1GKEFD",
  "provider": "openai",
  "model": "tts-1",
  "voice": "alloy",
  "audio_url": "https://storage.googleapis.com/demeterics-data/tts/...",
  "duration_seconds": 2.3,
  "cost_usd": 0.00023,
  "usage": {
    "input_chars": 29
  },
  "metadata": {
    "format": "mp3",
    "sample_rate": 24000,
    "channels": 1,
    "generation_ms": 450
  }
}
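
To illustrate how a client might consume this response, the sketch below maps the fields shown above onto a small dataclass. `SpeechResult` and `parse_speech_response` are hypothetical names, and the parser assumes every field in the sample is always present.

```python
import json
from dataclasses import dataclass


@dataclass
class SpeechResult:
    id: str
    audio_url: str
    duration_seconds: float
    cost_usd: float
    input_chars: int
    audio_format: str


def parse_speech_response(body: str) -> SpeechResult:
    """Parse the JSON body of a successful generate call."""
    data = json.loads(body)
    return SpeechResult(
        id=data["id"],
        audio_url=data["audio_url"],
        duration_seconds=float(data["duration_seconds"]),
        cost_usd=float(data["cost_usd"]),
        input_chars=int(data["usage"]["input_chars"]),
        audio_format=data["metadata"]["format"],
    )
```

Because `audio_url` is a signed URL that expires after 15 minutes, download or copy the audio soon after parsing instead of storing the URL for later use.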

List Voices

GET /tts/v1/voices?provider={provider}

List available voices for a provider.

Query Parameters:

Parameter	Type	Required	Description
`provider`	string	Yes	Provider: `openai`, `elevenlabs`, `google`, `murf`

Example Request:

curl -X GET "https://api.demeterics.com/tts/v1/voices?provider=openai" \
  -H "Authorization: Bearer dmt_your_api_key"

Response:

{
  "voices": [
    {
      "id": "alloy",
      "name": "Alloy",
      "description": "Neutral and balanced",
      "gender": "neutral"
    },
    {
      "id": "echo",
      "name": "Echo",
      "description": "Clear and articulate",
      "gender": "male"
    }
  ]
}
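
A voice list in this shape is easy to filter on the client. The helper below is a sketch with a hypothetical name, `voices_by_gender`, and it assumes each entry carries the `id` and `gender` fields shown in the sample response.

```python
from typing import Any, Dict, List


def voices_by_gender(response: Dict[str, Any], gender: str) -> List[str]:
    """Return the ids of voices whose `gender` field matches, in response order."""
    return [
        voice["id"]
        for voice in response.get("voices", [])
        if voice.get("gender") == gender
    ]
```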

Providers

OpenAI

Models:

gpt-4o-mini-tts - Latest model with better steerability (~85% cheaper than ElevenLabs)
tts-1 - Fast and efficient (legacy)
tts-1-hd - Higher quality (legacy)

Voices:

alloy - Neutral and balanced
ash - Warm and conversational
ballad - Soft and melodic
coral - Friendly and approachable
echo - Clear and articulate
fable - Expressive and dynamic
onyx - Deep and authoritative
nova - Friendly and warm
sage - Calm and measured
shimmer - Bright and optimistic
verse - Dynamic and engaging

Supported Formats: mp3, opus, aac, flac, wav, pcm

Max Characters: 4,096

ElevenLabs

Models:

eleven_multilingual_v2 - Best quality, 29 languages
eleven_turbo_v2_5 - Fast, English-optimized
eleven_turbo_v2 - Previous fast model
eleven_monolingual_v1 - English only

Voices: Over 100 pre-made voices plus custom voice cloning

Supported Formats: mp3, pcm, ulaw

Max Characters: 5,000

Google Cloud TTS

Models:

standard - Basic quality
neural2 - Neural network based
wavenet - High quality WaveNet
journey - Conversational style
studio - Professional quality

Voices: 220+ voices across 40+ languages

Supported Formats: mp3, wav, ogg

Max Characters: 5,000

Murf.ai

Models:

GEN2 - Latest generation, highest quality
FALCON - Fast streaming model

Voices: 120+ voices across 20+ languages including:

en-US-natalie - Natalie (US English, female)
en-US-miles - Miles (US English, male)
en-US-julia - Julia (US English, female)
en-UK-iris - Iris (UK English, female)
es-ES-elena - Elena (Spanish, female)
fr-FR-claire - Claire (French, female)
de-DE-anna - Anna (German, female)

Supported Formats: mp3, wav, flac, ogg, pcm, alaw, ulaw

Max Characters: 10,000

Features:

Voice styles (conversational, newscast, etc.)
Speed and pitch control
Multi-language support with native locales
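
Since each provider caps the input length, longer text has to be split before it is sent. The sketch below uses the Max Characters values listed in this section and cuts at whitespace where possible. `split_for_provider` is an illustrative helper, not an API feature, and splitting on whitespace rather than sentence boundaries is a simplifying assumption.

```python
from typing import List

# Max Characters per provider, as listed in the Providers section.
MAX_CHARS = {"openai": 4096, "elevenlabs": 5000, "google": 5000, "murf": 10000}


def split_for_provider(text: str, provider: str) -> List[str]:
    """Split text into chunks that each fit within the provider's character limit."""
    limit = MAX_CHARS[provider]
    chunks: List[str] = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Prefer the last space at or before the limit so words stay intact.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no whitespace in the window: fall back to a hard cut
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk can then be sent as its own generate request and the resulting audio files concatenated in order.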

Pricing

Managed Keys

Character-based pricing with 15% service fee:

Provider	Model	Cost per 1M chars
OpenAI	gpt-4o-mini-tts	$0.69
OpenAI	tts-1	$17.25
OpenAI	tts-1-hd	$34.50
ElevenLabs	eleven_multilingual_v2	$345.00
ElevenLabs	eleven_turbo_v2_5	$86.25
Google	wavenet	$18.40
Google	neural2	$18.40
Google	standard	$4.60
Murf	GEN2	$27.60
Murf	FALCON	$23.00

BYOK

A 10% service fee applies on top of provider costs. The provider bills those costs directly to your own provider account.
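
For budgeting, the pricing above can be expressed as a small estimator. This is a sketch under two assumptions: the per-1M-character rates in the managed table already include the 15% service fee, and the BYOK fee is a flat 10% of whatever the provider charges. The function names are hypothetical.

```python
# Managed rates in USD per 1M characters, copied from the pricing table.
MANAGED_RATE_PER_1M = {
    ("openai", "gpt-4o-mini-tts"): 0.69,
    ("openai", "tts-1"): 17.25,
    ("openai", "tts-1-hd"): 34.50,
    ("elevenlabs", "eleven_multilingual_v2"): 345.00,
    ("elevenlabs", "eleven_turbo_v2_5"): 86.25,
    ("google", "wavenet"): 18.40,
    ("google", "neural2"): 18.40,
    ("google", "standard"): 4.60,
    ("murf", "GEN2"): 27.60,
    ("murf", "FALCON"): 23.00,
}
BYOK_FEE_RATE = 0.10  # assumption: applied to the provider's own charge


def estimate_managed_cost(provider: str, model: str, chars: int) -> float:
    """Estimate the managed-key cost in USD for a given character count."""
    rate = MANAGED_RATE_PER_1M[(provider, model)]
    return rate * chars / 1_000_000


def byok_service_fee(provider_cost_usd: float) -> float:
    """Estimate the Demeterics fee for a BYOK request, given the provider's charge."""
    return provider_cost_usd * BYOK_FEE_RATE
```

For example, `estimate_managed_cost("google", "standard", 500_000)` works out to about $2.30 under these assumptions.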

Error Handling

Error Response Format:

{
  "error": {
    "type": "invalid_request",
    "message": "Input text exceeds maximum length",
    "code": "text_too_long"
  }
}

Common Error Codes:

Code	HTTP Status	Description
`invalid_provider`	400	Unknown provider specified
`invalid_voice`	400	Voice not available for provider
`text_too_long`	400	Input exceeds provider limit
`insufficient_credits`	402	Not enough credits
`provider_error`	502	Provider API failed
`rate_limited`	429	Too many requests
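
One way to work with this error format is to convert it into an exception that records whether a retry makes sense. The sketch below is illustrative: `SpeechApiError` and `parse_error` are hypothetical names, and treating `rate_limited` and `provider_error` as the retryable codes is an assumption rather than documented behavior.

```python
import json

# Assumption: only these codes represent transient failures worth retrying.
RETRYABLE_CODES = {"rate_limited", "provider_error"}


class SpeechApiError(Exception):
    """An error response from the Speech API."""

    def __init__(self, status: int, error_type: str, code: str, message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.error_type = error_type
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def parse_error(status: int, body: str) -> SpeechApiError:
    """Build a SpeechApiError from an HTTP status and a JSON error body."""
    err = json.loads(body).get("error", {})
    return SpeechApiError(
        status,
        err.get("type", "unknown"),
        err.get("code", "unknown"),
        err.get("message", ""),
    )
```

Client errors such as `text_too_long` come back with `retryable` set to false, which signals that the request itself must change before it is sent again.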

Data Tracking

Every speech generation is automatically tracked in BigQuery with:

Transaction ID (ULID)
User and API key identifiers
Provider, model, and voice used
Input character count and text hash (privacy-safe)
Audio duration and format
GCS storage path
Cost breakdown (provider cost, service fee, total)
Latency metrics
Error information (if failed)

Query your speech generations:

SELECT
  transaction_id,
  provider,
  model,
  tts.voice,
  tts.input_chars,
  tts.duration_sec,
  total_cost
FROM `demeterics.demeterics.interactions`
WHERE interaction_type = 'tts'
  AND user_id = @user_id
  AND timing.question_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY timing.question_time DESC

SDK Support

Python

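import requests

# Request speech generation; the 30-second timeout is a client-side choice.
response = requests.post(
    "https://api.demeterics.com/tts/v1/generate",
    headers={"Authorization": "Bearer dmt_your_api_key"},
    json={
        "provider": "openai",
        "voice": "alloy",
        "input": "Hello, world!",
        "format": "mp3"
    },
    timeout=30,
)
# Raise on 4xx/5xx so an error body is never read as a successful result.
response.raise_for_status()

# The signed URL expires after 15 minutes, so download the audio promptly.
audio_url = response.json()["audio_url"]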

Node.js

const response = await fetch("https://api.demeterics.com/tts/v1/generate", {
  method: "POST",
  headers: {
    "Authorization": "Bearer dmt_your_api_key",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    provider: "openai",
    voice: "alloy",
    input: "Hello, world!",
    format: "mp3"
  })
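});

// fetch() resolves even on HTTP errors, so check the status explicitly.
if (!response.ok) {
  throw new Error(`Speech request failed: HTTP ${response.status}`);
}

const { audio_url } = await response.json();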

Best Practices

Choose the right provider: OpenAI for speed, ElevenLabs for quality, Google for language coverage
Cache audio: Store frequently-used audio locally to reduce API calls
Use appropriate formats: MP3 for web, WAV for editing, Opus for streaming
Monitor costs: Track usage in your Demeterics dashboard
Handle errors gracefully: Implement retry logic with exponential backoff
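
The retry advice above can be sketched as a small wrapper around any callable. This is a minimal illustration, not a prescribed client: `RetryableError` and `with_backoff` are hypothetical names, and the default delays and attempt count are arbitrary starting points. The caller is expected to raise `RetryableError` for transient failures such as `rate_limited`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Raised by the caller for failures that are worth retrying."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RetryableError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return call()
        except RetryableError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            # Double the delay on each retry, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add up to 50% random jitter so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is injectable so the wrapper can be exercised in tests without real waiting.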