Data Extraction API

Demeterics stores all your LLM interactions in BigQuery and provides APIs to extract and analyze this data programmatically. This guide covers how to export interaction data, query usage metrics, and integrate with your analytics pipeline.


Overview

Your interaction data is stored in BigQuery and can be extracted via:

  1. Export API (POST /api/v1/exports) - Bulk export to JSON, CSV, or Avro
  2. Stream API (GET /api/v1/exports/{request_id}/stream) - Stream large datasets
  3. Dashboard UI - Download exports directly from the web interface

All exports are scoped to your user account and respect data retention policies.


Authentication

All export endpoints require authentication via your Demeterics API key:

Authorization: Bearer dmt_your_api_key

Your API key must have the export scope enabled. To check or update scopes, visit Settings → API Keys.
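
If you want to confirm a key before wiring it into a pipeline, a small request is enough: a 401 response means the key itself is invalid, while a 403 means it is valid but missing the export scope (see Troubleshooting below). A minimal sketch using the Python requests library, with an illustrative one-day export:

import requests

API_KEY = "dmt_your_api_key"

# Minimal one-day export used only to verify the key and its export scope:
# 401 = invalid key, 403 = missing export scope.
resp = requests.post(
    "https://api.demeterics.com/api/v1/exports",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"format": "csv", "start_date": "2025-11-01", "end_date": "2025-11-01"},
)
print(resp.status_code)  # a 2xx status means the key and scope are good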


Export API

POST /api/v1/exports

Create a new data export job. The call returns immediately with a request ID that you use to stream the exported data.

Request Body

Field      | Type    | Required | Description
-----------|---------|----------|------------
format     | string  | No       | Output format: json, csv, or avro. Default: csv
start_date | string  | No       | Start date filter (ISO 8601: YYYY-MM-DD)
end_date   | string  | No       | End date filter (ISO 8601: YYYY-MM-DD)
tables     | array   | No       | Tables to export: interactions, eval_runs, eval_results. Default: all
use_gcs    | boolean | No       | Export to GCS bucket instead of streaming. Default: false
gcs_bucket | string  | No       | Target GCS bucket (required if use_gcs is true)

Example: Export last 30 days as JSON

curl -X POST https://api.demeterics.com/api/v1/exports \
  -H "Authorization: Bearer dmt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "format": "json",
    "start_date": "2025-11-01",
    "end_date": "2025-11-30",
    "tables": ["interactions"]
  }'

Response

{
  "status": "ok",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "row_count": 1542,
  "bytes_size": 2048576,
  "message": "Export ready for streaming"
}

GET /api/v1/exports/{request_id}/stream

Stream the exported data. Use the request_id from the export response.

Example: Stream as CSV

curl -X GET "https://api.demeterics.com/api/v1/exports/550e8400-e29b-41d4-a716-446655440000/stream" \
  -H "Authorization: Bearer dmt_your_api_key" \
  -o interactions.csv

Query Parameters

Parameter | Description
----------|------------
format    | Override format: json or csv
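
For large result sets, it helps to stream the download to disk rather than buffering the whole body in memory. A minimal sketch using Python requests with stream=True; the request_id is the value returned by the export call, and the format override shown here is optional:

import requests

API_KEY = "dmt_your_api_key"
request_id = "550e8400-e29b-41d4-a716-446655440000"  # from the export response

# Stream the export to disk in chunks instead of loading it into memory.
with requests.get(
    f"https://api.demeterics.com/api/v1/exports/{request_id}/stream",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={"format": "json"},  # optional override: json or csv
    stream=True,
) as resp:
    resp.raise_for_status()
    with open("interactions.json", "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
            f.write(chunk)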

Interaction Data Schema

Exported interactions include the following fields:

Field             | Type      | Description
------------------|-----------|------------
transaction_id    | string    | Unique interaction identifier (ULID)
request_id        | string    | Client-provided request ID for idempotency
session_id        | string    | Session identifier for grouping conversations
user_id           | int64     | Your Demeterics user ID
model             | string    | LLM model used (e.g., llama-3.3-70b-versatile)
question          | string    | Input prompt/question
question_time     | timestamp | When the question was sent
answer            | string    | LLM response
answer_time       | timestamp | When the answer was received
latency_ms        | int64     | Response time in milliseconds
prompt_tokens     | int64     | Input token count
completion_tokens | int64     | Output token count
cached_tokens     | int64     | Cached token count (if applicable)
total_tokens      | int64     | Total tokens used
estimated_cost    | float64   | Estimated cost in USD
status            | string    | success, error, or timeout
error_message     | string    | Error details (if status is error)
application       | string    | Application name from API key
metadata          | json      | Custom metadata attached to the interaction
tags              | array     | Tags for categorization
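
For orientation, a single exported interaction serialized as JSON looks roughly like this (all values are illustrative, not real data):

{
  "transaction_id": "01JD2Z4R8S3F9K7QW1N5X0ABCD",
  "request_id": "req-2025-11-15-001",
  "session_id": "sess-42",
  "user_id": 1001,
  "model": "llama-3.3-70b-versatile",
  "question": "Summarize this support ticket.",
  "question_time": "2025-11-15T10:02:01Z",
  "answer": "The customer reports a billing discrepancy on their last invoice.",
  "answer_time": "2025-11-15T10:02:02Z",
  "latency_ms": 850,
  "prompt_tokens": 420,
  "completion_tokens": 96,
  "cached_tokens": 0,
  "total_tokens": 516,
  "estimated_cost": 0.0003,
  "status": "success",
  "error_message": null,
  "application": "support-bot",
  "metadata": {"ticket_id": "T-1234"},
  "tags": ["support"]
}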

Export Examples

Python: Export and Analyze

import requests
import pandas as pd
from io import StringIO

API_KEY = "dmt_your_api_key"
BASE_URL = "https://api.demeterics.com"

# Create export job
response = requests.post(
    f"{BASE_URL}/api/v1/exports",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "format": "csv",
        "start_date": "2025-11-01",
        "end_date": "2025-11-30",
        "tables": ["interactions"]
    }
)
export = response.json()
request_id = export["request_id"]

# Stream the data
stream_response = requests.get(
    f"{BASE_URL}/api/v1/exports/{request_id}/stream",
    headers={"Authorization": f"Bearer {API_KEY}"}
)

# Load into pandas
df = pd.read_csv(StringIO(stream_response.text))

# Analyze
print(f"Total interactions: {len(df)}")
print(f"Total cost: ${df['estimated_cost'].sum():.2f}")
print(f"Avg latency: {df['latency_ms'].mean():.0f}ms")
print(f"\nTop models:")
print(df['model'].value_counts().head())

Node.js: Stream to File

const fs = require('fs');
const https = require('https');
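// Note: the global fetch API used below requires Node 18 or newer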

const API_KEY = 'dmt_your_api_key';

// Create export
fetch('https://api.demeterics.com/api/v1/exports', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    format: 'json',
    start_date: '2025-11-01',
    end_date: '2025-11-30'
  })
})
.then(res => res.json())
.then(data => {
  // Stream to file
  const file = fs.createWriteStream('interactions.json');
  https.get(
    `https://api.demeterics.com/api/v1/exports/${data.request_id}/stream`,
    { headers: { 'Authorization': `Bearer ${API_KEY}` } },
    response => response.pipe(file)
  );
});

Shell: Daily Export Script

#!/bin/bash
# daily_export.sh - Export yesterday's interactions

API_KEY="dmt_your_api_key"
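# Note: 'date -d' below is GNU date syntax; on macOS/BSD use: date -v-1d +%Y-%m-%d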
YESTERDAY=$(date -d "yesterday" +%Y-%m-%d)
TODAY=$(date +%Y-%m-%d)
OUTPUT_FILE="interactions_${YESTERDAY}.csv"

# Create export
REQUEST_ID=$(curl -s -X POST https://api.demeterics.com/api/v1/exports \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"format\": \"csv\",
    \"start_date\": \"$YESTERDAY\",
    \"end_date\": \"$TODAY\",
    \"tables\": [\"interactions\"]
  }" | jq -r '.request_id')

# Download
curl -s "https://api.demeterics.com/api/v1/exports/$REQUEST_ID/stream" \
  -H "Authorization: Bearer $API_KEY" \
  -o "$OUTPUT_FILE"

echo "Exported to $OUTPUT_FILE"
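
To run this nightly, schedule the script with cron; for example, a crontab entry of 0 2 * * * /path/to/daily_export.sh runs the export every day at 02:00.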

GCS Export (Enterprise)

For large datasets, export directly to a Google Cloud Storage bucket:

curl -X POST https://api.demeterics.com/api/v1/exports \
  -H "Authorization: Bearer dmt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "format": "avro",
    "use_gcs": true,
    "gcs_bucket": "gs://your-bucket/exports/",
    "start_date": "2025-01-01",
    "end_date": "2025-11-30"
  }'

Response

{
  "status": "ok",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "url": "gs://your-bucket/exports/interactions_2025-11-30.avro",
  "expires_at": "2025-12-07T00:00:00Z",
  "row_count": 150000,
  "message": "Export complete"
}

Note: Contact support to enable GCS export for your account.
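
Once the file has landed in your bucket, you can fetch it with your usual GCS tooling. A minimal sketch using the google-cloud-storage Python client, assuming the library is installed and your credentials have read access to the bucket:

from google.cloud import storage

# Download the exported Avro file referenced by the "url" field in the response.
client = storage.Client()
bucket = client.bucket("your-bucket")
blob = bucket.blob("exports/interactions_2025-11-30.avro")
blob.download_to_filename("interactions_2025-11-30.avro")
print(f"Downloaded {blob.name}")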


Rate Limits

Endpoint                        | Limit
--------------------------------|---------------------
POST /api/v1/exports            | 10 requests/minute
GET /api/v1/exports/{id}/stream | 100 requests/minute

Export jobs are cached for 10 minutes. Repeated requests with the same parameters will return the cached result.
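
If you schedule exports aggressively, handle the rate limit gracefully. A minimal retry sketch with exponential backoff, assuming the API signals the limit with HTTP 429 (the exact status code is an assumption, not documented above):

import time
import requests

API_KEY = "dmt_your_api_key"

def create_export(payload, max_retries=5):
    """Create an export job, backing off when the rate limit is hit."""
    for attempt in range(max_retries):
        resp = requests.post(
            "https://api.demeterics.com/api/v1/exports",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json=payload,
        )
        if resp.status_code == 429:   # assumed rate-limit status code
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("Export still rate-limited after retries")

export = create_export({"format": "csv", "start_date": "2025-11-01", "end_date": "2025-11-30"})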


Best Practices

  1. Use date filters - Always specify start_date and end_date to limit data volume
  2. Export incrementally - Run daily/weekly exports instead of full history dumps
  3. Use CSV for analysis - Easier to work with in spreadsheets and pandas
  4. Use Avro for pipelines - More efficient for BigQuery, Spark, or data warehouses (see the loading sketch after this list)
  5. Store exports - Export jobs expire after 10 minutes; save the data locally
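
Following up on the Avro recommendation in item 4, an Avro file exported to GCS can be loaded directly into your own BigQuery dataset. A minimal sketch using the google-cloud-bigquery client; the project, dataset, and table names are placeholders:

from google.cloud import bigquery

# Load an Avro export from GCS into your own BigQuery table.
client = bigquery.Client()
table_id = "your_project.analytics.demeterics_interactions"  # placeholder table ID
job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.AVRO)

load_job = client.load_table_from_uri(
    "gs://your-bucket/exports/interactions_2025-11-30.avro",
    table_id,
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish
print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")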

Troubleshooting

401 Unauthorized

  • Check that your API key is valid
  • Ensure the key has export scope enabled

403 Forbidden

  • Your API key lacks the export scope
  • Update key permissions in Settings → API Keys

404 Not Found

  • Export request expired (10 minute TTL)
  • Re-create the export job

500 Internal Server Error

  • Date range may be too large
  • Try a smaller date range or specific tables