API Documentation

All endpoints require an API key, sent as a bearer token in the Authorization header (Authorization: Bearer YOUR_API_KEY). Base URL: https://tts.convolity.com

POST /v1/tts/generate

Text to Speech

Generate speech from text. The input language is detected automatically.

curl -X POST https://tts.convolity.com/v1/tts/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from ConvoTTS!"}' \
  --output speech.wav

Response: audio/wav (48kHz)
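
curl saves whatever the server returns, so an error body would also end up in speech.wav. The sketch below is one way to catch that. It checks only the RIFF/WAVE magic bytes at the start of the file, not the sample rate. is_wav is a hypothetical helper, not part of the API, and the format of error responses is not specified here.

```shell
# is_wav FILE: succeed if FILE starts with a RIFF/WAVE header.
# Hypothetical helper for illustration; not part of the API.
is_wav() {
  # Bytes 0-3 of a WAV file are "RIFF", bytes 8-11 are "WAVE".
  riff=$(dd if="$1" bs=1 count=4 2>/dev/null)
  wave=$(dd if="$1" bs=1 skip=8 count=4 2>/dev/null)
  [ "$riff" = "RIFF" ] && [ "$wave" = "WAVE" ]
}
```

Adding curl's --fail flag to the request makes curl exit non-zero on HTTP errors; running is_wav speech.wav afterwards serves as a second check.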

POST /v1/voice/design

Voice Design

Create a voice from a natural language description. No reference audio needed.

curl -X POST https://tts.convolity.com/v1/voice/design \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"description": "Young woman, warm", "text": "Hello!"}' \
  --output designed.wav

Response: audio/wav (48kHz)

POST /v1/voice/clone/controllable

Voice Cloning

Clone a voice from reference audio with optional style control.

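# Encode reference audio to base64 on a single line.
# GNU base64 wraps output at 76 columns, and raw newlines are not valid inside a JSON string.
REF=$(base64 < reference.wav | tr -d '\n')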

curl -X POST https://tts.convolity.com/v1/voice/clone/controllable \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"text\": \"Hello\", \"reference_audio\": \"$REF\"}" \
  --output cloned.wav

Response: audio/wav (48kHz)
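
A long reference clip produces a base64 string that may exceed the operating system's limit on command-line length when passed inline with -d. A minimal sketch that writes the request body to a file instead. build_clone_payload is a hypothetical helper, and it assumes the text contains no characters that need JSON escaping (quotes, backslashes, control characters).

```shell
# build_clone_payload TEXT REF_FILE OUT_FILE
# Writes {"text": ..., "reference_audio": ...} to OUT_FILE.
# Hypothetical helper; assumes TEXT needs no JSON escaping.
build_clone_payload() {
  {
    printf '{"text": "%s", "reference_audio": "' "$1"
    # Stream the encoded audio straight into the file, stripping line wraps.
    base64 < "$2" | tr -d '\n'
    printf '"}'
  } > "$3"
}
```

After running build_clone_payload "Hello" reference.wav payload.json, the file can be sent with curl's @ syntax by replacing the -d argument in the request above with --data-binary @payload.json.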

POST /v1/tts/stream

Streaming TTS

Stream audio chunks as they are generated.

curl -X POST https://tts.convolity.com/v1/tts/stream \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Streaming audio."}' \
  --output stream.wav

Response: audio/wav (chunked)
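
Saving to a file hides the benefit of streaming; piping the response to a player lets playback begin while later chunks are still being generated. A minimal sketch under assumptions: stream_tts is a hypothetical wrapper, API_KEY is an assumed environment variable, the text is assumed to need no JSON escaping, and the stream is assumed to begin with a WAV header that a player can parse.

```shell
# stream_tts TEXT: write the streamed audio to stdout.
# Hypothetical wrapper; expects API_KEY in the environment (assumed name)
# and TEXT without quotes, backslashes, or control characters.
stream_tts() {
  # --no-buffer passes each chunk on as soon as it arrives;
  # --fail makes curl exit non-zero on HTTP errors.
  curl --fail --silent --no-buffer -X POST https://tts.convolity.com/v1/tts/stream \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"text\": \"$1\"}"
}
```

With ffplay (part of FFmpeg) installed, stream_tts "Streaming audio." | ffplay -nodisp -autoexit - would play the audio as it arrives.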

Rate Limits

• Playground (no API key): 10 minutes of audio per day per IP, 60 requests per minute

• API (with key): 60 requests per minute; custom quotas available

• Audio output: 48kHz WAV

• Max text length: 5,000 characters per request
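
Text longer than the 5,000-character limit has to be sent as several requests. A minimal sketch using the standard fold utility; split_text is a hypothetical helper. It emits one piece per line, breaking at spaces where it can. Caveats: some fold implementations count bytes rather than characters, so this is safest for ASCII text, and how the resulting audio files are joined is left open.

```shell
# split_text WIDTH: read text on stdin and write pieces of at most
# WIDTH columns to stdout, one per line, breaking at spaces where possible.
# Hypothetical helper; existing line breaks in the input also end a piece.
split_text() {
  fold -s -w "$1"
}
```

For example, split_text 5000 < long.txt yields pieces that can each be posted to /v1/tts/generate. Pausing one second between requests (sleep 1) keeps a single client within 60 requests per minute.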