POST /api/v2/generate
Text-to-Speech
curl --request POST \
  --url https://api.example.com/api/v2/generate \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "text": "<string>",
  "voice_id": "<string>",
  "speed": 1.0,
  "format": "<string>",
  "audio_url": "<string>",
  "language": "<string>",
  "diarize": true
}
'

Request

Authorization
string
required
Bearer token header, in the form Authorization: Bearer nb_YOUR_API_KEY
model
string
required
TTS model slug:
  • minimax-tts — Free, Chinese/English, very natural
  • openai-tts — OpenAI standard voices
  • openai-tts-hd — OpenAI HD quality voices
  • gpt-4o-mini-tts — GPT-4o Mini TTS
  • elevenlabs-flash — ElevenLabs fast (low latency)
  • elevenlabs-v2 — ElevenLabs Multilingual v2 (highest quality)
text
string
required
The text to synthesize. The maximum length depends on the model (typically 5,000 characters).
voice_id
string
Voice identifier. Available voices depend on the model. See Available Voices below.
speed
number
default:"1.0"
Speaking speed multiplier. Range: 0.5 to 2.0.
format
string
default:"mp3"
Output audio format: mp3, wav, or ogg.
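
The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name build_tts_payload is hypothetical, the speed range is read as 0.5 to 2.0, and the 5,000-character cap is only the "typical" limit mentioned above, so a given model may accept more or less.

```python
# Allowed values copied from the parameter list above.
TTS_MODELS = {
    "minimax-tts", "openai-tts", "openai-tts-hd",
    "gpt-4o-mini-tts", "elevenlabs-flash", "elevenlabs-v2",
}
TTS_FORMATS = {"mp3", "wav", "ogg"}

# Assumption: the docs call 5,000 characters "typical", not a hard limit.
TYPICAL_MAX_CHARS = 5000


def build_tts_payload(model, text, voice_id=None, speed=1.0, audio_format="mp3"):
    """Validate TTS parameters locally and return the JSON-ready request body."""
    if model not in TTS_MODELS:
        raise ValueError(f"unknown TTS model: {model!r}")
    if not text:
        raise ValueError("text is required")
    if len(text) > TYPICAL_MAX_CHARS:
        raise ValueError(f"text exceeds {TYPICAL_MAX_CHARS} characters")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if audio_format not in TTS_FORMATS:
        raise ValueError(f"unsupported format: {audio_format!r}")

    payload = {"model": model, "text": text, "speed": speed, "format": audio_format}
    # voice_id is optional, so it is only included when the caller supplies one.
    if voice_id is not None:
        payload["voice_id"] = voice_id
    return payload
```

Calling build_tts_payload("elevenlabs-v2", "Hello", voice_id="rachel") returns a dictionary that can be serialized with json.dumps and sent as the request body. The server remains the authority on what is valid, so a check like this only catches obvious mistakes early.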

Available Voices

OpenAI TTS

alloy, echo, fable, onyx, nova, shimmer

ElevenLabs

ElevenLabs supports hundreds of voices. Use a common voice name such as rachel, adam, bella, or josh, or pass any ElevenLabs voice ID directly.

MiniMax TTS

male-qn-qingse, male-qn-jingying, female-shaonv, female-yujie (and more)

Request Example

curl -X POST https://neuralbox.top/api/v2/generate \
  -H "Authorization: Bearer nb_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs-v2",
    "text": "Welcome to NeuralBox. Your AI-powered creative studio.",
    "voice_id": "rachel",
    "speed": 1.0,
    "format": "mp3"
  }'

Response

{
  "id": "gen_01j9x2tts001",
  "status": "completed",
  "model": "elevenlabs-v2",
  "url": "https://storage.neuralbox.top/generations/gen_01j9x2tts001.mp3",
  "duration_seconds": 3.4,
  "characters": 54,
  "tokens_used": 5,
  "balance_remaining": 284,
  "created_at": "2026-03-08T12:10:00Z"
}
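
The same request and response can be handled from Python with only the standard library. This is a hedged sketch: the endpoint URL and header format are taken from the curl example above, while the function names make_request and audio_url_from_response are hypothetical. Only the "completed" status appears in this document, so any other status is treated as an error here; that handling is an assumption.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://neuralbox.top/api/v2/generate"


def make_request(api_key, payload):
    """Build (but do not send) the POST request for a generation call."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def audio_url_from_response(body):
    """Return the audio file URL from a TTS response body (a JSON string)."""
    result = json.loads(body)
    # Assumption: anything other than "completed" means no audio is available yet.
    if result.get("status") != "completed":
        raise RuntimeError(f"generation not completed: {result.get('status')!r}")
    return result["url"]
```

To send the request, pass the object to urllib.request.urlopen and feed the decoded response body to audio_url_from_response. Building the request separately from sending it keeps the payload and headers easy to inspect or log.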

Speech-to-Text

Transcribe audio files to text using Whisper, GPT-4o Transcribe, or ElevenLabs Scribe.

Request

model
string
required
STT model:
  • whisper — Fast, multilingual, free
  • gpt-4o-transcribe — Highest accuracy
  • elevenlabs-scribe — Best for podcasts and meetings (diarization support)
audio_url
string
required
URL of the audio file (MP3, WAV, OGG, M4A, or FLAC). Maximum file size: 25 MB.
language
string
ISO 639-1 language code (e.g. en, ru, de). If omitted, the model auto-detects.
diarize
boolean
default:"false"
Enables speaker diarization (labels who said what). Only available with elevenlabs-scribe.
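
The constraint that diarize only works with elevenlabs-scribe is easy to miss, so it can be worth enforcing on the client. The sketch below is illustrative only: build_stt_payload is a hypothetical helper, and the language check is a simple shape test (two letters), not a lookup against the real ISO 639-1 code list.

```python
# Model slugs copied from the parameter list above.
STT_MODELS = {"whisper", "gpt-4o-transcribe", "elevenlabs-scribe"}


def build_stt_payload(model, audio_url, language=None, diarize=False):
    """Validate STT parameters locally and return the JSON-ready request body."""
    if model not in STT_MODELS:
        raise ValueError(f"unknown STT model: {model!r}")
    if not audio_url:
        raise ValueError("audio_url is required")
    # Diarization is documented as available only with elevenlabs-scribe.
    if diarize and model != "elevenlabs-scribe":
        raise ValueError("diarize requires the elevenlabs-scribe model")
    # Shape check only: ISO 639-1 codes are two letters (e.g. en, ru, de).
    if language is not None and not (len(language) == 2 and language.isalpha()):
        raise ValueError(f"expected a two-letter language code, got {language!r}")

    payload = {"model": model, "audio_url": audio_url}
    # Optional fields are omitted so the server applies its defaults
    # (language auto-detection, diarize=false).
    if language is not None:
        payload["language"] = language
    if diarize:
        payload["diarize"] = True
    return payload
```

For example, build_stt_payload("whisper", "https://example.com/interview.mp3", language="en") produces the body used in the request example that follows, while passing diarize=True with whisper raises ValueError before any network call is made.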

Request Example

curl -X POST https://neuralbox.top/api/v2/generate \
  -H "Authorization: Bearer nb_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "whisper",
    "audio_url": "https://example.com/interview.mp3",
    "language": "en"
  }'

Response

{
  "id": "gen_01j9x2stt001",
  "status": "completed",
  "model": "whisper",
  "text": "Hello and welcome to today's episode...",
  "language": "en",
  "duration_seconds": 142.5,
  "tokens_used": 0,
  "created_at": "2026-03-08T12:15:00Z"
}