STT uses a dedicated multipart endpoint, `POST /api/v2/stt`, not the standard `/generate` endpoint.
## Request

```bash
curl -X POST https://neuralbox.top/api/v2/stt \
  -H "Authorization: Bearer nb_YOUR_API_KEY" \
  -F "audio=@recording.mp3" \
  -F "model=whisper" \
  -F "language=en"
```
| Parameter | Description |
|---|---|
| `Authorization` (header) | `Bearer nb_YOUR_API_KEY` |
| `audio` | Audio file. Supported formats: mp3, mp4, wav, m4a, ogg, flac, webm. Max size: 25 MB. |
| `model` | STT model slug. See table below. |
| `language` | Language code (e.g. `en`, `ru`, `es`). Optional; auto-detected if omitted. |
| `diarize` | Speaker diarization. Only supported by `elevenlabs-scribe`. |
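Since oversized or wrongly formatted uploads will fail, it can save a round trip to check the file client-side against the documented limits first. A minimal sketch in Python; the helper name and error strings are illustrative, not part of the API (and 25 MB is assumed binary here):

```python
import os

# Limits documented for the `audio` parameter
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "m4a", "ogg", "flac", "webm"}
MAX_SIZE_BYTES = 25 * 1024 * 1024  # 25 MB, assuming binary megabytes

def validate_audio(path: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes (max {MAX_SIZE_BYTES})")
    return problems
```

Call it with the filename and `os.path.getsize(path)` before opening the upload.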
## Models

| Slug | Provider | Tier | Cost | Notes |
|---|---|---|---|---|
| `whisper` | OpenAI | Starter | 2 tkn | Fast, 99 languages |
| `gpt-4o-transcribe` | OpenAI | Basic+ | 2 tkn | Highest accuracy |
| `elevenlabs-scribe` | ElevenLabs | Basic+ | 2 tkn | Best for meetings, supports diarization |
## Response

```json
{
  "id": 18510,
  "status": "completed",
  "model_slug": "whisper",
  "result_text": "Hello and welcome to today's episode...",
  "tokens_spent": 2,
  "processing_ms": 3420
}
```
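The fields most callers care about are `status` and `result_text`. A small defensive-extraction sketch; the field names come from the sample response above, but the idea that other `status` values exist for in-progress or failed jobs is an assumption:

```python
def extract_transcript(payload: dict) -> str:
    """Return the transcript, raising if the job is not in the 'completed' state."""
    status = payload.get("status")
    if status != "completed":
        # Assumption: non-"completed" statuses exist for pending/failed jobs
        raise RuntimeError(f"STT job {payload.get('id')} not usable, status={status!r}")
    return payload["result_text"]

sample = {
    "id": 18510,
    "status": "completed",
    "model_slug": "whisper",
    "result_text": "Hello and welcome to today's episode...",
    "tokens_spent": 2,
    "processing_ms": 3420,
}
print(extract_transcript(sample))  # → Hello and welcome to today's episode...
```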
## Diarization (who said what)

Available only with `elevenlabs-scribe`:

```bash
curl -X POST https://neuralbox.top/api/v2/stt \
  -H "Authorization: Bearer nb_YOUR_API_KEY" \
  -F "audio=@meeting.mp3" \
  -F "model=elevenlabs-scribe" \
  -F "diarize=true"
```
The response includes speaker labels in `result_text`:

```
[Speaker 1]: Hello, let's start the meeting.
[Speaker 2]: Sure, I have three points to discuss.
```
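Because the speaker labels arrive inline in `result_text` rather than as structured data, downstream code usually wants to split the transcript into turns. A minimal parser, assuming the `[Speaker N]:` prefix format shown above is stable:

```python
import re

# Matches lines of the form "[Speaker 1]: some utterance"
SPEAKER_LINE = re.compile(r"^\[(Speaker \d+)\]:\s*(.*)$")

def parse_diarized(text: str) -> list[tuple[str, str]]:
    """Split a diarized transcript into (speaker, utterance) turns."""
    turns = []
    for line in text.splitlines():
        m = SPEAKER_LINE.match(line.strip())
        if m:
            turns.append((m.group(1), m.group(2)))
    return turns

transcript = (
    "[Speaker 1]: Hello, let's start the meeting.\n"
    "[Speaker 2]: Sure, I have three points to discuss."
)
print(parse_diarized(transcript))
# → [('Speaker 1', "Hello, let's start the meeting."),
#    ('Speaker 2', 'Sure, I have three points to discuss.')]
```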
## Code Examples

```python
import requests

# Upload the audio as multipart form data; the file goes in files=,
# the remaining parameters go in data=.
with open("audio.mp3", "rb") as f:
    response = requests.post(
        "https://neuralbox.top/api/v2/stt",
        headers={"Authorization": "Bearer nb_YOUR_API_KEY"},
        files={"audio": f},
        data={"model": "whisper", "language": "en"},
    )

response.raise_for_status()
print(response.json()["result_text"])