NeuralBox supports three audio tasks: Text-to-Speech (TTS), Speech-to-Text (STT), and Music Generation. All three use the same /api/v2/generate endpoint.

Text-to-Speech

Convert text to natural-sounding speech:
import requests

response = requests.post(
    "https://neuralbox.top/api/v2/generate",
    headers={"Authorization": "Bearer nb_YOUR_API_KEY"},
    json={
        "model": "elevenlabs-v2",
        "text": "Welcome to NeuralBox. Your AI platform for every task.",
        "voice_id": "21m00Tcm4TlvDq8ikWAM"
    }
)

# The response body contains a URL pointing to the generated audio
audio_url = response.json()["output_url"]
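The response carries only a URL, so a common next step is saving the audio locally. The following is a minimal sketch of that flow, not an official client. It assumes that output_url points to a directly downloadable file and that failed requests return a non-2xx HTTP status; neither is stated above. The helper names synthesize and download_audio and the output filename are hypothetical. Both helpers take a requests.Session (or any object with compatible post and get methods) so they can be exercised without network access.

```python
API_URL = "https://neuralbox.top/api/v2/generate"


def synthesize(session, api_key, text, voice_id, model="elevenlabs-v2"):
    """Request speech for `text` and return the URL of the generated audio.

    `session` is a requests.Session, or any object with a compatible post().
    """
    response = session.post(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": model, "text": text, "voice_id": voice_id},
        timeout=60,
    )
    response.raise_for_status()  # assumption: failures use HTTP error statuses
    return response.json()["output_url"]


def download_audio(session, url, path):
    """Stream the file at `url` to `path` in chunks and return `path`."""
    response = session.get(url, stream=True, timeout=60)
    response.raise_for_status()
    with open(path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    return path


# Usage (performs real network calls, so it is left commented out):
# import requests
# with requests.Session() as s:
#     url = synthesize(s, "nb_YOUR_API_KEY", "Welcome to NeuralBox.",
#                      "21m00Tcm4TlvDq8ikWAM")
#     download_audio(s, url, "welcome.mp3")
```

Streaming in chunks keeps memory use flat for long recordings, and the explicit timeout prevents a stalled connection from hanging the caller.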

TTS Model Comparison

| Slug | Name | Tier | Cost | Best for |
| --- | --- | --- | --- | --- |
| minimax-tts | MiniMax TTS | Basic+ | 1 tkn | Chinese/English, high volume |
| openai-tts | OpenAI TTS | Basic+ | 3 tkn | Standard English voices |
| openai-tts-hd | OpenAI TTS HD | Basic+ | 6 tkn | Podcasts, narration |
| gpt-4o-mini-tts | GPT-4o Mini TTS | Basic+ | 3 tkn | Natural conversation |
| elevenlabs-flash | EL Flash | Basic+ | 18 tkn | Real-time, low latency |
| elevenlabs-v2 | EL ML v2 | Basic+ | 35 tkn | Multilingual, highest quality |

For real-time applications, use elevenlabs-flash. For pre-rendered content such as podcasts or audiobooks, use elevenlabs-v2 or openai-tts-hd.
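The guidance above can be written as a small selection helper. This is an illustrative sketch: the function names pick_tts_model and estimate_cost and the budget_tokens parameter are hypothetical, and the cost estimate assumes the listed token price is charged once per request, which the table does not state.

```python
# Token cost per model, copied from the TTS comparison table.
TTS_COST_TOKENS = {
    "minimax-tts": 1,
    "openai-tts": 3,
    "openai-tts-hd": 6,
    "gpt-4o-mini-tts": 3,
    "elevenlabs-flash": 18,
    "elevenlabs-v2": 35,
}


def pick_tts_model(realtime, budget_tokens=None):
    """Pick a model slug following the recommendation above.

    realtime=True  -> elevenlabs-flash
    realtime=False -> elevenlabs-v2, falling back to openai-tts-hd
                      when elevenlabs-v2 exceeds budget_tokens
    """
    if realtime:
        choice = "elevenlabs-flash"
    else:
        choice = "elevenlabs-v2"
        if budget_tokens is not None and TTS_COST_TOKENS[choice] > budget_tokens:
            choice = "openai-tts-hd"
    if budget_tokens is not None and TTS_COST_TOKENS[choice] > budget_tokens:
        raise ValueError(f"{choice} costs more than {budget_tokens} tokens")
    return choice


def estimate_cost(model, request_count):
    """Estimate total tokens, assuming the listed cost applies per request."""
    return TTS_COST_TOKENS[model] * request_count
```

For example, pick_tts_model(realtime=False, budget_tokens=10) returns "openai-tts-hd", and estimate_cost("elevenlabs-v2", 10) returns 350 under the per-request assumption.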

Speech-to-Text

Transcribe audio files:
import requests

response = requests.post(
    "https://neuralbox.top/api/v2/generate",
    headers={"Authorization": "Bearer nb_YOUR_API_KEY"},
    json={
        "model": "whisper",
        "audio_url": "https://example.com/audio.mp3",
        "language": "en"
    }
)

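print(response.json()["content"])

Three STT models are available:

| Slug | Name | Tier | Cost |
| --- | --- | --- | --- |
| whisper | Whisper STT | Starter | 2 tkn |
| gpt-4o-transcribe | GPT-4o Transcribe | Basic+ | 2 tkn |
| elevenlabs-scribe | EL Scribe | Basic+ | 2 tkn |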
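
When switching between the STT models, it can help to build the request body in one place and to fail clearly if the response lacks a transcript. The sketch below shows one way to do this. The helper names build_stt_payload and extract_transcript are hypothetical. It also assumes that all three models accept the same fields as the whisper example, that audio_url must be an http(s) URL, and that language may be omitted; none of these is confirmed above.

```python
# Slugs copied from the STT model list.
STT_MODELS = {"whisper", "gpt-4o-transcribe", "elevenlabs-scribe"}


def build_stt_payload(audio_url, model="whisper", language=None):
    """Build the JSON body for a transcription request."""
    if model not in STT_MODELS:
        raise ValueError(f"unknown STT model: {model!r}")
    # Assumption: the API fetches the audio itself, so a web URL is required.
    if not audio_url.startswith(("http://", "https://")):
        raise ValueError("audio_url must be an http(s) URL")
    payload = {"model": model, "audio_url": audio_url}
    if language is not None:
        # Assumption: "language" is optional; the example always sets it.
        payload["language"] = language
    return payload


def extract_transcript(data):
    """Return the transcript from a decoded JSON response body."""
    try:
        return data["content"]
    except KeyError:
        raise ValueError(
            f"response has no 'content' field, got keys: {sorted(data)}"
        ) from None
```

The payload can be passed as the json argument of requests.post, and extract_transcript can be applied to response.json().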

Music Generation

Two music models for different needs:
| Slug | Provider | Tier | Cost | Best for |
| --- | --- | --- | --- | --- |
| musicgen | Replicate | Starter | 9 tkn | Quick drafts, no subscription needed |
| elevenlabs-music | ElevenLabs | Basic+ | 58 tkn | Professional quality, longer tracks |

MusicGen (Replicate)

import requests

response = requests.post(
    "https://neuralbox.top/api/v2/generate",
    headers={"Authorization": "Bearer nb_YOUR_API_KEY"},
    json={
        "model": "musicgen",
        "prompt": "Upbeat jazz with piano and double bass, 120 BPM, swing feel",
        "duration": 30
    }
)

ElevenLabs Music

import requests

response = requests.post(
    "https://neuralbox.top/api/v2/generate",
    headers={"Authorization": "Bearer nb_YOUR_API_KEY"},
    json={
        "model": "elevenlabs-music",
        "prompt": "Epic cinematic orchestral score, rising tension, full strings and brass",
        "duration": 60
    }
)

Music Prompt Tips

| Element | Examples |
| --- | --- |
| Genre | lo-fi hip hop, cinematic orchestral, electronic house, acoustic folk |
| Instruments | piano, electric guitar, synthesizer, violin, drums |
| Tempo | 80 BPM, fast-paced, slow and mellow |
| Mood | energetic, melancholic, uplifting, tense, relaxing |
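
The four elements can be combined programmatically into a single prompt string. The sketch below is one possible approach, with hypothetical helper names (build_music_prompt, build_music_payload). The word order it produces mirrors the example prompts earlier in this section, but it is only a convention and not a requirement of the API. It also assumes duration is given in seconds, which is not stated above.

```python
MUSIC_MODELS = ("musicgen", "elevenlabs-music")


def _join_items(items):
    """Join names as 'a', 'a and b', or 'a, b and c'."""
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def build_music_prompt(genre, instruments=(), tempo=None, mood=None):
    """Combine genre, instruments, tempo, and mood into one prompt string."""
    if not genre:
        raise ValueError("genre is required")
    instruments = list(instruments)
    head = f"{mood} {genre}" if mood else genre
    if instruments:
        head += " with " + _join_items(instruments)
    parts = [head]
    if tempo:
        parts.append(tempo)
    prompt = ", ".join(parts)
    # Capitalize only the first character so text such as "BPM" is preserved.
    return prompt[0].upper() + prompt[1:]


def build_music_payload(model, prompt, duration):
    """Build the JSON body for a music request."""
    if model not in MUSIC_MODELS:
        raise ValueError(f"unknown music model: {model!r}")
    if duration <= 0:
        raise ValueError("duration must be positive")
    # Assumption: duration is measured in seconds.
    return {"model": model, "prompt": prompt, "duration": duration}
```

For example, build_music_prompt("lo-fi hip hop", ["piano", "drums"], tempo="80 BPM", mood="relaxing") returns "Relaxing lo-fi hip hop with piano and drums, 80 BPM".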