ElevenLabs · Audio model

ElevenLabs (Multilingual / Turbo / Music).
Multilingual TTS, real-time voice, and licensed AI music.

One API.

One API key. Every model.

per 1k chars
Example
AI audio
Commercial use included · Verified May 26, 2026 · Outputs are yours · No training on your data
Endpoints

Start building with ElevenLabs (Multilingual / Turbo / Music).

One model. Four ways to call it. Same key, same bill.

  • ElevenLabs Multilingual v2 / v3 (scale-tier projected) · Audio generation · $0.31 / 1k chars
  • ElevenLabs Turbo v2.5 · Audio generation · $0.09 / 1k chars
  • ElevenLabs Music (v2v) · Audio generation · $0.14 / track
  • ElevenLabs Multilingual v2 (per-character, retail tier) · Audio generation · $0.38 / 1k chars

One call. Same key. Same bill.

Capabilities

What it does best.

ElevenLabs capabilities

Languages

  • Multilingual v2 supports 29 languages including English, Japanese, Chinese, German, Hindi, French, Korean, Portuguese, Italian, and Spanish, with a 10,000 character limit per request.[^1]
  • Eleven v3 (flagship) supports 70+ languages with a 5,000 character limit and a Text to Dialogue API for natural multi-speaker conversation.[^2]
  • Flash v2.5 (the current Turbo v2.5 equivalent) supports 32 languages with a 40,000 character limit per request.[^3]

Latency

  • Flash v2.5 delivers ~75ms first-audio latency (excluding app and network overhead), making it the right choice for real-time agents and chatbots.[^3]
  • Multilingual v2 and Eleven v3 run at standard latency, suited for narration and async generation.[^2]

Character limits per request

  • Multilingual v2: 10,000 characters (~10 minutes of audio).[^1]
  • Eleven v3: 5,000 characters (~5 minutes).[^2]
  • Flash v2.5: 40,000 characters (~40 minutes).[^3]
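
Text longer than a model's limit has to be split before it is sent. The following is a minimal sketch of one way to do that, using the per-request limits listed above. The dictionary keys and the function name `split_for_model` are hypothetical labels chosen for this example, not Coinis or ElevenLabs identifiers.

```python
import re

# Per-request character limits quoted in the list above.
# The keys are illustrative labels, not official variant IDs.
CHAR_LIMITS = {
    "multilingual_v2": 10_000,
    "eleven_v3": 5_000,
    "flash_v2_5": 40_000,
}


def split_for_model(text: str, model: str) -> list[str]:
    """Split text into chunks that each fit the model's per-request limit.

    Chunks break at sentence boundaries where possible. A single sentence
    longer than the limit is hard-cut so no chunk ever exceeds it.
    """
    limit = CHAR_LIMITS[model]
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut any sentence that cannot fit in one request on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Breaking at sentence boundaries is a design choice for this sketch: it keeps each request self-contained so that pauses fall between chunks instead of mid-phrase.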

Audio output formats

  • MP3 at 22,050 Hz / 32 kbps up to 44,100 Hz / 192 kbps (192 kbps requires Creator tier or above on ElevenLabs direct).[^4]
  • PCM and WAV at up to 44.1 kHz (Pro tier or above on ElevenLabs direct).[^4]
  • μ-law output supported for Twilio telephony integrations.[^4]
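
To get a feel for what the bitrates above mean for storage and bandwidth, file size can be estimated from duration alone. This is a rough sketch: it assumes constant-bitrate MP3 and, for PCM, 16-bit mono samples. Both are assumptions for illustration, not details stated on this page.

```python
def mp3_size_bytes(duration_seconds: float, bitrate_kbps: int) -> int:
    """Approximate size of a constant-bitrate MP3 (ignores headers and tags)."""
    return int(duration_seconds * bitrate_kbps * 1000 / 8)


def pcm_size_bytes(
    duration_seconds: float,
    sample_rate_hz: int,
    bits_per_sample: int = 16,  # assumed sample width
    channels: int = 1,          # assumed mono
) -> int:
    """Size of raw, uncompressed PCM audio."""
    return int(duration_seconds * sample_rate_hz * channels * bits_per_sample / 8)


one_minute = 60
print(mp3_size_bytes(one_minute, 32))      # 240000 bytes at 32 kbps
print(mp3_size_bytes(one_minute, 192))     # 1440000 bytes at 192 kbps
print(pcm_size_bytes(one_minute, 44_100))  # 5292000 bytes of 16-bit mono PCM
```

Under these assumptions, one minute of 44.1 kHz PCM is roughly 3.7 times larger than the same minute at 192 kbps MP3.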

Voice control

  • Per-request overrides for stability, similarity_boost, style, and use_speaker_boost.[^5]
  • Pronunciation dictionaries (up to 3 per request) for custom lexicons.[^5]
  • Seed parameter for deterministic, repeatable output.[^5]
  • Previous and next text context for audio continuity across multi-part generations.[^5]
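
The controls above can be combined into a single request body. The sketch below is illustrative only: `stability`, `similarity_boost`, `style`, and `use_speaker_boost` are the names given in the list, while the `voice_settings`, `seed`, `previous_text`, `next_text`, and `pronunciation_dictionaries` keys, the default values, and the 0 to 1 range check are assumptions. Check the parameter schema before relying on any of them.

```python
from typing import Any, Optional

MAX_PRONUNCIATION_DICTIONARIES = 3  # per-request cap quoted in the list above


def build_tts_payload(
    prompt: str,
    *,
    stability: float = 0.5,          # illustrative default
    similarity_boost: float = 0.75,  # illustrative default
    style: float = 0.0,              # illustrative default
    use_speaker_boost: bool = True,
    seed: Optional[int] = None,
    previous_text: Optional[str] = None,
    next_text: Optional[str] = None,
    pronunciation_dictionaries: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Assemble a request body. Top-level key names are assumed, not confirmed."""
    for name, value in (
        ("stability", stability),
        ("similarity_boost", similarity_boost),
        ("style", style),
    ):
        # Assumption: each setting is a fraction between 0 and 1.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    dictionaries = pronunciation_dictionaries or []
    if len(dictionaries) > MAX_PRONUNCIATION_DICTIONARIES:
        raise ValueError("at most 3 pronunciation dictionaries per request")

    payload: dict[str, Any] = {
        "prompt": prompt,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
            "style": style,
            "use_speaker_boost": use_speaker_boost,
        },
    }
    # Optional fields are left out entirely when unset.
    if seed is not None:
        payload["seed"] = seed
    if previous_text is not None:
        payload["previous_text"] = previous_text
    if next_text is not None:
        payload["next_text"] = next_text
    if dictionaries:
        payload["pronunciation_dictionaries"] = dictionaries
    return payload


def payloads_for_parts(parts: list[str], seed: int) -> list[dict[str, Any]]:
    """Give each part its neighbours as context so the audio joins cleanly."""
    return [
        build_tts_payload(
            part,
            seed=seed,
            previous_text=parts[i - 1] if i > 0 else None,
            next_text=parts[i + 1] if i + 1 < len(parts) else None,
        )
        for i, part in enumerate(parts)
    ]
```

Reusing one seed across all parts, as `payloads_for_parts` does, is one way to keep a multi-part generation repeatable from run to run.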

Music (Music v2v)

  • Prompt-driven control over genre, mood, instruments, and song structure.[^6]
  • Mid-track genre transitions (e.g. opera to heavy metal within a single track).[^6]
  • Section inpainting: regenerate a specific segment without altering the rest of the track.[^6]
  • Multilingual vocal generation across English, Spanish, French, German, Japanese, and more.[^7]
  • Tracks trained on licensed data and cleared for commercial use on paid plans.[^7]
  • Download in MP3 and WAV. Genres include trap, pop, rock, jazz, ambient, afrobeats, indie rock, reggaeton, R&B, cinematic, lofi, phonk, and more.[^8]
API

Call ElevenLabs (Multilingual / Turbo / Music) in three lines.

One key. One base URL. Same SDK shape you already use.

# 1. set your key
export COINIS_API_KEY="sk_live_..."

# 2. call the model
curl https://api.app.coinis.com/v1/audio/generate \
  -H "Authorization: Bearer $COINIS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"models/elevenlabs/elevenlabs","prompt":"neon city, rain, tracking shot"}'

// Node.js
import { Coinis } from "@coinis/sdk";
const coinis = new Coinis(process.env.COINIS_API_KEY);

const job = await coinis.audio.generate({
  model: "models/elevenlabs/elevenlabs",
  prompt: "neon city, rain, tracking shot",
});

# Python
import os

from coinis import Coinis
coinis = Coinis(os.environ["COINIS_API_KEY"])

job = coinis.audio.generate(
    model="models/elevenlabs/elevenlabs",
    prompt="neon city, rain, tracking shot",
)

Response
{
  "id": "gen_8fa2c1",
  "status": "succeeded",
  "model": "models/elevenlabs/elevenlabs",
  "output": {
    "image_url": "https://cdn.coinis.com/gen_8fa2c1.mp4",
    "format": "mp4"
  },
  "tokens_used": 10
}
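
A caller will usually want to check the status and pull the output URL out of a response like the one above. The sketch below works only from the fields visible in that sample. The helper name `extract_output_url` is hypothetical, and since the page does not list status values other than "succeeded", anything else is treated as not ready.

```python
import json


def extract_output_url(raw: str) -> str:
    """Return the output URL from a response body shaped like the sample."""
    data = json.loads(raw)
    status = data.get("status")
    if status != "succeeded":
        # Other status values are not documented here, so fail loudly.
        raise RuntimeError(f"generation {data.get('id')} not ready: status={status!r}")
    output = data.get("output") or {}
    # The sample names the field "image_url". Accepting any "*_url" key is a
    # defensive assumption in case audio responses use a different name.
    for key, value in output.items():
        if key.endswith("_url") and isinstance(value, str):
            return value
    raise KeyError("no *_url field in output")


sample = """
{
  "id": "gen_8fa2c1",
  "status": "succeeded",
  "model": "models/elevenlabs/elevenlabs",
  "output": {"image_url": "https://cdn.coinis.com/gen_8fa2c1.mp4", "format": "mp4"},
  "tokens_used": 10
}
"""
print(extract_output_url(sample))  # https://cdn.coinis.com/gen_8fa2c1.mp4
```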

Already on another provider's SDK? Change the host. Keep the call.

Pricing

Token pricing. No surprises.

One wallet across every model. No API accounts to juggle.

ElevenLabs (Multilingual / Turbo / Music) · ElevenLabs Multilingual v2 / v3 (scale-tier projected)
3.1 tokens
per 1k chars · $0.31
Voice + music
$0.31 / 1k chars
One key. Every model. One invoice. 1 token = $0.10
1k chars ≈ 3 tokens ($0.31)
Budget variant: ElevenLabs Turbo v2.5 · $0.09 / 1k chars
Start free. 25 tokens a week.

No credit card.

Why pay through Coinis
  • One wallet for every model. No separate API keys. No separate bills.
  • Generate ads. Launch to Meta. Track in one place.
  • On-brand output from your Brand Profile.

1 token = $0.10 pay-as-you-go. Less on a plan.
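
The token arithmetic can be worked through directly from the figures on this page. This sketch uses the pay-as-you-go rate of $0.10 per token and the rounded per-1k-character prices. Plan discounts are not modelled, and the dictionary keys are labels for this example only.

```python
from decimal import Decimal

TOKEN_USD = Decimal("0.10")  # pay-as-you-go rate quoted above

# Rounded list prices from this page, in USD per 1,000 characters.
USD_PER_1K_CHARS = {
    "multilingual": Decimal("0.31"),
    "turbo": Decimal("0.09"),
}


def cost_usd(chars: int, variant: str) -> Decimal:
    """Dollar cost of generating `chars` characters."""
    return Decimal(chars) / 1000 * USD_PER_1K_CHARS[variant]


def tokens_needed(chars: int, variant: str) -> Decimal:
    """Tokens consumed at the pay-as-you-go rate."""
    return cost_usd(chars, variant) / TOKEN_USD


def chars_for_tokens(tokens: int, variant: str) -> int:
    """Whole characters a token balance covers, rounded down."""
    budget = Decimal(tokens) * TOKEN_USD
    return int(budget / USD_PER_1K_CHARS[variant] * 1000)


print(tokens_needed(1000, "multilingual"))   # 3.1
print(chars_for_tokens(25, "multilingual"))  # 8064
print(chars_for_tokens(25, "turbo"))         # 27777
```

At these rates, the free weekly allowance of 25 tokens covers roughly 8,000 characters on the Multilingual variant or close to 28,000 on Turbo v2.5.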

Standard vs Fast

Pick the run for the job.

ElevenLabs Multilingual v2 / v3 (scale-tier projected)

Final renders, studios
Price $0.31 / 1k chars

ElevenLabs Turbo v2.5

Rapid tests, high volume
Price $0.09 / 1k chars
Use cases

Two buyers. One model.

For builders

Resell every model. One key. One bill.

Unified API across video, image, audio, and LLM.

Generate 500 variants overnight.

Async queue plus webhooks. Batch at scale.

White-label the output.

Ship it under your brand. Outputs are yours.

For creatives

Ship a Reel before lunch.

Prompt to platform-native clip in minutes.

Same product. Ten formats.

One generation, every aspect ratio.

Commercial UGC without a creator.

Authentic selfie-style ads, on brand.

Long-form narration and audiobooks

Multilingual v2 handles up to 10,000 characters per request with consistent, nuanced voice delivery. Publish chapters across 29 languages without stitching multiple API calls.[^1]

Real-time voice agents and chatbots

Flash v2.5 (Turbo v2.5) returns first audio in ~75ms across 32 languages. Deploy voice bots, AI phone agents, and live customer service tools without noticeable delay.[^3]

Multilingual UGC and product-video voiceover

Generate voiceover in 29 to 70+ languages with voice_settings controls for stability and style. Ship localized ad creative without re-recording in each market.[^2]

Licensed background music for ads, social, and branded video

Music v2v generates commercially cleared tracks from a text prompt. Control genre, mood, and structure. Use section inpainting to swap one segment mid-track without rebuilding the whole piece.[^6][^7]

Game character dialogue and soundtracks

Multilingual v2 or Eleven v3 produces character voices with stability and style controls for consistent in-game audio. Pair with Music v2v for immersive background soundtracks across genres.[^6][^2]

Renders in seconds. Set a seed. Get the same audio back.

Outputs are yours. Sell them.

Safe for paid ads.

Your prompts are never used for training.

FAQ

ElevenLabs (Multilingual / Turbo / Music) FAQs

How much does ElevenLabs cost on Coinis vs. going direct?

On Coinis, ElevenLabs Turbo v2.5 runs at $0.085 per 1,000 characters. Multilingual v2 at the scale-tier projected rate is $0.306 per 1,000 characters, or $0.381 per 1,000 characters at the per-character retail tier. Music v2v is $0.136 per track. Coinis bills across TTS and Music in one place with no subscription tier or credit rollover risk. See the full breakdown at /models/elevenlabs/pricing.

Is there an ElevenLabs API I can call through Coinis, and what is the endpoint?

Yes. Send a POST request to https://api.app.coinis.com/v1/audio/generate. Pass the variant ID, your text or music prompt, and voice or music parameters in the request body. Full auth details, parameter schema, and code samples are at /models/elevenlabs/api.
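
For callers who prefer not to install an SDK, the same POST can be assembled with the Python standard library. This is a sketch under assumptions: the `model` and `prompt` field names are taken from the samples on this page, any extra keyword arguments are passed through unchanged, and `build_request` is a hypothetical helper name.

```python
import json
import urllib.request

ENDPOINT = "https://api.app.coinis.com/v1/audio/generate"


def build_request(api_key: str, model: str, prompt: str, **params) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the endpoint above."""
    body = {"model": model, "prompt": prompt, **params}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("sk_live_...", "models/elevenlabs/elevenlabs", "Hello from Coinis")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would send it. Building and sending are kept separate here so the request can be inspected or logged first.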

What is the difference between ElevenLabs Multilingual v2, Turbo v2.5, and Music?

Multilingual v2 targets long-form narration: 29 languages, 10,000 character limit per request, standard latency. Turbo v2.5 (now Flash v2.5) targets real-time applications: 32 languages, 40,000 character limit, ~75ms first-audio latency. Music v2v is a separate model for prompt-driven track generation with genre, mood, and structure control. Each has a distinct variant ID and per-unit price on Coinis.

Which ElevenLabs model should I use for real-time voice agents vs. long-form narration?

Use Flash v2.5 (billed as elevenlabs-turbo-v2-5 on Coinis) for voice agents, chatbots, and any use case where response latency matters. It returns first audio in ~75ms. Use Multilingual v2 or Eleven v3 for audiobooks, narration, and voiceover where audio quality and language range matter more than speed.
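
That guidance can be written down as a small lookup. The sketch below encodes only the language counts, character limits, and latency classes quoted on this page. The `TtsModel` class and `pick_model` function are hypothetical, and the names are display labels, not variant IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsModel:
    name: str
    languages: int   # minimum language count quoted on this page
    char_limit: int  # per-request character limit
    realtime: bool   # True for the low-latency model


# Ordered by preference within each latency class.
MODELS = [
    TtsModel("Flash v2.5", 32, 40_000, True),
    TtsModel("Multilingual v2", 29, 10_000, False),
    TtsModel("Eleven v3", 70, 5_000, False),
]


def pick_model(*, realtime: bool, text_length: int) -> TtsModel:
    """Return the first model matching the latency need whose limit fits the text."""
    for model in MODELS:
        if model.realtime == realtime and text_length <= model.char_limit:
            return model
    raise ValueError(
        f"no single request fits {text_length} characters; split the text first"
    )


print(pick_model(realtime=True, text_length=200).name)     # Flash v2.5
print(pick_model(realtime=False, text_length=8_000).name)  # Multilingual v2
```

Language coverage is stored but not used in the selection. A fuller version might also filter on the target language, since Eleven v3 covers the widest range.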

Does ElevenLabs on Coinis support voice cloning and multilingual output?

Multilingual output is fully supported across all TTS variants. Voice cloning availability depends on the ElevenLabs plan tier associated with your Coinis account. Professional Voice Cloning requires Creator tier or above on ElevenLabs direct. Contact Coinis support to confirm cloning eligibility for your account.

Can I use ElevenLabs-generated voice and music commercially?

TTS output generated on paid Coinis plans carries a commercial license for standard advertising, social, and content use. Music tracks generated via Music v2v are trained on licensed data and cleared for commercial use on paid plans. Full film, TV, and Studio Games rights require an Enterprise arrangement. Check the official docs at https://elevenlabs.io/music for the exact scope.

What languages and output formats are supported?

Multilingual v2 supports 29 languages. Eleven v3 supports 70+ languages. Flash v2.5 supports 32 languages. Output formats include MP3 (up to 192 kbps), PCM, WAV (up to 44.1 kHz), and μ-law for Twilio. High-resolution PCM and WAV require Pro tier or above on ElevenLabs direct. 192 kbps MP3 requires Creator tier or above.

Start free

Your wallet. Every model. One call away.

Start free. 25 tokens a week. No card.

Generate on Coinis


Pricing and capabilities verified 2026-05-26. Read the docs.