OpenAI · Audio model

OpenAI Whisper

Whisper AI is OpenAI's open-source automatic speech recognition model.

One API key. Every model.

Commercial use included · Verified May 26, 2026 · Outputs are yours · No training on your data

Endpoints

Start building with OpenAI Whisper.

One model. Three ways to call it. Same key, same bill.

Whisper STT (transcription)

Speech to text

One call. Same key. Same bill.

$0.00085 / minute

Whisper-1 (OpenAI direct)

Speech to text

One call. Same key. Same bill.

$0.01 / minute

Capabilities

What it does best.

Whisper AI capabilities

Languages

  • Accurate transcription in 57+ languages where word error rate stays below 50%[^3]
  • Trained on 98 languages in total; auto-detects the language without pre-configuration[^3]

Output formats

  • Returns transcripts as json, text, srt, vtt, or verbose_json[^4]
  • SRT and VTT output ready for direct upload to video platforms[^4]

Timestamps

  • Word-level and segment-level timestamps via the timestamp_granularities[] parameter[^5]
  • Enables frame-precise video editing tied to individual spoken words[^5]
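
As a rough illustration of how word-level timestamps can feed a caption workflow, the sketch below groups words into SRT cues. The input shape (a list of dicts with `word`, `start`, and `end` in seconds) is an assumption about what a word-timestamp response carries, and `words_to_srt` and `max_words` are hypothetical names chosen for this example.

```python
def format_srt_time(seconds: float) -> str:
    """Render seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group word timestamps into numbered SRT cues of at most max_words words."""
    cues = []
    for index in range(0, len(words), max_words):
        group = words[index:index + max_words]
        start = format_srt_time(group[0]["start"])
        end = format_srt_time(group[-1]["end"])
        text = " ".join(w["word"].strip() for w in group)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Assumed input shape: one dict per spoken word, times in seconds.
sample_words = [
    {"word": "Hello", "start": 0.0, "end": 0.42},
    {"word": "and", "start": 0.42, "end": 0.6},
    {"word": "welcome", "start": 0.6, "end": 1.1},
    {"word": "back", "start": 1.1, "end": 1.5},
]
print(words_to_srt(sample_words, max_words=2))
```

Grouping by a fixed word count keeps the sketch short; a real captioning pass would more likely break cues on punctuation or pause length.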

Translation

  • Translates audio in any supported language to English text via a single /v1/audio/translations call[^6]
  • No intermediate transcription step required[^6]

Input

  • Accepts mp3, mp4, mpeg, mpga, m4a, wav, and webm files[^7]
  • File uploads capped at 25 MB per request; longer audio must be chunked[^7]
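
A minimal pre-flight check for the input limits listed above might look like the following. The function name `check_upload` is hypothetical, and reading "25 MB" as 25 MiB is an assumption; adjust the constant if the service counts decimal megabytes.

```python
from pathlib import Path

# Formats listed above.
ALLOWED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
# Assumption: "25 MB" is interpreted as 25 MiB.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds 25 MB; split it into chunks first")
    return problems


print(check_upload("interview.m4a", 12_000_000))
print(check_upload("interview.flac", 40_000_000))
```

Checking locally before uploading avoids spending a request on a file that would be rejected anyway.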

Architecture

  • Encoder-decoder Transformer; input audio is split into 30-second chunks and converted to log-Mel spectrograms before encoding[^8]
  • Decoder predicts text with special tokens for language ID, timestamps, and translation mode[^8]
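
To make the 30-second windowing concrete, here is a small sketch that splits a sample buffer into fixed windows and zero-pads the last one. The 16 kHz sample rate and 10 ms frame stride are assumptions taken from the published Whisper paper rather than from this page, and the sketch stops before the actual log-Mel computation.

```python
SAMPLE_RATE = 16_000   # Hz; assumption from the Whisper paper, not from this page
CHUNK_SECONDS = 30     # window length stated above
HOP_SAMPLES = 160      # 10 ms stride; assumption from the Whisper paper
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def split_into_windows(samples):
    """Split mono samples into 30-second windows, zero-padding the last one."""
    windows = []
    for start in range(0, len(samples), CHUNK_SAMPLES):
        window = list(samples[start:start + CHUNK_SAMPLES])
        window.extend([0.0] * (CHUNK_SAMPLES - len(window)))
        windows.append(window)
    return windows


def frames_per_window():
    """Number of spectrogram frames one window yields at the assumed stride."""
    return CHUNK_SAMPLES // HOP_SAMPLES


audio = [0.1] * (SAMPLE_RATE * 45)  # 45 seconds of dummy audio
windows = split_into_windows(audio)
print(len(windows), len(windows[-1]), frames_per_window())
```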

Prompting

  • Optional prompt parameter steers transcription style, terminology, and preferred spellings[^9]
  • Model reads only the final 224 tokens of any prompt[^9]
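
Because only the tail of a prompt is read, it can help to preview which part survives. The sketch below keeps the last 224 tokens, using whitespace splitting as a stand-in for the model's real tokenizer, so the counts are approximate. `effective_prompt` is a hypothetical helper, not part of any SDK.

```python
PROMPT_TOKEN_LIMIT = 224  # limit stated above


def effective_prompt(prompt: str, limit: int = PROMPT_TOKEN_LIMIT) -> str:
    """Return the tail of the prompt that fits within `limit` tokens.

    Whitespace splitting only approximates the model's tokenizer, so treat
    the result as a rough preview of what the model will read.
    """
    if limit <= 0:
        return ""
    tokens = prompt.split()
    return " ".join(tokens[-limit:])


glossary = " ".join(f"term{i}" for i in range(300))
kept = effective_prompt(glossary)
print(len(kept.split()), kept.split()[0])
```

In practice this suggests placing the most important spellings and terminology at the end of the prompt.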

Licensing

  • Released under the MIT License; model weights and inference code are publicly available[^10]

API

Call OpenAI Whisper in three lines.

One key. One base URL. Same SDK shape you already use.

# 1. set your key
export COINIS_API_KEY="sk_live_..."

# 2. call the model (JSON body, so declare the content type)
curl https://api.app.coinis.com/v1/audio/generate \
  -H "Authorization: Bearer $COINIS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"neon city, rain, tracking shot"}'

// JavaScript (ES module, top-level await)
import { Coinis } from "@coinis/sdk";
const coinis = new Coinis(process.env.COINIS_API_KEY);

const job = await coinis.audio.generate({
  model: "models/openai/whisper",
  prompt: "neon city, rain, tracking shot",
});

# Python
import os

from coinis import Coinis

# Read the key from the environment instead of hard-coding it
coinis = Coinis(os.environ["COINIS_API_KEY"])

job = coinis.audio.generate(
    model="models/openai/whisper",
    prompt="neon city, rain, tracking shot",
)

Response

{
  "id": "gen_8fa2c1",
  "status": "succeeded",
  "model": "models/openai/whisper",
  "output": {
    "image_url": "https://cdn.coinis.com/gen_8fa2c1.mp4",
    "format": "mp4"
  },
  "tokens_used": 10
}

Already on another provider's SDK? Change the host. Keep the call.

Pricing

Token pricing. No surprises.

One wallet across every model. No API accounts to juggle.

OpenAI Whisper · Whisper STT (transcription)
Speech to text
$0.00085 / minute, about 0.0085 tokens (shown as 0 tokens after rounding)
One key. Every model. One invoice. 1 token = $0.10.
Start free. 15 tokens a week.

No credit card.

Why pay through Coinis
  • One wallet for every model. No per-provider API keys. No separate bills.
  • Generate ads. Launch to Meta. Track in one place.
  • On-brand output from your Brand Profile.

1 token = $0.10 pay-as-you-go. Less on a plan.

Standard vs Fast

Pick the run for the job.

Whisper STT (transcription)

Final renders, studios
Price $0.00085 / minute

Whisper-1 (OpenAI direct)

Rapid tests, high volume
Price $0.01 / minute

Use cases

Two buyers. One model.

For builders

Resell every model. One key. One bill.

Unified API across video, image, audio, and LLM.

Generate 500 variants overnight.

Async queue plus webhooks. Batch at scale.

White-label the output.

Ship it under your brand. Outputs are yours.

For creatives

Ship a Reel before lunch.

Prompt to platform-native clip in minutes.

Same product. Ten formats.

One generation, every aspect ratio.

Commercial UGC without a creator.

Authentic selfie-style ads, on brand.

Podcast and interview transcription

Upload a recording and get a clean transcript or a ready-to-publish SRT/VTT subtitle file. Works for long-form audio without any manual editing of speaker notes or timestamps.[^2]

Cross-lingual audio translation

Send a non-English recording and receive English text in one API call. No separate translation step, no extra cost, no pipeline stitching.[^6]

Video editing with word-level timestamps

Pull precise timestamps for every spoken word. Use them to cut silences, remove filler words, or sync captions to the exact frame in your editing tool.[^5]
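
One possible way to turn word timestamps into edit decisions is sketched below: it flags long gaps between words as silences and marks filler words for removal. The input shape (dicts with `word`, `start`, `end`), the filler list, and the 0.5-second threshold are all illustrative assumptions.

```python
FILLERS = {"um", "uh", "erm"}  # illustrative list, not exhaustive


def find_silences(words, min_gap=0.5):
    """Return (start, end) gaps between consecutive words of at least min_gap seconds."""
    gaps = []
    for previous, current in zip(words, words[1:]):
        gap = current["start"] - previous["end"]
        if gap >= min_gap:
            gaps.append((previous["end"], current["start"]))
    return gaps


def find_fillers(words):
    """Return (start, end) spans of words that match the filler list."""
    spans = []
    for w in words:
        if w["word"].strip(" ,.?!").lower() in FILLERS:
            spans.append((w["start"], w["end"]))
    return spans


# Assumed input shape: one dict per spoken word, times in seconds.
sample = [
    {"word": "So", "start": 0.0, "end": 0.3},
    {"word": "um,", "start": 0.35, "end": 0.7},
    {"word": "today", "start": 1.6, "end": 2.0},
]
print(find_silences(sample))
print(find_fillers(sample))
```

The resulting spans could then be passed to an editing tool as cut ranges.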

Multilingual content pipelines

Feed audio in any of 57+ supported languages and let Whisper detect the language automatically. Scale localization workflows without building per-language routing logic.[^3]

Voice interface and search indexing

Turn call recordings, support tickets, or meeting audio into searchable text. Whisper's broad training data makes it reliable across accents, noise conditions, and technical vocabulary.[^1]


Outputs are yours. Sell them.

Safe for paid ads.

Your prompts are never used for training.

FAQ

OpenAI Whisper FAQs

How much does Whisper AI cost per minute on Coinis vs OpenAI direct?

On Coinis, the Cloudflare-routed variant (whisper-stt) costs $0.00085/min and the OpenAI direct variant (whisper-1-openai-direct) costs $0.01/min. OpenAI charges $0.006/min on its own platform. At these rates, the Cloudflare-routed Coinis path is the cheapest of the three for the same underlying model.
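
The arithmetic behind that comparison can be checked with a short script. The rates and the $0.10 token value are the figures quoted on this page; the helper names and the 600-minute workload are illustrative only.

```python
from decimal import Decimal

# Per-minute rates quoted on this page (USD).
RATES_PER_MINUTE = {
    "Coinis whisper-stt": Decimal("0.00085"),
    "Coinis whisper-1-openai-direct": Decimal("0.01"),
    "OpenAI platform": Decimal("0.006"),
}
TOKEN_VALUE_USD = Decimal("0.10")  # pay-as-you-go token price quoted on this page


def cost_usd(rate, minutes):
    """Cost in USD for the given number of audio minutes."""
    return rate * minutes


def usd_to_tokens(usd):
    """Convert a USD amount to Coinis tokens at the pay-as-you-go rate."""
    return usd / TOKEN_VALUE_USD


minutes = 600  # ten hours of audio, chosen only as an example
for name, rate in RATES_PER_MINUTE.items():
    print(f"{name}: ${cost_usd(rate, minutes):.2f}")

stt_cost = cost_usd(RATES_PER_MINUTE["Coinis whisper-stt"], minutes)
print(f"whisper-stt in tokens: {usd_to_tokens(stt_cost):.1f}")
```

Using `Decimal` avoids the floating-point drift that tends to show up with very small per-minute rates.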

Does Coinis Whisper support word-level timestamps and SRT/VTT subtitle output?

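Yes. Pass timestamp_granularities[]=word or timestamp_granularities[]=segment in your request to get word-level or segment-level timestamps. Set the output format with the response_format parameter; srt and vtt produce subtitle files directly. On OpenAI's own API, timestamp_granularities[] only takes effect when response_format is verbose_json, so request timestamps and subtitle files in separate calls if that restriction applies to your route. Note that word-level timestamps are only available on whisper-1, not on newer GPT-4o transcription models.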

What audio file formats and size limits does Whisper support, and how do I transcribe files larger than 25 MB?

Whisper accepts mp3, mp4, mpeg, mpga, m4a, wav, and webm files up to 25 MB per request. For longer audio, split the file into chunks before uploading. Cut at natural pauses rather than mid-sentence to avoid losing context across chunk boundaries.
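
A rough sketch of that chunking advice follows. It estimates how many seconds fit under the size cap for a constant-bitrate file, then picks cut points at known pauses, falling back to a hard cut when no pause is available. The pause list is assumed to come from a separate silence-detection step, and every name here is hypothetical.

```python
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as 25 MiB


def max_chunk_seconds(bitrate_bps, safety=0.95):
    """Approximate seconds of constant-bitrate audio that fit under the cap."""
    return MAX_UPLOAD_BYTES * 8 / bitrate_bps * safety


def plan_cuts(duration, pauses, max_chunk):
    """Pick cut points so no chunk exceeds max_chunk seconds, preferring pauses."""
    cuts = []
    chunk_start = 0.0
    while duration - chunk_start > max_chunk:
        limit = chunk_start + max_chunk
        candidates = [p for p in pauses if chunk_start < p <= limit]
        # Use the latest pause inside the window, or a hard cut if none exists.
        cut = max(candidates) if candidates else limit
        cuts.append(cut)
        chunk_start = cut
    return cuts


pauses = [280, 590, 610, 1180, 1210]  # seconds; assumed output of silence detection
print(plan_cuts(duration=1500, pauses=pauses, max_chunk=600))
print(round(max_chunk_seconds(128_000)))
```

Choosing the latest pause inside each window keeps chunks as long as possible, which minimizes the number of requests.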

Which languages does Whisper transcribe accurately, and can it auto-detect language?

Whisper transcribes accurately in 57+ languages where word error rate falls below 50%. It was trained on 98 languages in total. You do not need to specify the language in advance. Whisper identifies it automatically from the audio signal.

What is the difference between Whisper-1 and the newer GPT-4o transcription models?

Whisper-1 is the original open-source ASR model and is the only model that supports the /v1/audio/translations endpoint and the timestamp_granularities[] parameter for word-level timestamps. GPT-4o-based transcription models offer improved accuracy on some benchmarks but do not yet support those two features. If you need word timestamps or direct audio-to-English translation, whisper-1 is the correct choice.

Can Whisper translate non-English audio directly to English in one API call?

Yes. Send your audio to the /v1/audio/translations endpoint and Whisper returns English text regardless of the source language. No separate translation API call is needed. The output is always English only. Translation into other target languages is not supported by this endpoint.

Is there a Whisper API on Coinis, and how does it work?

Yes. POST to https://api.app.coinis.com/v1/audio/generate with model=whisper to access both available variants. The whisper-stt variant routes through Cloudflare Workers AI at $0.00085/min. The whisper-1-openai-direct variant routes to OpenAI at $0.01/min. Full request schema and parameter reference are in the API docs linked on this page.

Start free

Your wallet. Every model. One call away.

Start free. 15 tokens a week. No card.

Pricing and capabilities verified 2026-05-26. Read the docs.