OpenAI · Audio model

OpenAI Whisper

Q: Does Coinis Whisper support word-level timestamps and SRT/VTT subtitle output?

Yes. Pass `timestamp_granularities[]=word` or `timestamp_granularities[]=segment` in your request to get precise timestamps; this parameter requires `response_format=verbose_json`. For ready-made subtitle files, set `response_format` to `srt` or `vtt` instead. Word-level timestamps are only available on whisper-1, not on the newer GPT-4o transcription models.
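If you want subtitle cues timed to individual words rather than whole segments, one option is to request word timestamps and assemble the SRT text yourself. The sketch below assumes the response carries a `words` array whose entries have `word`, `start`, and `end` fields (times in seconds); that shape, the helper names `srt_timestamp` and `words_to_srt`, and the `max_words` grouping rule are illustrative assumptions, not something this page specifies.

```python
from typing import Dict, List, Union

# Assumed shape of one word entry: {"word": str, "start": float, "end": float}
Word = Dict[str, Union[str, float]]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 7) -> str:
    """Group word timestamps into numbered SRT cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        text = " ".join(str(w["word"]).strip() for w in group)
        start = srt_timestamp(float(group[0]["start"]))
        end = srt_timestamp(float(group[-1]["end"]))
        # Each cue: index, time range, text, then a blank separator line.
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Hypothetical sample data shaped like the assumed `words` array.
sample = [
    {"word": "Hello", "start": 0.0, "end": 0.42},
    {"word": "world", "start": 0.5, "end": 1.0},
    {"word": "again", "start": 1.2, "end": 1.75},
]
print(words_to_srt(sample, max_words=2))
```

Grouping by a fixed word count keeps the example short; a production caption tool would more likely break cues on punctuation or on gaps between words.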

Whisper is OpenAI's open-source automatic speech recognition (ASR) model.


Commercial use included · Verified May 26, 2026 · Outputs are yours · No training on your data

Variants

OpenAI Whisper, every version.

One model. Every resolution. One credit balance.

Whisper STT (transcription)

Speech to text

One wallet. One bill.

$0 / minute

Whisper-1 (OpenAI direct)

Speech to text

One wallet. One bill.

$0.01 / minute

Capabilities

What it does best.

Accurate transcription in 57+ languages where word error rate stays below 50%

Trained on 98 languages in total. Auto-detects the language without pre-configuration

Returns transcripts as `json`, `text`, `srt`, `vtt`, or `verbose_json`

SRT and VTT output ready for direct upload to video platforms

Word-level and segment-level timestamps via the `timestamp_granularities[]` parameter

Enables frame-precise video editing tied to individual spoken words

Translates audio in any supported language to English text via a single `/v1/audio/translations` call

No intermediate transcription step required

Accepts mp3, mp4, mpeg, mpga, m4a, wav, and webm files

File uploads capped at 25 MB per request. Longer audio must be chunked

Encoder-decoder Transformer. Input audio split into 30-second chunks and converted to log-Mel spectrograms before encoding

Decoder predicts text with special tokens for language ID, timestamps, and translation mode

Optional prompt parameter steers transcription style, terminology, and preferred spellings

Model reads only the final 224 tokens of any prompt

Released under the MIT License. Model weights and inference code are publicly available
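To make the 30-second windowing described above concrete, here is a rough sketch of the arithmetic. The 30-second chunk length comes from this page; the 16 kHz sample rate and 160-sample hop are assumptions taken from Whisper's public reference code, and the helper names are hypothetical. The reference decoder also moves its window according to predicted timestamps, so the fixed split shown here is a simplification.

```python
import math

CHUNK_SECONDS = 30   # stated on this page
SAMPLE_RATE = 16_000  # assumption: 16 kHz mono input, per the reference code
HOP_LENGTH = 160      # assumption: 10 ms between spectrogram frames

SAMPLES_PER_CHUNK = SAMPLE_RATE * CHUNK_SECONDS      # 480,000 samples
FRAMES_PER_CHUNK = SAMPLES_PER_CHUNK // HOP_LENGTH   # 3,000 log-Mel frames


def pad_or_trim(samples, length=SAMPLES_PER_CHUNK):
    """Zero-pad or cut a list of samples to exactly one window."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


def split_into_windows(samples):
    """Split audio into fixed 30-second windows, padding the last one."""
    windows = []
    for start in range(0, len(samples), SAMPLES_PER_CHUNK):
        windows.append(pad_or_trim(samples[start:start + SAMPLES_PER_CHUNK]))
    return windows


def window_count(duration_seconds):
    """Number of 30-second windows needed to cover a clip."""
    return math.ceil(duration_seconds / CHUNK_SECONDS)


print(SAMPLES_PER_CHUNK, FRAMES_PER_CHUNK, window_count(70))
```

Under these assumptions a 70-second clip needs three windows, and the last one is mostly zero padding.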

How it works

Generate OpenAI Whisper in three steps.

No code. No API. Describe what you want. Get ad-ready output.

Describe your ad

Paste a product link or a short brief. Coinis pulls your brand.

Pick OpenAI Whisper

Choose the model for the look you want. Switch anytime.

Get creatives

Generate. Launch to Meta. Track results in one place.

One wallet. Every model. No accounts to juggle.

Pricing

Credit pricing. No surprises.

One wallet across every model. No API accounts to juggle.

OpenAI Whisper · Whisper STT (transcription)

0 credits

per minute · $0

Speech to text

$0 / minute

One wallet. Every model. One invoice. 1 credit = $0.01

1 minute ≈ 0 credits ($0)

Start free. 150 credits a week.

No credit card.
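The credit arithmetic above can be checked with a few lines of code. This sketch uses only figures quoted on this page (1 credit = $0.01, the two per-minute prices, 150 free credits a week); the function names and the choice to round partial credits up are assumptions, since the page does not say how fractional minutes are billed.

```python
from decimal import Decimal, ROUND_CEILING

CREDIT_USD = Decimal("0.01")   # 1 credit = $0.01 (from this page)
WEEKLY_FREE_CREDITS = 150      # free weekly allowance (from this page)

# Per-minute prices listed on this page for the two variants.
PRICE_PER_MINUTE_USD = {
    "Whisper STT (transcription)": Decimal("0"),
    "Whisper-1 (OpenAI direct)": Decimal("0.01"),
}


def credits_for(variant: str, minutes: Decimal) -> int:
    """Credits for `minutes` of audio. Rounding up is an assumption."""
    usd = PRICE_PER_MINUTE_USD[variant] * minutes
    return int((usd / CREDIT_USD).to_integral_value(rounding=ROUND_CEILING))


def free_minutes_per_week(variant: str):
    """Minutes covered by the free weekly credits, or None if the price is $0."""
    price = PRICE_PER_MINUTE_USD[variant]
    if price == 0:
        return None  # not limited by credits at a $0 rate
    return WEEKLY_FREE_CREDITS * CREDIT_USD / price


print(credits_for("Whisper-1 (OpenAI direct)", Decimal("12.5")))
print(free_minutes_per_week("Whisper-1 (OpenAI direct)"))
```

`Decimal` is used instead of floats so that amounts like $0.01 stay exact.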

Why pay through Coinis

One wallet for every model. No API keys. No separate bills.
Generate ads. Launch to Meta. Track in one place.
On-brand output from your Brand Profile.

1 credit = $0.01 pay-as-you-go. Less on a plan.

Standard vs Fast

Pick the run for the job.

Whisper STT (transcription)

Final renders, studios

Resolution —

Price $0 / minute

Whisper-1 (OpenAI direct)

Rapid tests, high volume

Resolution —

Price $0.01 / minute

Use cases

Two buyers. One model.

For teams

Every model. One wallet. One bill.

Video, image, audio, and LLM in one place.

Variants in bulk.

Same brief. Many cuts. No accounts to juggle.

Own every output.

Download it. Run it in any campaign.

For creatives

Ship a Reel before lunch.

Prompt to platform-native clip in minutes.

Same product. Ten formats.

One generation, every aspect ratio.

Commercial UGC without a creator.

Authentic selfie-style ads, on brand.

Renders in seconds. Set a seed. Get the same frame back.

Outputs are yours. Sell them.

Safe for paid ads.

Your prompts are never used for training.

FAQ

Questions about OpenAI Whisper. Answered.


On Coinis, OpenAI Whisper is pay-as-you-go from one shared credit wallet. Buy credits once. Spend them on any model. No separate accounts. No monthly commitment.


Whisper accepts mp3, mp4, mpeg, mpga, m4a, wav, and webm files up to 25 MB per request. For longer audio, split the file into chunks before uploading. Cut at natural pauses rather than mid-sentence to avoid losing context across chunk boundaries.
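The chunking advice above can be sketched as a small planning step: estimate how many seconds fit under the cap, then cut at the latest pause before that limit. Everything here is illustrative. The helper names are hypothetical, the pause times are assumed to come from your own silence detection, the cap is treated as 25,000,000 bytes, and the bitrate is assumed to be constant.

```python
from typing import List


def max_seconds_for(bitrate_kbps: float, cap_bytes: int = 25_000_000) -> float:
    """Approximate seconds of constant-bitrate audio that fit under the cap."""
    return cap_bytes * 8 / (bitrate_kbps * 1000)


def plan_cuts(pauses: List[float], total: float, max_len: float) -> List[float]:
    """Return cut times so that no chunk is longer than max_len seconds.

    Greedy rule: from the current chunk start, cut at the latest pause that
    keeps the chunk within max_len. If there is no such pause, fall back to
    a hard cut at start + max_len.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    cuts: List[float] = []
    start = 0.0
    candidates = sorted(p for p in pauses if 0.0 < p < total)
    while total - start > max_len:
        limit = start + max_len
        usable = [p for p in candidates if start < p <= limit]
        cut = usable[-1] if usable else limit
        cuts.append(cut)
        start = cut
    return cuts


# Hypothetical 300-second recording with pauses detected at these times.
print(plan_cuts([55.0, 118.0, 170.0, 240.0], total=300.0, max_len=120.0))
print(max_seconds_for(128))
```

In practice it is safer to leave some headroom below the computed limit, because container overhead and variable bitrates can push a chunk over the cap.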

Whisper transcribes accurately in 57+ languages where word error rate falls below 50%. It was trained on 98 languages in total. You do not need to specify the language in advance. Whisper identifies it automatically from the audio signal.



Start free

Your wallet. Every model. One place.

Start free. 150 credits a week. No card.


Pricing and capabilities verified 2026-05-26. Read the docs.


Pair it with.

Deepgram Nova-3 / Aura-2

ElevenLabs (Multilingual / Turbo / Music)

Sync Labs Lipsync 1.9 / 2

All audio models
