Whisper STT (transcription)
One call. Same key. Same bill.
Whisper AI is OpenAI's open-source automatic speech recognition model.
One API key. Every model.
One model. Three ways to call it. Same key, same bill.
Capabilities
Languages
Output formats: json, text, srt, vtt, or verbose_json[^4]
Timestamps: timestamp_granularities[] parameter[^5]
Translation: /v1/audio/translations call[^6]
Input
Architecture
Prompting
Licensing
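The `timestamp_granularities[]` parameter is sent as form fields next to the audio file. Below is a minimal Python sketch of assembling those fields. It assumes OpenAI-style field names (`model`, `response_format`, `timestamp_granularities[]`) and OpenAI's rule that timestamp granularities only work with `response_format="verbose_json"`. The helper `build_transcription_fields` is hypothetical, and the Coinis request shape may differ.

```python
VALID_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}
VALID_GRANULARITIES = {"word", "segment"}


def build_transcription_fields(
    model: str,
    response_format: str = "json",
    granularities: tuple[str, ...] = (),
) -> list[tuple[str, str]]:
    """Return multipart form fields, excluding the audio file itself.

    Hypothetical helper: field names follow OpenAI's transcription API,
    which is an assumption about the Coinis endpoint.
    """
    if response_format not in VALID_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    unknown = set(granularities) - VALID_GRANULARITIES
    if unknown:
        raise ValueError(f"unsupported granularities: {sorted(unknown)}")
    if granularities and response_format != "verbose_json":
        # Timestamp granularities are only honoured with verbose_json output.
        raise ValueError("timestamp_granularities[] requires verbose_json")

    fields = [("model", model), ("response_format", response_format)]
    # An array parameter is sent as one repeated form field per value.
    fields += [("timestamp_granularities[]", g) for g in granularities]
    return fields
```

For example, `build_transcription_fields("models/openai/whisper", "verbose_json", ("word", "segment"))` yields one `timestamp_granularities[]` entry per requested level. The validation catches the most common mistake, which is asking for word timestamps in plain `json` output.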
One key. One base URL. Same SDK shape you already use.
```shell
# 1. Set your key
export COINIS_API_KEY="sk_live_..."

# 2. Call the model
curl https://api.app.coinis.com/v1/audio/generate \
  -H "Authorization: Bearer $COINIS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"models/openai/whisper","prompt":"neon city, rain, tracking shot"}'
```

```javascript
// Uses top-level await: run as an ES module (for example, a .mjs file)
import { Coinis } from "@coinis/sdk";

const coinis = new Coinis(process.env.COINIS_API_KEY);

const job = await coinis.audio.generate({
  model: "models/openai/whisper",
  prompt: "neon city, rain, tracking shot",
});
```

```python
import os

from coinis import Coinis

coinis = Coinis(os.environ["COINIS_API_KEY"])

job = coinis.audio.generate(
    model="models/openai/whisper",
    prompt="neon city, rain, tracking shot",
)
```

Example response:

```json
{
  "id": "gen_8fa2c1",
  "status": "succeeded",
  "model": "models/openai/whisper",
  "output": {
    "image_url": "https://cdn.coinis.com/gen_8fa2c1.mp4",
    "format": "mp4"
  },
  "tokens_used": 10
}
```

Already on another provider's SDK? Change the host. Keep the call.
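Below is a minimal Python sketch of reading the sample response above and pricing it at the pay-as-you-go rate quoted on this page (1 token = $0.10). The helper `summarize_job` is hypothetical. Its field names are copied from the sample response, so treat them as an assumption about the live API. `Decimal` avoids floating-point rounding in the money arithmetic.

```python
import json
from decimal import Decimal

# Pay-as-you-go rate quoted on this page; plan pricing is lower.
PRICE_PER_TOKEN_USD = Decimal("0.10")


def summarize_job(raw: str) -> tuple[str, Decimal]:
    """Return (output URL, pay-as-you-go cost in USD) for a finished job.

    Hypothetical helper: field names are taken from the sample response.
    """
    job = json.loads(raw)
    if job["status"] != "succeeded":
        raise RuntimeError(f"job {job['id']} is {job['status']}")
    url = job["output"]["image_url"]
    cost = job["tokens_used"] * PRICE_PER_TOKEN_USD
    return url, cost


sample = """{
  "id": "gen_8fa2c1",
  "status": "succeeded",
  "model": "models/openai/whisper",
  "output": {"image_url": "https://cdn.coinis.com/gen_8fa2c1.mp4", "format": "mp4"},
  "tokens_used": 10
}"""

url, cost = summarize_job(sample)
print(url, cost)  # https://cdn.coinis.com/gen_8fa2c1.mp4 1.00
```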
One wallet across every model. No API accounts to juggle.
No credit card.
1 token = $0.10 pay-as-you-go. Less on a plan.
Unified API across video, image, audio, and LLM.
Async queue plus webhooks. Batch at scale.
Ship it under your brand. Outputs are yours.
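With an async queue, a request returns a job that you check on later or receive through a webhook. Below is a minimal polling sketch with exponential backoff. It assumes a caller-supplied `fetch_status(job_id)` function (hypothetical) that returns the job as a dict shaped like the sample response. Treating `failed` as the only other terminal status is also an assumption, not documented behaviour. The `sleep` parameter is injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], dict],
    interval: float = 1.0,
    max_interval: float = 10.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal status, doubling the wait each time.

    `fetch_status` is a hypothetical wrapper around whatever status call you use;
    "failed" as a terminal status is an assumption.
    """
    waited = 0.0
    while True:
        job = fetch_status(job_id)
        if job["status"] in ("succeeded", "failed"):
            return job
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {job['status']} after {waited}s")
        sleep(interval)
        waited += interval
        interval = min(interval * 2, max_interval)


# Usage with a fake status source that finishes on the third check.
_states = iter(["queued", "running", "succeeded"])


def fake_fetch(job_id: str) -> dict:
    return {"id": job_id, "status": next(_states)}


job = wait_for_job("gen_8fa2c1", fake_fetch, sleep=lambda seconds: None)
print(job["status"])  # succeeded
```

Webhooks remove the polling loop entirely. Polling is still a useful fallback when you cannot expose a public endpoint.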
Podcast and interview transcription: Upload a recording and get a clean transcript or a ready-to-publish SRT/VTT subtitle file. Works for long-form audio, with timestamps generated for you instead of added by hand.[^2]
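If you only need the subtitle file, requesting `srt` output returns it directly. Rendering SRT yourself is useful when you want to merge, trim, or relabel cues first. Below is a minimal Python sketch. It assumes segments shaped like Whisper's `verbose_json` output, meaning dicts with `start` and `end` in seconds plus `text`. The function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments ({'start', 'end', 'text'}) as numbered SRT cues.

    Assumes verbose_json-style segment dicts; hypothetical helper.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Consecutive cues are separated by one blank line.
    return "\n".join(cues)
```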
Cross-lingual audio translation: Send a non-English recording and receive English text in one API call. No separate translation step, no extra cost, no pipeline stitching.[^6]
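Whisper's translation task produces English text only. Any other language pair is a transcription followed by a separate translation step. Below is a small routing sketch under that rule. It assumes OpenAI-compatible paths: `/v1/audio/translations` is the call named on this page, and `/v1/audio/transcriptions` is OpenAI's transcription path, which may not match the Coinis route. `pick_audio_endpoint` is a hypothetical helper.

```python
# Assumption: OpenAI-compatible routes. The translations path is the call
# named on this page; the transcriptions path is OpenAI's and may differ.
TRANSCRIBE_PATH = "/v1/audio/transcriptions"
TRANSLATE_PATH = "/v1/audio/translations"


def pick_audio_endpoint(source_language: str, target_language: str) -> str:
    """Hypothetical router: choose the endpoint for the text language you want.

    Languages are ISO 639-1 codes such as "en" or "de".
    """
    if target_language == source_language:
        # Same language in and out: plain transcription.
        return TRANSCRIBE_PATH
    if target_language == "en":
        # Whisper's translation task only produces English text.
        return TRANSLATE_PATH
    raise ValueError(
        f"cannot go from {source_language} to {target_language} in one Whisper call"
    )
```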
Video editing with word-level timestamps: Pull precise timestamps for every spoken word. Use them to cut silences, remove filler words, or sync captions to the exact frame in your editing tool.[^5]
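With word-level timestamps, a silence is simply the gap between one word's `end` and the next word's `start`. Filler words are entries whose text matches a small stop list. Below is a minimal Python sketch. It assumes the `words` list from a `verbose_json` response requested with word granularity, where each entry has `word` plus `start` and `end` in seconds. The 0.75-second threshold and the filler list are arbitrary example values.

```python
FILLERS = {"um", "uh", "erm"}  # example stop list; tune for your content


def find_silences(words: list[dict], min_gap: float = 0.75) -> list[tuple[float, float]]:
    """Return (start, end) spans with no speech for at least min_gap seconds."""
    silences = []
    for prev, nxt in zip(words, words[1:]):
        if nxt["start"] - prev["end"] >= min_gap:
            silences.append((prev["end"], nxt["start"]))
    return silences


def filler_spans(words: list[dict]) -> list[tuple[float, float]]:
    """Return (start, end) spans of filler words, ignoring case and punctuation."""
    return [
        (w["start"], w["end"])
        for w in words
        if w["word"].strip(" ,.").lower() in FILLERS
    ]


words = [
    {"word": "So", "start": 0.0, "end": 0.3},
    {"word": " um,", "start": 0.4, "end": 0.7},
    {"word": " welcome", "start": 1.9, "end": 2.3},
    {"word": " back", "start": 2.35, "end": 2.6},
]
print(find_silences(words))  # [(0.7, 1.9)]
print(filler_spans(words))   # [(0.4, 0.7)]
```

Both functions return time ranges rather than edited audio, so the cut list can be handed to whichever editing tool does the actual trimming.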
Multilingual content pipelines: Feed audio in any of 57+ supported languages and let Whisper detect the language automatically. Scale localization workflows without building per-language routing logic.[^3]
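Whisper's `verbose_json` output reports the detected language. A pipeline can therefore sort finished transcripts by that field after the fact instead of routing audio by language up front. Below is a minimal Python sketch. It assumes each result is a dict with `language` and `text` keys, as in `verbose_json`. The label format is treated as an opaque string and only lower-cased. `group_by_language` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_language(results: list[dict]) -> dict[str, list[str]]:
    """Bucket transcript texts by the language Whisper reported for them.

    Assumes verbose_json-style dicts; results without a language go to "unknown".
    """
    buckets: dict[str, list[str]] = defaultdict(list)
    for result in results:
        # Lower-case so "German" and "german" land in the same bucket.
        buckets[result.get("language", "unknown").lower()].append(result["text"])
    return dict(buckets)
```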
Voice interface and search indexing: Turn call recordings, support tickets, or meeting audio into searchable text. Whisper's broad training data makes it reliable across accents, noise conditions, and technical vocabulary.[^1]
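Once recordings are text, the simplest search index is an inverted index, which maps each word to the transcripts that contain it. Below is a minimal in-memory Python sketch with hypothetical document IDs. A query matches only transcripts that contain every query word. A production system would more likely use a dedicated search engine.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9']+")


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the IDs of transcripts containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in transcripts.items():
        for token in TOKEN.findall(text.lower()):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of transcripts that contain every word in the query."""
    tokens = TOKEN.findall(query.lower())
    if not tokens:
        return set()
    # .get avoids inserting empty entries into the defaultdict.
    return set.intersection(*(index.get(t, set()) for t in tokens))


index = build_index({
    "call-1": "Customer reports a billing error on invoice 42.",
    "call-2": "Billing question about the annual plan.",
    "meeting-7": "Roadmap review, no billing topics.",
})
print(sorted(search(index, "billing error")))  # ['call-1']
```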
Outputs are yours. Sell them.
Safe for paid ads.
Your prompts are never used for training.
Start free. 15 tokens a week. No card.
Pricing and capabilities verified 2026-05-26. Read the docs.