Deepseek · Llm model

DeepSeek V4 Flash

DeepSeek V4 Flash is a 284-billion-parameter Mixture-of-Experts LLM that activat

One API key. Every model.

per 1m tokens
Example
AI text
Commercial use included Verified May 26, 2026 Outputs are yours No training on your data
Endpoints

Start building with DeepSeek V4 Flash.

One model. Three ways to call it. Same key, same bill.

IN

Llm

One call. Same key. Same bill.

$0.24 / 1m tokens

OUT

Llm

One call. Same key. Same bill.

$0.48 / 1m tokens

Capabilities

What it does best.

DeepSeek V4 Flash capabilities

Context and output

  • Supports a 1,000,000-token context window[^1]
  • Generates up to 384,000 output tokens per response[^1]

Modes and reasoning

  • Thinking mode outputs a chain-of-thought via the reasoning_content field before the final answer, improving accuracy[^2]
  • Non-thinking mode is available for latency-sensitive workloads[^2]
  • Reasoning effort is configurable via reasoning_effort with levels high and max[^2]

Structured output and completions

  • Supports JSON Output and Tool Calls in both thinking and non-thinking modes[^1]
  • FIM Completion (Beta) is available in non-thinking mode only[^1]
  • Chat Prefix Completion (Beta) supported[^1]

API compatibility

  • Compatible with the OpenAI API format at https://api.deepseek.com[^1]
  • Compatible with the Anthropic API format at https://api.deepseek.com/anthropic[^1]

Throughput and pricing

  • Supports up to 2,500 simultaneous concurrent requests[^1]
  • Input tokens cost $0.238 per 1M (deepseek-v4-flash-in) on Coinis[^1]
  • Output tokens cost $0.476 per 1M (deepseek-v4-flash-out) on Coinis[^1]
  • Cache-hit input pricing is $0.0028 per 1M tokens. That is 1/50th of the standard $0.14 wholesale input rate, following the April 2026 price cut[^1]
API

Call DeepSeek V4 Flash in three lines.

One key. One base URL. Same SDK shape you already use.

# 1. set your key
export COINIS_API_KEY="sk_live_..."

# 2. call the model
curl https://api.app.coinis.com/v1/llm/generate \
  -H "Authorization: Bearer $COINIS_API_KEY" \
  -d '{"prompt":"neon city, rain, tracking shot"}'
import { Coinis } from "@coinis/sdk";
const coinis = new Coinis(process.env.COINIS_API_KEY);

const job = await coinis.llm.generate({
  model: "models/deepseek/deepseek",
  prompt: "neon city, rain, tracking shot",
});
from coinis import Coinis
coinis = Coinis(os.environ["COINIS_API_KEY"])

job = coinis.llm.generate(
    model="models/deepseek/deepseek",
    prompt="neon city, rain, tracking shot",
)
Response
{
  "id": "gen_8fa2c1",
  "status": "succeeded",
  "model": "models/deepseek/deepseek",
  "output": {
    "image_url": 
                "https://cdn.coinis.com/gen_8fa2c1.mp4"
              
              ,
    "format": "mp4"
  },
  "tokens_used": 10
}

Already on another provider's SDK? Change the host. Keep the call.

Pricing

Token pricing. No surprises.

One wallet across every model. No API accounts to juggle.

DeepSeek V4 Flash · IN
2.4 tokens
per 1m tokens · $0.24
Frontier LLM
$0.24 / 1m tokens
One key. Every model. One invoice. 1 token = $0.10
1 1m tokens ≈ 2 tokens ($0.24)
Start free. 15 tokens a week.

No credit card.

Why pay through Coinis
  • One wallet for every model. No API keys. No separate bills.
  • Generate ads. Launch to Meta. Track in one place.
  • On-brand output from your Brand Profile.

1 token = $0.10 pay-as-you-go. Less on a plan.

Standard vs Fast

Pick the run for the job.

IN

Final renders, studios
Resolution
Price $0.24 / 1m tokens

OUT

Rapid tests, high volume
Resolution
Price $0.48 / 1m tokens
Use cases

Two buyers. One model.

For builders

Resell every model. One key. One bill.

Unified API across video, image, audio, and LLM.

Generate 500 variants overnight.

Async queue plus webhooks. Batch at scale.

White-label the output.

Ship it under your brand. Outputs are yours.

For creatives

Ship a Reel before lunch.

Prompt to platform-native clip in minutes.

Same product. Ten formats.

One generation, every aspect ratio.

Commercial UGC without a creator.

Authentic selfie-style ads, on brand.

Production coding assistants DeepSeek V4 Flash is optimized for fast, high-throughput code generation and completion, making it a strong fit for IDE plugins, code review bots, and developer copilots.[^1]

High-volume chat and conversational AI Its low per-token cost and 2,500-request concurrency cap make it practical for production chat backends that need to handle thousands of simultaneous users without runaway token spend.[^1]

Long-context document and codebase analysis The 1M-token context window lets you drop entire codebases, legal documents, or research corpora into a single prompt and synthesize answers without chunking.[^1]

Multi-step agentic workflows Thinking mode combined with Tool Call support enables complex, multi-turn agent loops. The model reasons through subtasks before executing tool calls, reducing error rates on compound instructions.[^2]

RAG pipelines with repeated context Applications that re-send large system prompts or knowledge-base chunks on every call benefit the most from cache-hit pricing. At $0.0028 per 1M cached input tokens on the wholesale rate, input cost on repeated context is a fraction of the standard rate.[^1]

Renders in seconds. Set a seed. Get the same frame back.

Outputs are yours. Sell them.

Safe for paid ads.

Your prompts are never used for training.

FAQ

DeepSeek V4 Flash FAQs

How much does the DeepSeek V4 Flash API cost per 1M tokens on Coinis?

On Coinis, deepseek-v4-flash-in is priced at $0.238 per 1M input tokens and deepseek-v4-flash-out at $0.476 per 1M output tokens. Cache-hit input tokens are billed at a significantly lower rate for repeated context. Check the official DeepSeek docs at https://api-docs.deepseek.com/quick_start/pricing for the wholesale cache-hit rate.

What is the difference between DeepSeek V4 Flash and DeepSeek V4 Pro, and which should I use?

V4 Flash is the efficiency-optimized tier. It activates 13B parameters per forward pass, supports 2,500 concurrent requests, and carries a lower per-token cost. V4 Pro targets workloads that need maximum output quality and accepts fewer concurrent requests. Start with V4 Flash for cost-sensitive or high-volume production tasks. Switch to V4 Pro when quality benchmarks on your specific task demand it.

How does DeepSeek compare to GPT-4-class and Claude models on price?

DeepSeek V4 Flash on Coinis is priced at $0.238 per 1M input tokens and $0.476 per 1M output tokens. For a direct comparison against specific GPT or Claude variants, visit the Coinis LLM category page where per-model pricing is listed side by side. Pricing for closed-model vendors changes frequently, so live comparisons are more reliable than static figures.

What is DeepSeek's thinking mode and when should I enable it?

Thinking mode makes the model output a full chain-of-thought via the reasoning_content field before delivering its final answer. Enable it for tasks that benefit from step-by-step reasoning: math, code debugging, multi-step planning, and agentic tool-call sequences. Disable it for simple retrieval or classification tasks where latency matters more than deliberation depth. Note that temperature, top_p, presence_penalty, and frequency_penalty parameters have no effect in thinking mode.

How does DeepSeek's cache-hit pricing work, and how much can I actually save?

When the same input tokens appear in the DeepSeek prompt cache, they are billed at the cache-hit rate of $0.0028 per 1M tokens (wholesale), versus the standard input rate of $0.14 per 1M tokens. That is 1/50th of the standard input rate. RAG pipelines and agent loops that re-send a large system prompt or knowledge base on every call see the largest absolute savings, because the expensive repeated tokens are charged at the cache-hit rate instead.

Can I use DeepSeek with the OpenAI SDK or Anthropic SDK without code changes?

DeepSeek V4 Flash is compatible with both the OpenAI API format (base URL: https://api.deepseek.com) and the Anthropic API format (base URL: https://api.deepseek.com/anthropic). If you are already using an OpenAI or Anthropic SDK, you can point the base URL at Coinis's endpoint and send requests without rewriting your client code.

What happens to the `deepseek-chat` and `deepseek-reasoner` model names. are they being deprecated?

Yes. The aliases deepseek-chat (non-thinking mode) and deepseek-reasoner (thinking mode) currently map to DeepSeek V4 Flash, but DeepSeek has confirmed they will be deprecated in a future release. Migrate to the explicit deepseek-v4-flash identifier with the mode toggle to avoid breakage when those aliases are retired.

What are the rate limits and concurrency caps for DeepSeek V4 Flash?

DeepSeek V4 Flash supports up to 2,500 simultaneous concurrent requests, compared to 500 for V4 Pro. Additional rate limit policies apply and are documented at https://api-docs.deepseek.com/quick_start/rate_limit. For production workloads approaching those limits, contact Coinis support to discuss dedicated capacity options.

Start free

Your wallet. Every model. One call away.

Start free. 15 tokens a week. No card.

Generate on Coinis

No credit card.

Pricing and capabilities verified 2026-05-26. Read the docs .