Alibaba · LLM model

Qwen3 Max

Qwen3 Max is Alibaba's top-tier proprietary LLM in the Qwen3 family.

One API key. Every model.

Commercial use included · Verified May 26, 2026 · Outputs are yours · No training on your data
Endpoints

Start building with Qwen3 Max.

One model. Three ways to call it. Same key, same bill.

IN (input tokens)

LLM

One call. Same key. Same bill.

$2.04 / 1M tokens

OUT (output tokens)

LLM

One call. Same key. Same bill.

$10.20 / 1M tokens

Capabilities

What it does best.

Qwen capabilities

Context and output

  • 262,144-token total context window[^5]
  • 32,768-token maximum output per request[^5]
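
A minimal sketch of how the two limits above interact, assuming the prompt and the reply share the 262,144-token window (the page does not spell this out). The function name is illustrative and not part of any Coinis or Alibaba SDK.

```python
CONTEXT_WINDOW = 262_144  # total context window, from the list above
MAX_OUTPUT = 32_768       # maximum output tokens per request, from the list above


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Return how many prompt tokens fit when `reserved_output` tokens are kept for the reply.

    Assumption: prompt and reply are counted against one shared window.
    """
    if not 0 < reserved_output <= MAX_OUTPUT:
        raise ValueError(f"reserved_output must be between 1 and {MAX_OUTPUT}")
    return CONTEXT_WINDOW - reserved_output


print(max_prompt_tokens())       # room left when the full output budget is reserved
print(max_prompt_tokens(4_096))  # room left when only a short reply is expected
```

Reserving less output leaves more room for the prompt, which matters most for long-document work.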

Reasoning

  • Higher accuracy on math, logic, and science tasks versus prior Qwen3 versions[^2]
  • Reduced hallucinations in open-ended Q&A and writing[^8]

Coding

  • Used in production by Roo Code, a VS Code multi-agent extension[^7]
  • Scores competitively in code categories on Design Arena benchmarks[^7]

Multilingual

  • Supports 100+ languages with stronger translation and commonsense reasoning[^3]
  • Follows complex instructions in Chinese and English more reliably than earlier Qwen3 releases[^2]

Agentic and tool calling

  • OpenAI-style tools and tool_choice parameters supported[^6]
  • Streaming via OpenAI-compatible server-sent events API[^10]
  • Also supports response_format, temperature, top_p, seed, and presence_penalty[^6]
  • Average tool-call error rate is 6.39% on Alibaba Cloud Int.; build error-handling logic into production agents[^9]
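
As a rough illustration of the parameters listed above, here is a request body in the OpenAI function-calling shape. The `get_weather` tool is hypothetical, and pairing `tools` with a `prompt` field (rather than a `messages` array) is an assumption based on this page's own curl example; check the API schema before relying on it.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling shape.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative name, not a built-in tool
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    "model": "qwen3-max",
    "prompt": "What is the weather in Podgorica right now?",  # field name assumed
    "tools": [get_weather_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
    "temperature": 0.2,
    "seed": 42,
}

print(json.dumps(payload, indent=2))
```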

RAG

  • Explicitly optimized for retrieval-augmented generation over long documents[^4]

Throughput

  • Average ~24 tokens/second via Alibaba Cloud Int.[^1]
  • Average first-token latency ~1.03 seconds, end-to-end latency ~5.02 seconds[^1]
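
A back-of-the-envelope latency estimate from the averages above, assuming a simple model: first-token latency plus output length divided by throughput. Real latency varies with load and prompt size, so treat this as a planning sketch only.

```python
FIRST_TOKEN_LATENCY_S = 1.03  # average first-token latency, from the list above
TOKENS_PER_SECOND = 24.0      # average throughput, from the list above


def estimate_latency_seconds(output_tokens: int) -> float:
    """Estimate wall-clock time as first-token latency plus steady-state generation time."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return FIRST_TOKEN_LATENCY_S + output_tokens / TOKENS_PER_SECOND


for n in (100, 500, 2_000):
    print(f"{n} output tokens: ~{estimate_latency_seconds(n):.1f} s")
```

Under this model, the ~5.02-second average end-to-end latency corresponds to a reply of roughly 96 tokens.
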
API

Call Qwen3 Max in three lines.

One key. One base URL. Same SDK shape you already use.

# 1. set your key
export COINIS_API_KEY="sk_live_..."

# 2. call the model
curl https://api.app.coinis.com/v1/llm/generate \
  -H "Authorization: Bearer $COINIS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3-max","prompt":"neon city, rain, tracking shot"}'

// JavaScript SDK
import { Coinis } from "@coinis/sdk";
const coinis = new Coinis(process.env.COINIS_API_KEY);

const job = await coinis.llm.generate({
  model: "models/alibaba/qwen",
  prompt: "neon city, rain, tracking shot",
});

# Python SDK
import os

from coinis import Coinis
coinis = Coinis(os.environ["COINIS_API_KEY"])

job = coinis.llm.generate(
    model="models/alibaba/qwen",
    prompt="neon city, rain, tracking shot",
)
Response
{
  "id": "gen_8fa2c1",
  "status": "succeeded",
  "model": "models/alibaba/qwen",
  "output": {
    "image_url": "https://cdn.coinis.com/gen_8fa2c1.mp4",
    "format": "mp4"
  },
  "tokens_used": 10
}

Already on another provider's SDK? Change the host. Keep the call.

Pricing

Token pricing. No surprises.

One wallet across every model. No API accounts to juggle.

Qwen3 Max · IN
20.4 wallet tokens per 1M input tokens · $2.04
Frontier LLM
$2.04 / 1M tokens
One key. Every model. One invoice. 1 wallet token = $0.10
1M input tokens = 20.4 wallet tokens ($2.04)
Start free. 15 wallet tokens a week.

No credit card.

Why pay through Coinis
  • One wallet for every model. No extra API keys. No separate bills.
  • Generate ads. Launch to Meta. Track in one place.
  • On-brand output from your Brand Profile.

1 token = $0.10 pay-as-you-go. Less on a plan.
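
A small worked example of the pricing arithmetic, assuming the pay-as-you-go rate and the 0–32K context tier. "Wallet token" is used here for the $0.10 billing unit to keep it apart from LLM tokens; the function name is illustrative.

```python
INPUT_USD_PER_M = 2.04       # IN price per 1M LLM tokens
OUTPUT_USD_PER_M = 10.20     # OUT price per 1M LLM tokens
USD_PER_WALLET_TOKEN = 0.10  # pay-as-you-go rate for the billing unit


def request_cost(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (cost in USD, cost in wallet tokens) for one request."""
    usd = (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000
    return usd, usd / USD_PER_WALLET_TOKEN


usd, wallet = request_cost(input_tokens=8_000, output_tokens=1_000)
print(f"${usd:.4f} = {wallet:.3f} wallet tokens")
```

At these rates, 1M input tokens works out to 20.4 wallet tokens, and the 15 free weekly tokens are worth $1.50.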

Input vs output

IN

Input tokens (your prompt)
Price $2.04 / 1M tokens

OUT

Output tokens (the model's reply)
Price $10.20 / 1M tokens
Use cases

Two buyers. One model.

For builders

Resell every model. One key. One bill.

Unified API across video, image, audio, and LLM.

Generate 500 variants overnight.

Async queue plus webhooks. Batch at scale.

White-label the output.

Ship it under your brand. Outputs are yours.

For creatives

Ship a Reel before lunch.

Prompt to platform-native clip in minutes.

Same product. Ten formats.

One generation, every aspect ratio.

Commercial UGC without a creator.

Authentic selfie-style ads, on brand.

RAG over long documents

Qwen3 Max is explicitly optimized for retrieval-augmented generation[^4]. Feed it a 200K-token corpus and get grounded, accurate answers without document chunking gymnastics.
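
One way to assemble a long-context RAG prompt is sketched below. The character budget is a crude stand-in for a token count (a real tokenizer would be more accurate), and the prompt wording and function name are illustrative assumptions.

```python
def build_rag_prompt(question: str, passages: list[str], max_chars: int = 800_000) -> str:
    """Place numbered passages ahead of the question, stopping at a character budget."""
    parts: list[str] = []
    used = 0
    for i, passage in enumerate(passages, start=1):
        block = f"[{i}] {passage.strip()}"
        if used + len(block) > max_chars:
            break  # stop before the budget is exceeded
        parts.append(block)
        used += len(block)
    context = "\n\n".join(parts)
    return (
        "Answer using only the numbered sources below. Cite source numbers.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


prompt = build_rag_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days.", "Shipping takes 3 to 5 business days."],
)
print(prompt)
```

Numbering the passages lets the model cite its sources, which makes the answers easier to check.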

Agentic tool-calling workflows

The model supports OpenAI-style tools and tool_choice[^6], making it a direct drop-in for agent frameworks. Hermes Agent (40+ tools, self-improving) and OpenClaw (AI agent for messaging apps) both run on it in production[^7].

Code generation and development assistance

Roo Code, a VS Code multi-agent coding extension, uses Qwen3 Max as a backend[^7]. It handles code review, refactoring, and generation across major languages.

Multilingual content and translation

With 100+ language support and strong commonsense reasoning[^3], it handles cross-lingual Q&A, content localization, and translation at production scale.

STEM problem-solving

Higher accuracy on math, logic, and science tasks[^2] makes it a reliable choice for STEM tutoring, research summarization, and quantitative data analysis.

Responses in seconds. Set a seed for more repeatable output.

Outputs are yours. Sell them.

Safe for paid ads.

Your prompts are never used for training.

FAQ

Qwen3 Max FAQs

How much does Qwen3 Max cost on Coinis vs the Alibaba Cloud API?

On Coinis, Qwen3 Max is priced at $2.04 per 1M input tokens (variant qwen3-max-in) and $10.20 per 1M output tokens (variant qwen3-max-out) at the 0–32K context tier. Alibaba Cloud's direct list price for the same tier is $1.20/M in and $6.00/M out. Coinis adds unified billing across all models, so there is no separate provider account to manage.

Is there an API for Qwen3 Max, and is it OpenAI-compatible?

Yes. Send a POST request to https://api.app.coinis.com/v1/llm/generate with "model": "qwen3-max". The API accepts standard OpenAI-compatible parameters including tools, tool_choice, response_format, temperature, top_p, seed, and presence_penalty[^6]. Full schema is on the API sub-page.
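
A minimal sketch of that request using only the Python standard library. The `prompt` field name is taken from this page's curl example; the sampling values are arbitrary, and the response shape is not assumed here.

```python
import json
import os
import urllib.request

API_URL = "https://api.app.coinis.com/v1/llm/generate"  # endpoint named above


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for Qwen3 Max."""
    body = {
        "model": "qwen3-max",
        "prompt": prompt,    # field name taken from the page's curl example
        "temperature": 0.7,  # arbitrary illustrative values
        "top_p": 0.9,
        "seed": 42,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "Summarize retrieval-augmented generation in one sentence.",
    os.environ.get("COINIS_API_KEY", "sk_live_..."),
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then parse the JSON body of the response.
```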

What is the context window and maximum output length for Qwen3 Max?

The total context window is 262,144 tokens[^5]. Maximum output per request is 32,768 tokens[^5]. Note that Coinis retail pricing covers the 0–32K context tier. Prompts exceeding 32K tokens ramp to higher-cost tiers at the underlying API level.

What is the difference between Qwen3 Max and Qwen3 Max Thinking?

Qwen3 Max is the standard instruction-following variant. Qwen3 Max Thinking is a separate model with an explicit chain-of-thought reasoning mode that works through problems step by step before producing output[^11]. Use Qwen3 Max for fast, general-purpose generation. Use the Thinking variant when you need auditable reasoning traces or higher accuracy on hard multi-step problems.

Qwen vs DeepSeek: when should I pick Qwen3 Max over DeepSeek V3?

Choose Qwen3 Max when your workflow requires strong multilingual coverage across 100+ languages[^3], explicit RAG optimization[^4], or deep Chinese-language instruction following[^2]. DeepSeek V3 is a strong alternative for English-primary code and reasoning tasks and carries a lower output price. If your pipeline is multilingual or China-market-facing, Qwen3 Max is the better default.

Does Qwen3 Max support tool calling and streaming for agent workflows?

Yes. The model supports OpenAI-style tools and tool_choice parameters and streams via server-sent events[^6][^10]. Plan for a ~6.39% tool-call error rate on Alibaba Cloud Int.[^9] and implement retry or fallback logic in production agents.
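
The retry logic mentioned above might look like the sketch below: exponential backoff with jitter around any callable. `ToolCallError` is a hypothetical exception standing in for whatever your client raises on a failed or malformed tool call.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ToolCallError(Exception):
    """Hypothetical error for a failed or malformed tool call."""


def call_with_retry(call: Callable[[], T], max_attempts: int = 3, base_delay: float = 0.5) -> T:
    """Run `call`, retrying on ToolCallError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ToolCallError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt; jitter spreads out simultaneous retries.
            time.sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))


attempts = {"count": 0}


def flaky_tool_call() -> str:
    """Stand-in for a real model call: fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ToolCallError("malformed tool arguments")
    return "ok"


print(call_with_retry(flaky_tool_call, base_delay=0.0))
```

If failures were independent at 6.39% each, three attempts would all fail about 0.026% of the time. Real errors often correlate, so keep a fallback path as well.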

How does pricing change for prompts above 32K or 128K tokens?

Coinis retail covers the 0–32K context tier at $2.04/M input and $10.20/M output. The underlying Alibaba Cloud API applies higher rates for the ≤128K and >128K tiers. See the official docs at alibabacloud.com/help/en/model-studio/models for the full tier schedule, and model your costs at the rate for the applicable tier before sending large-context batches.
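
A simple pre-flight check can flag which tier a prompt falls into before a batch is sent. The exact boundaries below are assumptions (the page does not say whether "K" means 1,000 or 1,024 tokens), and rates for the higher tiers are left out because they are not stated here.

```python
TIER_32K = 32_000    # assumed boundary; confirm against the official tier schedule
TIER_128K = 128_000  # assumed boundary; confirm against the official tier schedule


def context_tier(prompt_tokens: int) -> str:
    """Return the pricing tier label for a prompt of the given length."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if prompt_tokens <= TIER_32K:
        return "0-32K"
    if prompt_tokens <= TIER_128K:
        return "<=128K"
    return ">128K"


for n in (10_000, 50_000, 200_000):
    print(n, context_tier(n))
```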

Start free

Your wallet. Every model. One call away.

Start free. 15 tokens a week. No card.

Generate on Coinis

Pricing and capabilities verified 2026-05-26. Read the docs.