Qwen3 VL 32B Instruct

Updated: June 2026

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...

Specifications

Context: 262K
Input: $0.104/M
Output: $0.416/M
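To make the listed prices concrete, here is a minimal sketch of a per-request cost estimate. It assumes "/M" means per million tokens and that every input, including images and video, is billed as tokens at the listed input rate; the page confirms neither. The function name and the example token counts are hypothetical.

```python
# Prices copied from the Specifications list, assumed to be USD per million tokens.
INPUT_PRICE_PER_M = 0.104
OUTPUT_PRICE_PER_M = 0.416


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed rates."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: 10,000 prompt tokens and 1,000 generated tokens.
cost = estimate_cost_usd(10_000, 1_000)
print(f"${cost:.6f}")  # prints $0.001456
```

At these rates output tokens cost four times as much as input tokens, so long generations dominate the bill faster than long prompts do.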

Capabilities

Vision, Text, Web, Coding, Thinking, Writing

Similarly Priced Models

Model                          Provider   Context  Input Price  Output Price
Qwen3.5-9B                     Qwen       262K     $0.1/M       $0.15/M
ByteDance Seed: Seed-2.0-Mini  ByteDance  262K     $0.1/M       $0.4/M
Qwen3.5 Flash                  Qwen       1M       $0.1/M       $0.4/M
Ministral 3 3B 2512            MistralAI  131K     $0.1/M       $0.1/M
Voxtral Small 24B 2507         MistralAI  32K      $0.1/M       $0.3/M

Performance Metrics

Intelligence Index: 11.1 (higher than 22% of models)
Coding Index: 15.6 (higher than 30% of models)
Agentic Index: 9.7 (higher than 20% of models)

Average Response Performance

Output Speed: 66.7 tok/s
Time To First Token: 2.66s
Time To First Answer Token: 2.66s
End To End Response Time: 10.15s
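The latency figures can be combined into a rough response-time estimate: time to first token plus the output length divided by the output speed. The sketch below assumes a constant generation speed after the first token, which is a simplification. The 500-token response length is an assumption chosen because it roughly reproduces the listed end-to-end time; the page does not state the response length it measured.

```python
# Figures copied from the Average Response Performance list.
OUTPUT_SPEED_TOK_S = 66.7  # output tokens per second
TTFT_S = 2.66              # time to first token, in seconds


def estimate_response_time_s(output_tokens: int) -> float:
    """Estimate total seconds for a response, assuming constant output speed."""
    return TTFT_S + output_tokens / OUTPUT_SPEED_TOK_S


# Assumed 500-token response (not stated on the page).
t = estimate_response_time_s(500)
print(f"{t:.2f}s")  # prints 10.16s
```

The result is within about 0.01s of the listed 10.15s, which suggests, but does not confirm, that the end-to-end figure corresponds to a response of about 500 output tokens.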

DATA SOURCE: Artificial Analysis
