The Future of Qwen: A Generalist Agent Model — Junyang Lin, Alibaba Qwen
Takeaway
Qwen 3 fuses thinking and non-thinking modes into one model with a tunable thinking budget, broader multilingual coverage, and an MoE architecture, all aimed at becoming a generalist agent.
Summary
- Junyang Lin (Alibaba Qwen) recaps how Qwen 2.5-Max matched Claude 3.5, GPT-4o, and DeepSeek-V3, then introduces Qwen 3, whose flagship is a 235B-total / 22B-active MoE that is competitive with o3-mini but trails Gemini 2.5 Pro.
- Qwen 3 ships in multiple sizes, including a 30B-total / 3B-active MoE that beats Qwen-32B on some tasks and a distilled 4B-parameter model that is deployable on mobile and rivals Qwen2.5-72B in reasoning.
- The headline feature is 'hybrid thinking mode': one model toggles between thinking and non-thinking via prompts or hyperparameters, with a dynamic thinking budget (the AIME 2024 score rises from ~40 at small budgets to 80+ at 32k tokens).
- Multilingual support jumps from 29 languages (Qwen 2.5) to 119+ languages and dialects. Agent, coding, and MCP integration is also deeper, so the model can interleave tool calls with reasoning during inference.
- RL on math/coding consistently lifts a 32B model from ~65 to ~80 on AIME 2024. Lin also commits to MoE as the dominant future direction for Qwen.
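As a rough illustration of the thinking-budget idea in the summary above, the Python sketch below caps how many reasoning tokens are consumed before an answer is produced. It is not Qwen's implementation: `run_with_thinking_budget`, `toy_reasoner`, and `toy_answer` are hypothetical names, and the toy reasoner only stands in for a real model. A budget of 0 plays the role of non-thinking mode.

```python
from itertools import islice
from typing import Callable, Iterable, List, Tuple


def run_with_thinking_budget(
    reasoning_tokens: Iterable[str],
    answer_from: Callable[[List[str]], str],
    budget: int,
) -> Tuple[List[str], str]:
    """Consume at most `budget` reasoning tokens, then produce an answer.

    A budget of 0 corresponds to non-thinking mode: the answer is produced
    without any reasoning tokens.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # islice stops pulling from the stream once the budget is exhausted.
    thoughts = list(islice(reasoning_tokens, budget))
    return thoughts, answer_from(thoughts)


# Hypothetical stand-ins for a real model, for illustration only.
def toy_reasoner() -> Iterable[str]:
    # Each "token" is one step of summing the numbers 1..5.
    total = 0
    for n in range(1, 6):
        total += n
        yield f"sum so far = {total}"


def toy_answer(thoughts: List[str]) -> str:
    # With no reasoning, fall back to a guess; otherwise report the last step.
    return thoughts[-1] if thoughts else "no reasoning: guess"


if __name__ == "__main__":
    for budget in (0, 2, 5):
        thoughts, answer = run_with_thinking_budget(
            toy_reasoner(), toy_answer, budget
        )
        print(budget, len(thoughts), answer)
```

In this toy setup the answer is only complete once the budget covers all five reasoning steps, which loosely mirrors the reported pattern of scores rising as the budget grows.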
qwen, foundation-models, moe
Original description
Since Alibaba launched the Qwen series of large models in 2023, the Qwen series of large language models and multimodal large models have been continuously updated and improved. This presentation will introduce the latest developments in the Qwen series of models, including the large language model Qwen3, vision-language large model Qwen2.5-VL, omni model Qwen2.5-Omni, etc. Additionally, this presentation will also cover the future development directions of the Qwen series.