From 0.8B to 397B parameters, Alibaba's model family covers every hardware tier.
Alibaba's Qwen 3.5 family spans from tiny edge models (0.8B parameters) to a 397B-parameter flagship that competes with proprietary models, all released under Apache 2.0. It's the most comprehensive open-weight model family available, covering every hardware tier from smartphones to GPU clusters. Here's where each size class fits.
Qwen 3.5 comes in two sub-families:
The flagship: Qwen3.5 397B A17B — a mixture-of-experts model with 397B total parameters and 17B active. It leads open-weight models on intelligence with a score of 45.0.
The Small series: Four models at 0.8B, 2B, 4B, and 9B parameters. These are designed for edge deployment — phones, laptops, embedded devices — and use a hybrid Gated Delta + MoE architecture.
Both sub-families are natively multimodal (text + vision), support 262K context windows, and include configurable reasoning intensity. That a Small-series model reaches 81.7% on GPQA Diamond is particularly impressive at sizes that run on consumer devices.
At 45.0 intelligence and 41.3 coding, the flagship Qwen3.5 is competitive with proprietary models from a year ago. It trails GPT-5.4 (57.2) by about 12 points but is significantly better than most open alternatives.
The MoE architecture means only 17B parameters activate per token, keeping inference efficient despite the large total parameter count. You still need multi-GPU hardware to run it (4-8 GPUs), but throughput per GPU is reasonable.
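The memory-versus-compute tradeoff here can be sketched with back-of-envelope arithmetic, using only the 397B-total / 17B-active figures above. The bytes-per-parameter values below are illustrative quantization assumptions, not vendor specifications:

```python
# Back-of-envelope estimate for an MoE model: all expert weights must sit
# in GPU memory, but only the active parameters cost compute per token.
# Figures from the article: 397B total parameters, 17B active.

def weight_memory_gb(total_params_b: float, bytes_per_param: float) -> float:
    """Memory for the full weight set (billions of params * bytes each = GB)."""
    return total_params_b * bytes_per_param

def compute_gflops_per_token(active_params_b: float) -> float:
    """A forward pass costs roughly 2 FLOPs per parameter per token;
    for MoE, only the *active* parameters count."""
    return 2 * active_params_b  # result is in GFLOPs since params are in billions

TOTAL_B, ACTIVE_B = 397, 17

fp16 = weight_memory_gb(TOTAL_B, 2.0)   # ~794 GB: a multi-GPU node
int4 = weight_memory_gb(TOTAL_B, 0.5)   # ~199 GB: still several GPUs
print(f"FP16 weights: {fp16:.0f} GB, INT4 weights: {int4:.0f} GB")
print(f"Per-token compute: ~{compute_gflops_per_token(ACTIVE_B):.0f} GFLOPs "
      f"(vs ~{compute_gflops_per_token(TOTAL_B):.0f} GFLOPs if dense)")
```

This is why the model needs 4-8 GPUs for memory yet keeps per-GPU throughput reasonable: the ~23x gap between total and active parameters shows up in compute, not in weight storage.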
Reasoning mode can be dialed up for complex tasks and down for simple ones, giving you control over the speed-quality tradeoff on a per-request basis.
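In practice, per-request control means attaching a reasoning knob to each API call. The sketch below is hypothetical: the field name `reasoning_effort` and the model id are placeholders, not Qwen's actual API; check the serving stack's documentation for the real parameter names.

```python
# Hypothetical request payloads illustrating per-request reasoning control.
# "reasoning_effort" and the model id are placeholder names for illustration.

def build_request(prompt: str, effort: str) -> dict:
    """Build a chat request with an explicit speed-vs-quality setting."""
    assert effort in {"low", "medium", "high"}
    return {
        "model": "qwen3.5-397b-a17b",              # illustrative model id
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,                # placeholder knob
    }

# Simple routing policy: spend reasoning budget only where it pays off.
cheap = build_request("Translate 'hello' to French.", "low")
hard = build_request("Find the bug in this concurrent queue.", "high")
```

The useful pattern is the routing step: classify requests by difficulty upstream, so simple traffic gets low-latency responses while hard queries get the full reasoning budget.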
The Small series is where Qwen 3.5 is most innovative. The 9B model scores 81.7% on GPQA Diamond — a graduate-level reasoning benchmark — while being small enough to run on consumer hardware. The 4B model runs on smartphones. The 0.8B model runs on embedded devices.
This opens use cases that cloud models can't serve: offline operation, privacy-preserving inference, real-time mobile AI, and edge computing. For applications that can't tolerate API latency or need to work without internet access, the Small series is the best option available.
The 262K context window is unusually large for small models, allowing them to process substantial documents entirely on-device.
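To see what 262K tokens buys on-device, a rough fit check is enough. The 4-characters-per-token ratio below is a common rule of thumb for English text, not a property of Qwen's tokenizer:

```python
# Rough check of whether a document fits in a 262K-token context window.
# CHARS_PER_TOKEN = 4 is a heuristic for English prose, not a tokenizer fact.

CONTEXT_TOKENS = 262_144
CHARS_PER_TOKEN = 4

def fits_in_context(text: str, reserve_for_output: int = 8_192) -> bool:
    """Estimate token count from character count and leave room for the reply."""
    est_tokens = len(text) / CHARS_PER_TOKEN
    return est_tokens <= CONTEXT_TOKENS - reserve_for_output

# ~500K characters estimates to ~125K tokens, well inside the window.
print(fits_in_context("word " * 100_000))
```

By this estimate the window holds on the order of several hundred pages of plain text, which is why whole contracts or codebases can be processed without a retrieval layer.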
Qwen 3.5 supports 200+ languages — more than any other model family. For international applications, this is a significant advantage. Its quality in non-English languages is notably better than that of Western models, which tend to degrade on less-common languages.
For companies serving global audiences, Qwen's multilingual capability reduces the need for separate models or translation layers for different markets.
All Qwen 3.5 models tested through official weights. GPQA Diamond and other benchmark scores from published evaluations and Artificial Analysis. Edge deployment tested on recommended hardware configurations.
Qwen 3.5 is the most versatile open-weight family available. The 397B flagship leads on intelligence, the Small series enables edge deployment, and Apache 2.0 licensing removes commercial barriers. For teams building open-source AI products, this is the default starting point.
Published June 11, 2026. Data updated daily from independent benchmarks and API providers.