🐧 PenguinPulse

Linux Graphics & Gaming News

Ollama v0.17.5 Adds Qwen 3.5 Models, Fixes GPU/CPU Split & MLX Crashes

Ollama v0.17.5 was released today, introducing support for the Qwen 3.5 small model series in 0.8B, 2B, 4B, and 9B parameter sizes. The update addresses several critical issues, including a crash in Qwen 3.5 models when processing was split between the GPU and CPU. It also resolves an issue where Qwen 3.5 models would repeat themselves because no presence penalty was applied; users may need to re-download the affected models to pick up the fix. Further stability work targets the MLX engine, fixing memory issues and crashes, and Ollama can now properly run models imported from Qwen 3.5 GGUF files. Additionally, the "ollama run --verbose" command now displays peak memory usage when running on Ollama's MLX engine.
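For readers who want to try the GGUF import path mentioned above, the usual Ollama workflow is sketched below. The GGUF file name and the local model tag are illustrative placeholders, not names from the release notes.

```shell
# Sketch of importing a local Qwen 3.5 GGUF file into Ollama.
# The file name and tag below are illustrative, not official names.

# 1. Write a Modelfile pointing at the local GGUF weights:
cat > Modelfile <<'EOF'
FROM ./qwen3.5-4b.gguf
EOF

# 2. Register the model under a local tag of your choosing:
ollama create my-qwen3.5 -f Modelfile

# 3. Run it; on the MLX engine, --verbose now also reports peak memory usage:
ollama run --verbose my-qwen3.5 "Hello"
```

If you had already pulled an affected Qwen 3.5 model before this release, re-downloading it with `ollama pull` is how you pick up the repetition fix noted above.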
