🐧 PenguinPulse

Linux Graphics & Gaming News

Ollama v0.20.0 Introduces Gemma 4 Models; v0.20.4-rc Improves MLX & Flash Attention

Ollama recently released version 0.20.0, which adds support for the Gemma 4 series of large language models: Effective 2B (E2B), Effective 4B (E4B), a 26B Mixture-of-Experts model with 4B active parameters, and a 31B dense variant. The release also adds SentencePiece-style BPE support for tokenization.

A release candidate for version 0.20.4, tagged v0.20.4-rc2, followed shortly after. It includes performance enhancements such as improved M5 performance with NAX for MLX, and it enables flash attention specifically for Gemma 4 models. Together, these updates continue Ollama's focus on local AI development by expanding model support and optimizing inference performance.
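The release notes don't detail the tokenizer internals, but as background: byte-pair encoding (BPE) builds a vocabulary by repeatedly merging the most frequent adjacent symbol pair in a corpus. Here is a toy sketch of that merge loop in Python — purely illustrative, not Ollama's or SentencePiece's actual implementation:

```python
from collections import Counter

def bpe_merges(word_freqs, num_merges):
    """Learn BPE merge rules from a toy word-frequency dict.

    Illustrative only -- not Ollama's tokenizer. Each word is treated
    as a sequence of symbols; each round merges the single most
    frequent adjacent symbol pair across the whole corpus.
    """
    vocab = {tuple(w): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        merged_vocab = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged_vocab[tuple(out)] = freq
        vocab = merged_vocab
    return merges

# Tiny corpus: "lo" is the most frequent pair, then "low".
merges = bpe_merges({"low": 5, "lower": 2, "lowest": 3}, num_merges=2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
```

SentencePiece-style tokenizers apply the same idea directly to raw text (treating whitespace as a regular symbol), which is what lets them work without a language-specific pre-tokenizer.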

Sources