🐧 PenguinPulse

Linux Graphics & Gaming News

Ollama v0.20.0 Introduces Gemma 4 Models; v0.20.4-rc Improves MLX & Flash Attention

Ollama recently released version 0.20.0, which adds support for the Gemma 4 series of large language models: Effective 2B (E2B), Effective 4B (E4B), a 26B Mixture-of-Experts model with 4B active parameters, and a 31B dense variant. The release also adds SentencePiece-style BPE support for tokenization.

A release candidate for version 0.20.4, tagged v0.20.4-rc2, followed shortly after. It includes performance enhancements such as improved M5 performance with NAX for MLX, and it enables flash attention specifically for Gemma 4 models. Together, these updates continue Ollama's focus on local AI development by expanding model support and optimizing inference performance.
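The release notes don't detail the tokenizer internals, but as background: byte-pair encoding (BPE) builds a vocabulary by repeatedly merging the most frequent adjacent symbol pair in a corpus. Here is a toy sketch of that merge loop in Python — purely illustrative, not Ollama's or SentencePiece's actual implementation:

```python
from collections import Counter

def bpe_merges(word_freqs, num_merges):
    """Learn BPE merge rules from a toy word-frequency dict.

    Illustrative only -- not Ollama's tokenizer. Each word is treated
    as a sequence of symbols; each round merges the single most
    frequent adjacent symbol pair across the whole corpus.
    """
    vocab = {tuple(w): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        merged_vocab = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged_vocab[tuple(out)] = freq
        vocab = merged_vocab
    return merges

# Tiny corpus: "lo" is the most frequent pair, then "low".
merges = bpe_merges({"low": 5, "lower": 2, "lowest": 3}, num_merges=2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
```

SentencePiece-style tokenizers apply the same idea directly to raw text (treating whitespace as a regular symbol), which is what lets them work without a language-specific pre-tokenizer.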

Sources