Ollama v0.23.1 Accelerates Gemma 4 MLX Inference with MTP Speculative Decoding

May 06, 2026

ollama ai llm macos mlx gemma performance compute

Ollama has released version 0.23.1, which adds Gemma 4 Multi-Token Prediction (MTP) speculative decoding to its MLX runner. The update primarily benefits macOS users on Apple Silicon and can provide "over a 2x speed increase for the Gemma 4 31B model on coding tasks." Speculative decoding drafts several tokens ahead and has the full model verify them together, so each accepted draft token saves a full decoding step; this is the source of the reported gains. The release also updates MLX and MLX-C to address threading issues and upgrades to Go 1.26. Together, these changes aim to improve the speed and stability of running large language models locally.
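To make the idea concrete, here is a minimal sketch of greedy speculative decoding in plain Python. It is not Ollama's or MLX's implementation: `target_model`, `draft_model`, `greedy_generate`, and `speculative_generate` are hypothetical names, and the "models" are toy functions over integers. In an MTP setup the draft tokens would presumably come from the model's own extra prediction heads rather than from a separate function, and verification would be one batched forward pass; both are simulated here.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def target_model(ctx: List[Token]) -> Token:
    """Toy stand-in for the large, slow model (hypothetical)."""
    return (ctx[-1] * 3 + 1) % 7


def draft_model(ctx: List[Token]) -> Token:
    """Toy stand-in for the cheap drafter: usually agrees with the target,
    but guesses wrong whenever the context length is a multiple of 4."""
    guess = target_model(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 7


def greedy_generate(model: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_generate(
    target: NextToken, draft: NextToken, prompt: List[Token], max_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding. Returns the tokens and the number of
    verification rounds (each round models one target forward pass)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        # 1. Draft k tokens cheaply.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. Verify the draft against the target. A real runner scores all
        #    k positions in a single batched pass; here we loop for clarity.
        rounds += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's own token
            if expected != tok:
                break  # first mismatch: discard the rest of the draft
        else:
            # Every draft token matched, so the same pass yields a bonus token.
            accepted.append(target(out + accepted))

        out.extend(accepted)
    return out[: len(prompt) + max_new], rounds


if __name__ == "__main__":
    tokens, rounds = speculative_generate(target_model, draft_model, [1], max_new=12)
    print(tokens == greedy_generate(target_model, [1], 12), rounds)
```

Because every kept token is the one the target itself would have chosen, the output is identical to plain greedy decoding. The speedup comes only from needing fewer target passes, so it depends on how often the draft is accepted, which may be why the quoted figure is tied to a specific workload.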

Sources

v0.23.1 - GitHub: ollama/ollama
