🐧 PenguinPulse

Linux Graphics & Gaming News

Ollama v0.21.1 Adds Kimi CLI, Enhances MLX Runner Performance and Features

Ollama recently released version 0.21.1, introducing support for the Kimi CLI for local AI model execution. Users can now launch the Kimi CLI via Ollama, and paired with Kimi K2.6 it "excels at long horizon agentic execution tasks through a multi-agent system."

The update also delivers several performance and feature improvements to the MLX runner, Ollama's backend built on Apple's MLX framework for Apple Silicon. These include new logprobs support for compatible models, faster sampling through fused top-p and top-k operations, and repeat penalties. MLX prompt tokenization now runs inside request-handler goroutines, and array management gains better thread safety. The GLM4 MoE Lite model also picks up a performance boost from a fused sigmoid router head. Minor bug fixes for the macOS application and for Gemma 4 structured outputs round out the release.
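Ollama's fused kernels live inside the MLX runner and are not shown in the release notes, but the sampling semantics they accelerate are standard. The sketch below is an illustrative pure-Python version of that pipeline under common assumptions: apply a repeat penalty to previously generated tokens, keep the top-k logits, then apply top-p (nucleus) filtering over the resulting distribution. The function name and parameter defaults are hypothetical, not Ollama's API.

```python
import math

def sample_filter(logits, top_k=40, top_p=0.9, repeat_penalty=1.1, prev_tokens=()):
    """Return a renormalized {token_id: probability} dict after
    repeat-penalty, top-k, and top-p filtering. Illustrative sketch only,
    not Ollama's actual implementation."""
    logits = list(logits)
    # Repeat penalty: dampen logits of tokens that already appeared.
    for t in set(prev_tokens):
        if logits[t] > 0:
            logits[t] /= repeat_penalty
        else:
            logits[t] *= repeat_penalty
    # Top-k: keep only the k highest-scoring tokens, best first.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract max for numerical stability).
    m = max(logits[i] for i in order)
    exps = {i: math.exp(logits[i] - m) for i in order}
    z = sum(exps.values())
    probs = {i: e / z for i, e in exps.items()}
    # Top-p (nucleus): smallest prefix whose cumulative mass reaches top_p.
    kept, cum = {}, 0.0
    for i in order:
        kept[i] = probs[i]
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize the kept tokens so probabilities sum to 1.
    z2 = sum(kept.values())
    return {i: p / z2 for i, p in kept.items()}
```

A fused implementation performs these passes in one kernel instead of materializing intermediate arrays, which is where the speedup on Apple Silicon comes from.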

Sources