🐧 PenguinPulse

Linux Graphics & Gaming News

Ollama v0.21.1 Adds Kimi CLI, Enhances MLX Runner Performance and Features

Ollama recently released version 0.21.1, introducing support for the Kimi CLI for local AI model execution. Users can now launch the Kimi CLI via Ollama, and paired with Kimi K2.6 it "excels at long horizon agentic execution tasks through a multi-agent system."

The update also delivers several performance and feature improvements to the MLX runner, Ollama's backend built on Apple's MLX framework for Apple Silicon. These include new logprobs support for compatible models, faster sampling through fused top-p and top-k operations, and repeat penalties. MLX prompt tokenization now runs inside request-handler goroutines, and array management gains better thread safety. The GLM4 MoE Lite model also picks up a performance boost from a fused sigmoid router head. Minor bug fixes for the macOS application and for Gemma 4 structured outputs round out the release.
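Ollama's fused kernels live inside the MLX runner and are not shown in the release notes, but the sampling semantics they accelerate are standard. The sketch below is an illustrative pure-Python version of that pipeline under common assumptions: apply a repeat penalty to previously generated tokens, keep the top-k logits, then apply top-p (nucleus) filtering over the resulting distribution. The function name and parameter defaults are hypothetical, not Ollama's API.

```python
import math

def sample_filter(logits, top_k=40, top_p=0.9, repeat_penalty=1.1, prev_tokens=()):
    """Return a renormalized {token_id: probability} dict after
    repeat-penalty, top-k, and top-p filtering. Illustrative sketch only,
    not Ollama's actual implementation."""
    logits = list(logits)
    # Repeat penalty: dampen logits of tokens that already appeared.
    for t in set(prev_tokens):
        if logits[t] > 0:
            logits[t] /= repeat_penalty
        else:
            logits[t] *= repeat_penalty
    # Top-k: keep only the k highest-scoring tokens, best first.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract max for numerical stability).
    m = max(logits[i] for i in order)
    exps = {i: math.exp(logits[i] - m) for i in order}
    z = sum(exps.values())
    probs = {i: e / z for i, e in exps.items()}
    # Top-p (nucleus): smallest prefix whose cumulative mass reaches top_p.
    kept, cum = {}, 0.0
    for i in order:
        kept[i] = probs[i]
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize the kept tokens so probabilities sum to 1.
    z2 = sum(kept.values())
    return {i: p / z2 for i, p in kept.items()}
```

A fused implementation performs these passes in one kernel instead of materializing intermediate arrays, which is where the speedup on Apple Silicon comes from.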

Sources