🐧 PenguinPulse

Linux Graphics & Gaming News

Ollama v0.13.2-rc2 Boosts Vision Model Performance, Fixes Multi-GPU CUDA Detection

Ollama has released version 0.13.2-rc2, a release candidate focused on AI model performance and GPU compatibility. The headline change is that flash attention is now enabled by default for vision models, including mistral-3, gemma3, and qwen3-vl. Per the release notes, this "improves memory utilization and performance when providing images as input."

The release also fixes a GPU-detection bug on multi-GPU CUDA machines, making operation more reliable on systems with multiple NVIDIA GPUs. Additionally, it includes a fix for the deepseek-v3.1 model, which would produce "thinking" output even when that feature was disabled in Ollama. Together, these changes aim to make local model execution more robust and efficient.
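For readers who ran into the deepseek-v3.1 issue, here is a minimal sketch of how thinking is toggled through Ollama's REST API, assuming a local server on the default port 11434 and the model name shown. The "think" field in the /api/chat request is the switch the fixed model had been ignoring.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint

# Send a prompt to deepseek-v3.1 with thinking explicitly disabled. Before
# this release, the model would emit a thinking trace regardless of this flag.
resp = requests.post(
    OLLAMA_URL,
    json={
        "model": "deepseek-v3.1",
        "messages": [
            {"role": "user", "content": "In one sentence, what is flash attention?"}
        ],
        "think": False,   # disable the thinking/reasoning trace
        "stream": False,  # return a single JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

Server-wide flash attention has historically been controlled with the OLLAMA_FLASH_ATTENTION environment variable; with this release it is on by default for the vision models listed above.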
