🐧 PenguinPulse

Linux Graphics & Gaming News

Ollama v0.13.2-rc2 Boosts Vision Model Performance, Fixes Multi-GPU CUDA Detection

Ollama has released version 0.13.2-rc2, a release candidate focused on AI model performance and GPU compatibility. The headline change is that flash attention is now enabled by default for vision models, including mistral-3, gemma3, and qwen3-vl. Per the release notes, this "improves memory utilization and performance when providing images as input."

The release also fixes a GPU-detection bug on multi-GPU CUDA machines, making operation more reliable on systems with multiple NVIDIA GPUs. Additionally, it includes a fix for the deepseek-v3.1 model, which would produce "thinking" output even when that feature was disabled in Ollama. Together, these changes aim to make local model execution more robust and efficient.
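For readers who ran into the deepseek-v3.1 issue, here is a minimal sketch of how thinking is toggled through Ollama's REST API, assuming a local server on the default port 11434 and the model name shown. The "think" field in the /api/chat request is the switch the fixed model had been ignoring.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint

# Send a prompt to deepseek-v3.1 with thinking explicitly disabled. Before
# this release, the model would emit a thinking trace regardless of this flag.
resp = requests.post(
    OLLAMA_URL,
    json={
        "model": "deepseek-v3.1",
        "messages": [
            {"role": "user", "content": "In one sentence, what is flash attention?"}
        ],
        "think": False,   # disable the thinking/reasoning trace
        "stream": False,  # return a single JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

Server-wide flash attention has historically been controlled with the OLLAMA_FLASH_ATTENTION environment variable; with this release it is on by default for the vision models listed above.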
