🐧 PenguinPulse

Linux Graphics & Gaming News

Ollama v0.22.1-rc1 Adds NVIDIA TensorRT Import, Enhanced Model Batching

Ollama has released v0.22.1-rc1, bringing notable enhancements for local AI model execution. The headline change is import support for the NVIDIA TensorRT Model Optimizer, letting Ollama leverage NVIDIA's high-performance inference SDK to accelerate model execution on compatible GPUs. Batching also sees significant improvements: the mlxrunner can now batch samplers across multiple sequences, and general batching support lands across models, improving efficiency and throughput when serving multiple concurrent requests. On the bug-fix side, the release corrects a multi-regex BPE offset-handling issue in the tokenizer and resolves a bug where starting the desktop application would terminate active ollama sessions that were already launched. This release also includes support for new models.
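To illustrate the kind of workload the batching improvements target, here is a minimal sketch of issuing several generate requests concurrently against a local Ollama server. The `/api/generate` endpoint on port 11434 is Ollama's standard REST API; the model name and prompts are illustrative assumptions, not part of the release notes.

```python
# Sketch: concurrent requests to a local Ollama server, the scenario the
# improved batching support is meant to serve with higher throughput.
# Model name and prompts are illustrative assumptions.
import json
from concurrent.futures import ThreadPoolExecutor
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default REST endpoint


def build_payload(prompt: str, model: str = "llama3") -> dict:
    """Build a non-streaming generate request body."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str) -> str:
    """Send one generate request and return the model's response text."""
    data = json.dumps(build_payload(prompt)).encode()
    req = request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Several in-flight requests give the server's batching something to do.
    prompts = ["Summarize TensorRT in one line.", "What is BPE tokenization?"]
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        for answer in pool.map(generate, prompts):
            print(answer)
```

With server-side batching, overlapping requests like these can be processed together rather than strictly one after another, which is where the throughput gain comes from.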
