Running local models on Macs gets faster with Ollama's MLX support

Key Points:

  • Ollama has added support for Apple’s open source MLX framework, improved caching performance, and adopted Nvidia’s NVFP4 model compression format, enhancing memory efficiency and performance on Macs with Apple Silicon chips.
  • These updates come amid growing interest in local large language models, driven by frustrations with cloud service rate limits and costs, with Ollama also expanding Visual Studio Code integration for local coding models.
  • The new MLX support is currently in preview (Ollama 0.19) and works only with Alibaba’s 35-billion-parameter Qwen3.5 model; it requires a Mac with Apple Silicon and at least 32GB of RAM, with additional benefits for users of M5-series GPUs (see the usage sketch after this list).
  • While local models still trail leading cloud models in benchmarks, they are becoming viable for certain tasks and offer privacy advantages, though setup complexity and hardware demands remain significant barriers.
  • Apple’s MLX optimizes shared CPU-GPU memory access on Macs (see the MLX memory sketch below), marking progress for local model performance on Apple hardware, but Ollama has not announced when broader MLX support or additional models will be available.
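
For readers who want to try a local model, the snippet below is a minimal sketch of driving one through Ollama's official Python client (pip install ollama), assuming the Ollama app is already running. The model tag "qwen3.5" is an illustrative assumption; the exact tag used by the MLX preview isn't specified here.

```python
# Minimal sketch: chat with a locally served model via the `ollama` Python
# client. Assumes the Ollama daemon is running and that a model tag named
# "qwen3.5" exists -- that tag is an assumption for illustration only.
import ollama

MODEL = "qwen3.5"  # hypothetical tag; check `ollama list` for what's available

ollama.pull(MODEL)  # download the weights if they aren't cached locally

response = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "Explain MLX in one sentence."}],
)
print(response["message"]["content"])
```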

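The last key point refers to MLX's unified-memory design. Below is a minimal sketch, assuming the mlx package (pip install mlx) on an Apple Silicon Mac, of how the same arrays can feed both CPU and GPU operations without explicit copies; it illustrates the general idea rather than anything Ollama-specific.

```python
# Minimal sketch of MLX's unified memory: the same buffers feed CPU and GPU
# ops, only the target stream/device changes (no host<->device copies).
import mlx.core as mx

a = mx.random.normal((2048, 2048))
b = mx.random.normal((2048, 2048))

c_gpu = mx.matmul(a, b, stream=mx.gpu)  # dispatch to the GPU
c_cpu = mx.matmul(a, b, stream=mx.cpu)  # same inputs, dispatched to the CPU

mx.eval(c_gpu, c_cpu)                        # MLX is lazy; force evaluation
print(mx.allclose(c_gpu, c_cpu, atol=1e-3))  # results agree across devices
```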