Local LLMs on Mac.
Apple Silicon native. Metal-accelerated. Nothing leaves your machine.
Apple Silicon is, almost by accident, the best consumer hardware for local LLMs. Unified memory means your "VRAM" and your system RAM are the same pool, so a 32 GB M-series Mac can comfortably run 30B-class models that would need a $1,500 GPU on a PC. tailor. is built around this: a native Apple Silicon binary, Metal-accelerated inference via llama.cpp, no Rosetta, no virtualization.
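Why does 32 GB stretch so far? Model weights take roughly (parameters × bits per weight ÷ 8) bytes, so quantization shrinks them dramatically. The sketch below is a back-of-the-envelope estimate, not how tailor. sizes models. The 4.5 bits per weight figure is an assumption: it matches llama.cpp's Q4_0 format (4-bit values plus one 16-bit scale per block of 32 weights). The estimate ignores the KV cache and runtime overhead.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of a model's weights in GB (10**9 bytes).

    One billion parameters at b bits each occupy b / 8 GB.
    Ignores the KV cache, activations and runtime overhead.
    """
    return params_billion * bits_per_weight / 8


# A 32B model at full 16-bit precision vs. an assumed ~4.5 bits/weight 4-bit quant.
print(f"32B @ 16 bit : {weight_gb(32, 16):.0f} GB")
print(f"32B @ 4.5 bit: {weight_gb(32, 4.5):.0f} GB")
```

At 16 bits the 32B model needs about 64 GB just for weights. At 4-bit it drops to about 18 GB, which fits inside a 32 GB Mac's shared pool with room to spare.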
What runs well on which Mac
M1 / M2 / M3 / M4 with 8 GB unified memory: 1B–3B models (Llama 3.2 1B/3B, Qwen 2.5 1.5B/3B, Gemma 2 2B). Fast, low battery drain, great for everyday work.
M-series with 16 GB: 7B–14B models (Llama 3.1 8B, Mistral 7B, Qwen 2.5 7B/14B at 4-bit). The sweet spot for most users.
M-series with 32 GB+: 30B-class models (Qwen 2.5 32B, or Llama 3.1 70B at heavy quantization). GPT-4-class output on a laptop.
M-series with 64 GB+: 70B-class models at lighter, higher-quality quantization, Mixtral 8x22B, and other frontier open models.
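The tiers above leave headroom beyond the raw weights for two reasons: the KV cache grows linearly with context length, and the GPU can typically use only part of unified memory. This sketch combines both into a rough fit check. The 75% usable fraction is an assumption (the real limit varies by machine and macOS version), `ModelShape` and `fits` are hypothetical names, and the Llama 3.1 8B shape (32 layers, 8 KV heads, head dimension 128) comes from its published config. "Fits" is an upper bound, not a promise that a model runs well alongside your other apps.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    params_billion: float
    layers: int
    kv_heads: int
    head_dim: int


def kv_cache_gb(m: ModelShape, context_tokens: int, bytes_per_value: int = 2) -> float:
    # Keys and values (factor 2) for every layer, KV head, head dimension and token;
    # bytes_per_value=2 assumes an unquantized fp16 cache.
    return 2 * m.layers * m.kv_heads * m.head_dim * context_tokens * bytes_per_value / 1e9


def fits(m: ModelShape, ram_gb: float, bits_per_weight: float,
         context_tokens: int, usable_fraction: float = 0.75) -> tuple[float, bool]:
    # Returns (GB needed for weights + KV cache, whether that fits the assumed GPU budget).
    weights = m.params_billion * bits_per_weight / 8
    needed = weights + kv_cache_gb(m, context_tokens)
    return needed, needed <= ram_gb * usable_fraction


LLAMA_31_8B = ModelShape(params_billion=8.0, layers=32, kv_heads=8, head_dim=128)

for ctx in (8192, 131072):
    needed, ok = fits(LLAMA_31_8B, ram_gb=16, bits_per_weight=4.5, context_tokens=ctx)
    print(f"{ctx:>6} tokens: {needed:.1f} GB needed, fits: {ok}")
```

On a 16 GB Mac, the 8B model at 4-bit is comfortable with an 8K context, but pushing to the full 128K context makes the fp16 KV cache alone larger than the weights.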
Intel Macs
tailor. ships an Intel build too. Small and mid-size models run fine on a recent Intel Mac with 16 GB+ RAM, though without Metal acceleration, inference is slower than on Apple Silicon. The same app, same features, same privacy guarantee.
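Not sure which kind of Mac you have? Apple menu → About This Mac lists the chip. For a scripted check, here is a small sketch, assuming only the Python standard library and the `sysctl.proc_translated` key Apple documents for detecting Rosetta 2. The helper name `is_apple_silicon` is hypothetical. The Rosetta check matters because an Intel-only Python running under translation reports `x86_64` even on an M-series Mac.

```python
import platform
import subprocess


def is_apple_silicon() -> bool:
    """Return True on an Apple Silicon Mac, even if this Python runs under Rosetta 2."""
    if platform.system() != "Darwin":
        return False
    if platform.machine() == "arm64":
        return True
    # An x86_64 process may be translated by Rosetta 2 on Apple Silicon;
    # sysctl.proc_translated reports 1 in that case. On Intel Macs the key
    # is absent, so stdout is empty and we fall through to False.
    try:
        result = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return False
    return result.stdout.strip() == "1"


print("Apple Silicon" if is_apple_silicon() else "Intel (or not a Mac)")
```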
Battery life
Inference is GPU-heavy, so running a large model continuously will drain a laptop battery. tailor. accounts for this: it goes idle aggressively when the chat isn't active, and it offers a "battery saver" mode that caps the token rate to extend runtime.
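A token-rate cap like the battery saver's can be sketched as a pacing wrapper around a lazy token stream. This is a hypothetical illustration, not tailor.'s code: `TokenRateCap` and `max_tps` are invented names, and the clock and sleep functions are injectable so the example runs instantly with simulated time.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


class TokenRateCap:
    """Hypothetical token-rate cap: releases at most max_tps tokens per second."""

    def __init__(self, max_tps: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if max_tps <= 0:
            raise ValueError("max_tps must be positive")
        self.interval = 1.0 / max_tps          # minimum spacing between tokens, in seconds
        self.clock = clock
        self.sleep = sleep
        self.next_ok: Optional[float] = None   # earliest time the next token may be released

    def wrap(self, tokens: Iterable[str]) -> Iterator[str]:
        # Generators are lazy: a lazy upstream source is only asked for the next
        # token after the previous one has been released, so generation is paced too.
        for tok in tokens:
            now = self.clock()
            if self.next_ok is not None and now < self.next_ok:
                self.sleep(self.next_ok - now)
                now = self.next_ok            # schedule from the target time to avoid drift
            self.next_ok = now + self.interval
            yield tok


# Usage with simulated time instead of real sleeping.
fake_now = [0.0]

def fake_clock() -> float:
    return fake_now[0]

def fake_sleep(seconds: float) -> None:
    fake_now[0] += seconds   # advance simulated time

cap = TokenRateCap(max_tps=10, clock=fake_clock, sleep=fake_sleep)
print(list(cap.wrap(["Hello", ",", " world"])), round(fake_now[0], 3))  # ['Hello', ',', ' world'] 0.2
```

Three tokens at 10 tokens/s take 0.2 s of simulated time: the first is released immediately and each later one waits 0.1 s.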
Installation
Download the .dmg from the homepage, drag the app to Applications, and open it. On first launch, right-click the app and choose Open to get past the unidentified-developer warning; on macOS 15 Sequoia and later, open System Settings → Privacy & Security and click Open Anyway instead. Apple Developer notarization is on the near-term roadmap, so this will become a single-click open. First launch downloads a starter model so you have something to chat with immediately.