
Local LLMs on Mac.

Apple Silicon native. Metal-accelerated. Nothing leaves your machine.

Apple Silicon is, almost by accident, the best consumer hardware for local LLMs. Unified memory means your "VRAM" and your system memory are the same pool, so a 32 GB M-series Mac can comfortably run 30B-class models that would need a $1,500 GPU on a PC. tailor. is built around this: a native Apple Silicon binary, Metal-accelerated inference via llama.cpp, no Rosetta, no virtualization.

What runs well on which Mac

M1 / M2 / M3 / M4 with 8 GB unified memory: 1B–3B models (Llama 3.2 1B/3B, Qwen 2.5 1.5B/3B, Gemma 2 2B). Fast, low battery drain, great for everyday work.

M-series with 16 GB: 7B–14B models at 4-bit (Llama 3.1 8B, Mistral 7B, Qwen 2.5 7B/14B). The sweet spot for most users.

M-series with 32 GB+: 30B-class models (Qwen 2.5 32B), or Llama 3.1 70B only at very heavy (roughly 2-bit) quantization, which costs noticeable quality. Output approaching GPT-4 on many tasks, on a laptop.

M-series with 64 GB+: 70B-class models at higher-quality quantization, Mixtral 8x22B (comfortable only with 96 GB or more), and other frontier open models.
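To see why the tiers fall where they do, here is a rough back-of-the-envelope check as a minimal Python sketch. It is not tailor.'s actual sizing logic, and the function names are hypothetical. It assumes quantized weights take about params × bits ÷ 8 bytes (roughly the model's file size on disk), a flat 1.5 GB for context cache and runtime overhead, and that macOS lets the GPU use about two-thirds of unified memory on machines up to 36 GB and about three-quarters above that, a commonly reported default rather than a documented guarantee.

```python
# Rough fit check for a quantized LLM on a unified-memory Mac.
# All constants are assumptions for illustration, not tailor.'s internals.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (close to the file size on disk)."""
    return params_billion * bits_per_weight / 8


def gpu_budget_gb(unified_memory_gb: float) -> float:
    """Assumed share of unified memory the GPU may use (a macOS default, not guaranteed)."""
    fraction = 2 / 3 if unified_memory_gb <= 36 else 3 / 4
    return unified_memory_gb * fraction


def fits(params_billion: float, bits_per_weight: float,
         unified_memory_gb: float, overhead_gb: float = 1.5) -> bool:
    """True if weights plus a flat overhead (KV cache, runtime) fit the GPU budget."""
    needed = weights_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= gpu_budget_gb(unified_memory_gb)


if __name__ == "__main__":
    # 4.5 bits/weight matches llama.cpp's Q4_0: 4-bit values plus a 16-bit scale per 32 weights.
    for name, params, mem in [("Llama 3.2 3B", 3, 8), ("Qwen 2.5 14B", 14, 16),
                              ("Qwen 2.5 32B", 32, 32), ("Llama 3.1 70B", 70, 64)]:
        verdict = "fits" if fits(params, 4.5, mem) else "too big"
        print(f"{name}: ~{weights_gb(params, 4.5):.1f} GB of weights, {verdict} on {mem} GB")
```

Under these assumptions a 14B model needs about 7.9 GB of weights and fits a 16 GB Mac, while an 8B model overshoots the GPU budget of an 8 GB Mac. Longer context windows grow the KV cache, so treat the overhead term as a floor, not a ceiling.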

Intel Macs

tailor. ships an Intel build too. Small and mid-size models run fine on a recent Intel Mac with 16 GB+ RAM, though without Metal acceleration, inference is slower than on Apple Silicon. Same app, same features, same privacy guarantee.
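If you're not sure which build to download, the CPU architecture tells you. The minimal Python sketch below is a hypothetical helper, not part of tailor. It reads `platform.machine()` and, because an Intel-only process running under Rosetta 2 on Apple Silicon still reports `x86_64`, it also checks macOS's `sysctl.proc_translated` flag (1 means the process is translated). A translated result points to the Apple Silicon build, since the native build is the Metal-accelerated one.

```python
import platform
import subprocess


def classify(machine: str, translated: bool) -> str:
    """Pick a build from the reported CPU architecture and the Rosetta flag."""
    if machine == "arm64" or translated:
        # An x86_64 process under Rosetta 2 is really running on Apple Silicon.
        return "apple-silicon"
    if machine == "x86_64":
        return "intel"
    return "unknown"


def running_under_rosetta() -> bool:
    """True if macOS reports this process as translated by Rosetta 2."""
    if platform.system() != "Darwin":
        return False
    try:
        result = subprocess.run(["sysctl", "-n", "sysctl.proc_translated"],
                                capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        # Key missing (e.g. older macOS or Intel hardware): treat as not translated.
        return False
    return result.stdout.strip() == "1"


if __name__ == "__main__":
    print(classify(platform.machine(), running_under_rosetta()))
```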

Battery life

Inference is GPU-heavy, so running a large model continuously will drain a laptop battery. tailor. accounts for this: it goes idle aggressively when the chat isn't active, and it offers a "battery saver" mode that caps the token generation rate to extend runtime.
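A token-rate cap can be sketched in a few lines. This is a hypothetical illustration, not tailor.'s implementation: `rate_capped` paces any lazy token stream so that token i is released no earlier than i ÷ rate seconds after the first. Because a Python generator only computes the next token when asked, the sleep between requests is time the GPU would sit idle, which is where a battery saving would come from.

```python
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def rate_capped(tokens: Iterable[T], max_tokens_per_sec: float) -> Iterator[T]:
    """Yield tokens no faster than max_tokens_per_sec.

    Token i is released no earlier than start + i / max_tokens_per_sec, so a
    fast source is paced evenly instead of bursting. The next token is only
    pulled from `tokens` after the consumer asks for it.
    """
    if max_tokens_per_sec <= 0:
        raise ValueError("max_tokens_per_sec must be positive")
    interval = 1.0 / max_tokens_per_sec
    start = time.monotonic()
    for i, token in enumerate(tokens):
        wait = start + i * interval - time.monotonic()
        if wait > 0:
            time.sleep(wait)  # idle instead of generating ahead
        yield token
```

Wrapping a stream as `rate_capped(model_tokens, 10)` would hold output to about 10 tokens per second, where `model_tokens` stands in for whatever generator actually produces tokens.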

Installation

Download the .dmg from the homepage, drag the app to Applications, and open it. On first launch, right-click the app and choose Open to bypass the unidentified-developer warning; on macOS 15 Sequoia and later, try to open it once, then click Open Anyway in System Settings → Privacy & Security. (Apple notarization is on the near-term roadmap, which will make this a single-click open.) First launch downloads a starter model so you have something to chat with immediately.

Questions

Does tailor. support the new M4 Macs?
Yes. The native Apple Silicon binary runs on M1, M2, M3, and M4 Macs and should run on future Apple Silicon hardware as well. The Metal backend takes advantage of new GPU cores automatically.
How much disk space do models take?
Small models: 1–3 GB. Mid-size: 4–8 GB. Large (30B+): 15–40 GB. You can delete models at any time from the app. The starter model is ~2 GB.
Will running a model heat my Mac?
Yes. It's GPU-heavy work. On a MacBook Pro under sustained load, the fans will spin up; a fanless MacBook Air will get warm instead and may throttle on long runs. On a Mac Studio or iMac, it's barely noticeable. The app shows live GPU/CPU utilization so you can see what's happening.
Try tailor. free for 7 days.
Full access. No credit card required. Mac, Windows, and Linux.