Local LLMs on Windows.
NVIDIA accelerated. CUDA out of the box. Your PC, your tokens.
Windows PCs with NVIDIA GPUs are the fastest consumer hardware for local LLM inference, period. A single RTX 4090 has 24 GB of VRAM and can run 30B-class models at speeds that match or beat cloud APIs. tailor. uses CUDA automatically, so you don't need to install drivers separately, configure llama.cpp, or fight with Python environments.
Hardware recommendations
8 GB system RAM, integrated graphics: 1B–3B models, CPU-only inference. Slow but works.
16 GB RAM, GTX 1660+ or RTX 3050+ (6–8 GB VRAM): 7B models at 4-bit quantization.
32 GB RAM, RTX 3080 / 4070 (12 GB VRAM): 13B models comfortably, 30B at heavy quantization.
RTX 4090 (24 GB VRAM): 30B-class at high quality, 70B at heavy quantization.
Dual RTX 4090 or RTX 6000 Ada: frontier open models at full precision.
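As a rough way to see where these tiers come from, the sketch below estimates how much memory a model's weights need at a given quantization and whether that fits in a given amount of VRAM. It is a back-of-the-envelope rule of thumb, not how tailor. itself picks a model: the function names are hypothetical, and the 20% overhead for the KV cache and runtime buffers is an assumption that varies with context length and backend.

```python
def estimate_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes).

    One billion parameters at 8 bits per weight is about 1 GB, so the
    estimate is simply parameters (in billions) * bits / 8.
    """
    return params_billions * bits_per_weight / 8


def fits_in_vram(
    params_billions: float,
    bits_per_weight: float,
    vram_gb: float,
    overhead_fraction: float = 0.2,  # assumed headroom for KV cache and buffers
) -> bool:
    """Return True if weights plus the assumed overhead fit in vram_gb."""
    needed_gb = estimate_weight_gb(params_billions, bits_per_weight)
    needed_gb *= 1 + overhead_fraction
    return needed_gb <= vram_gb


# A 7B model at 4-bit needs about 3.5 GB for weights, so it fits a 6 GB card.
print(estimate_weight_gb(7, 4), fits_in_vram(7, 4, 6))
# A 30B model at 4-bit needs about 15 GB, which fits in 24 GB.
print(estimate_weight_gb(30, 4), fits_in_vram(30, 4, 24))
# A 70B model at 4-bit needs about 35 GB, which does not fit in 24 GB.
print(estimate_weight_gb(70, 4), fits_in_vram(70, 4, 24))
```

Under these assumptions, a 70B model only fits on a single 24 GB card at well below 4 bits per weight, which is what "heavy quantization" means in practice.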
AMD and Intel GPUs
Today tailor. ships accelerated builds for NVIDIA (CUDA) and Apple Silicon (Metal). On AMD and Intel hardware, inference currently runs on the CPU path, which is perfectly fine for small and mid-size models but slower for large ones. Vulkan and ROCm acceleration are on the roadmap.
Installation
Download the portable .exe. The portable build runs from any folder with no admin rights, which is useful on locked-down work laptops. On first launch, SmartScreen may show an "unrecognized app" warning; click "More info" → "Run anyway." EV code signing is on the near-term roadmap so this becomes a single-click open.
WSL and Docker
tailor. is a native Windows app and doesn't require WSL or Docker. If you already run llama.cpp or Ollama inside WSL, you can keep that setup and point tailor. at it, but most users find the native build simpler and faster.
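Before pointing tailor. at an existing Ollama install, it can help to confirm the server is reachable from Windows. The sketch below queries Ollama's model-listing endpoint (`/api/tags`) on its default port, 11434. It assumes WSL is forwarding localhost to Windows, and it says nothing about how tailor. itself is configured to use the endpoint; the function names are hypothetical.

```python
import json
import urllib.request

# Ollama's default listen address; adjust if your WSL setup exposes another host or port.
OLLAMA_URL = "http://localhost:11434"


def parse_model_names(payload: dict) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [model["name"] for model in payload.get("models", [])]


def list_ollama_models(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> list[str]:
    """Ask a running Ollama server which models it has pulled locally.

    Raises OSError (including urllib.error.URLError) if the server
    cannot be reached within the timeout.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.load(resp))


# Usage, with a server running:
#   try:
#       print(list_ollama_models())
#   except OSError as exc:
#       print(f"Ollama not reachable: {exc}")
```

If the call fails from Windows but works inside WSL, the problem is usually the network path between the two rather than Ollama.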