Local OpenAI API.
Drop-in /v1/chat/completions. No proxy. No API key in someone's database.
If you've ever wired your editor, a script, or an internal tool to OpenAI's API and then realized you don't want every call to leave your machine, tailor.'s local endpoint is the answer. It speaks the same wire format as OpenAI's /v1/chat/completions, runs on localhost:11435, and serves whatever local model you've selected in the app.
How to use it
Point any OpenAI-compatible client at http://localhost:11435/v1 and pass any string as the API key. The key is not validated; auth lives elsewhere, since the server is loopback-only by default.
Example: Python
from openai import OpenAI

# Point the SDK at tailor.'s local endpoint instead of OpenAI's servers.
client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="not-used",  # any string works; the key is not validated
)

resp = client.chat.completions.create(
    model="qwen2.5-coder:14b",
    messages=[{"role": "user", "content": "Write a haiku about TCP."}],
)
print(resp.choices[0].message.content)
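Because the endpoint speaks OpenAI's wire format, the SDK is optional. Here is a minimal sketch of the same request using only the Python standard library. The helper names build_chat_request and chat are hypothetical, and the response parsing assumes tailor. returns the same JSON shape as OpenAI's /v1/chat/completions.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11435/v1"  # local endpoint described above


def build_chat_request(prompt, model="qwen2.5-coder:14b", api_key="not-used"):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Any string is accepted here; the key is not validated.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumes the response mirrors OpenAI's shape:
    {"choices": [{"message": {"content": "..."}}]}
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

With tailor. running, chat("Write a haiku about TCP.") should return the same kind of reply as the SDK example. Keeping request construction separate from sending makes it easy to inspect exactly what goes over the wire.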
Example: Cursor
In Cursor settings, under "Models" → "Add custom model," set the base URL to http://localhost:11435/v1, pick any model name (tailor. routes to the active model regardless), and Cursor will use your local model for completions and chat.
Streaming
Streaming uses server-sent events, the same way OpenAI's API does. Setting stream=true returns token-by-token chunks in the OpenAI delta format, so most clients work without changes.
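To make the delta format concrete, here is a rough sketch that reads the stream by hand with the standard library. The names iter_deltas and stream_chat are hypothetical. The parsing assumes tailor. emits "data: {...}" lines and ends with OpenAI's "data: [DONE]" sentinel, which is how OpenAI's own stream behaves.

```python
import json
import urllib.request
from typing import Iterable, Iterator

BASE_URL = "http://localhost:11435/v1"  # local endpoint described above


def iter_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel (assumed, as in OpenAI's API)
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:  # role-only or empty deltas carry no text
                yield text


def stream_chat(prompt: str, model: str = "qwen2.5-coder:14b") -> Iterator[str]:
    """POST with stream=true and yield text fragments as they arrive."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer not-used",  # not validated
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # The response object iterates line by line, yielding bytes.
        yield from iter_deltas(resp)
```

With the app running, looping over stream_chat("Write a haiku about TCP.") and printing each fragment with end="" shows the reply as it is generated. With the OpenAI Python SDK, the equivalent is passing stream=True to client.chat.completions.create and reading chunk.choices[0].delta.content from each chunk.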
Why this matters
Cloud LLM costs add up fast: a heavy Cursor user can spend $200+/month on Claude through Anthropic, and a heavy script user can spend more. tailor. is $11/month flat for unlimited local inference, with no rate limits. The privacy story is the same as the rest of the app: nothing leaves your device, and no key is sitting in someone else's database.