AI Agent Reachability

The problem

AI agents and large language models often run locally: cloud GPUs are expensive, sensitive data should not leave your network, and iteration is faster when the model is on your own hardware. But cloud orchestrators and tool-use frameworks like LangChain, CrewAI, and AutoGen need an HTTP endpoint they can call back to, and a local inference server behind NAT or a firewall is not reachable from the internet.

Traditional workarounds are all painful. Port forwarding is fragile, requires router access, and breaks under carrier-grade NAT. Deploying to the cloud means paying for GPU instances and accepting latency you do not need. VPNs add configuration complexity on every client. Airdress gives your local inference server a stable public URL in minutes, with no router changes and no cloud compute bill.

Architecture

```mermaid
flowchart LR
    A[Cloud Orchestrator] -->|HTTPS| B[Relay PoP]
    B -->|WireGuard| C[Operator]
    C --> D[Local LLM / Ollama]
```

The cloud orchestrator sends HTTPS requests to your *.a.airdr.es name, which DNS resolves to the relay's virtual IP (VIP). The relay point of presence (PoP) forwards the traffic through a WireGuard tunnel to the operator on your machine, which hands it to your local inference server.

Walkthrough

Install the operator
```
curl -fsSL https://get.airdress.co/operator | sh
```
Confirm the binary is available:
```
airdress-operator --version
```

Start your local inference server

Start Ollama (or any inference server) on its default port:
```
ollama serve
```
Confirm it is responding locally:
```
curl http://127.0.0.1:11434/api/tags
```
You should see a JSON response listing available models. Leave this terminal running.

Start the operator

In a second terminal, start the operator:

```
airdress-operator serve --bind 0.0.0.0:8080
```

Wait for the tunnel to establish:

```
INFO  wireguard handshake complete  relay=ewr  latency=18ms
INFO  operator ready  bind=0.0.0.0:8080  wg_port=51820
```
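If you start the operator from a script, you may want to wait for the ready line before sending traffic. The following is a minimal sketch that parses the two sample log lines shown above. It assumes every log line has the same shape as those samples (a level, a message, then `key=value` fields); the function names `parse_log_line` and `tunnel_ready` are hypothetical helpers, not part of the operator.

```python
import re

# Matches key=value fields such as relay=ewr or bind=0.0.0.0:8080.
FIELD = re.compile(r"(\w+)=(\S+)")


def parse_log_line(line):
    """Split a log line into (level, message, fields).

    Assumes the layout of the sample output: LEVEL, message text,
    then zero or more key=value fields.
    """
    level, _, rest = line.strip().partition(" ")
    first_field = FIELD.search(rest)
    # The message is whatever precedes the first key=value field.
    message = rest[: first_field.start()] if first_field else rest
    return level, message.strip(), dict(FIELD.findall(rest))


def tunnel_ready(lines):
    """Return the fields of the first 'operator ready' line, or None."""
    for line in lines:
        _, message, fields = parse_log_line(line)
        if message == "operator ready":
            return fields
    return None
```

Feeding the operator's output to `tunnel_ready` line by line lets a wrapper script block until the tunnel is up, then read the bind address from the returned fields.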

Verify the inference server responds through the tunnel

From any machine with internet access, send a request to your airdress name:
```
curl https://your-name.a.airdr.es/api/tags
```
You should see the same JSON response you saw locally. The request traveled through the relay PoP, down the WireGuard tunnel, and reached Ollama on your machine.
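The local and tunneled checks can also be scripted. This sketch uses only the Python standard library and relies on Ollama's `/api/tags` response containing a `models` array whose entries have a `name` field. The helper names are illustrative, and `your-name` remains a placeholder for your own airdress name.

```python
import json
import urllib.request


def model_names(tags_payload):
    """Extract sorted model names from an Ollama /api/tags response body."""
    return sorted(m["name"] for m in tags_payload.get("models", []))


def fetch_tags(base_url, timeout=10):
    """GET <base_url>/api/tags and return the decoded JSON body."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def same_models(local_url, public_url):
    """True if both endpoints report the same set of models."""
    return model_names(fetch_tags(local_url)) == model_names(fetch_tags(public_url))
```

Calling `same_models("http://127.0.0.1:11434", "https://your-name.a.airdr.es")` from the machine running Ollama confirms that the public name reaches the same server as the local port.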

Configure your cloud orchestrator

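Point your orchestrator at the airdress URL as its inference endpoint. For example, with LangChain's Ollama integration:
```
# Requires the langchain-community package.
from langchain_community.llms import Ollama

llm = Ollama(
    base_url="https://your-name.a.airdr.es",  # replace your-name with your airdress name
    model="llama3",  # a model already pulled on the local server
)
```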
The orchestrator can now call your local model over the public internet, with no VPN or port forwarding required.
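To smoke-test the endpoint without installing an orchestration framework, you can call Ollama's HTTP API directly. The sketch below builds a non-streaming `POST /api/generate` request with the standard library. The helper names are illustrative, and the base URL and model name are the same placeholders used above.

```python
import json
import urllib.request


def build_generate_request(base_url, model, prompt):
    """Build a non-streaming POST request for Ollama's /api/generate."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, model, prompt, timeout=120):
    """Send the request and return the completion text."""
    req = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # With stream=False, Ollama returns one JSON object
        # whose "response" field holds the full completion.
        return json.load(resp)["response"]
```

For example, `generate("https://your-name.a.airdr.es", "llama3", "Say hello")` sends one prompt through the tunnel and returns the model's reply as a string. The generous default timeout allows for the model loading into memory on the first request.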

Production considerations

Add authentication. A public URL means anyone who knows the name can send requests to your model. The operator supports OIDC, or you can place a reverse proxy (Caddy, nginx) in front of the inference server to enforce API keys or bearer tokens.
Rate limiting. Configure proxy.rate_limit_rps in the operator config file to prevent runaway orchestrator loops from saturating your GPU.
Monitoring. Run airdress-operator status to check tunnel health, handshake freshness, and bytes transferred.
One name per agent. If you run multiple models or agent runtimes, claim a dedicated airdress name for each. This isolates traffic and simplifies monitoring.
Bandwidth. All traffic flows through the relay. This is well-suited for API calls (prompts and completions), but not for bulk model downloads. Pull model weights directly, not through the tunnel.
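To illustrate what a requests-per-second limit does, here is a minimal token-bucket sketch. It is not the operator's implementation, which is not described here. It is a generic technique you could also apply on the orchestrator side as a second guard against runaway loops. The class name and its parameters are assumptions made for this example.

```python
import time


class TokenBucket:
    """Allow up to `rate_per_sec` requests per second, with bursts up to `burst`."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = float(rate_per_sec)
        self.capacity = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.clock = clock          # injectable so the limiter can be tested
        self.last = clock()

    def allow(self):
        """Return True and spend a token if a request may proceed now."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

An agent loop would call `allow()` before each inference request and sleep or back off when it returns `False`, so a misbehaving loop cannot exceed the chosen rate.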

Next steps

Quickstart: Install the operator and confirm it is running in under five minutes.

Configuration: Fine-tune the operator with a TOML config file and environment variables.

Deployment Checklist: Production readiness checklist for running the operator as a long-lived service.

Reachability is the missing piece of personal software: Why reachability matters for AI agents and local-first software, on the Airdress blog.