AI Agent Reachability
The problem
AI agents and large language models often run locally. GPU resources are expensive in the cloud, sensitive data should not leave your network, and iteration is faster when the model is on your own hardware. But cloud orchestrators and tool-use frameworks like LangChain, CrewAI, and AutoGen need an HTTP endpoint they can call back to. Your local inference server is invisible to the internet.
Traditional workarounds are all painful. Port forwarding is fragile, requires router access, and breaks under carrier-grade NAT. Deploying to the cloud means paying for GPU instances and accepting latency you do not need. VPNs add configuration complexity on every client. Airdress gives your local inference server a stable public URL in minutes, with no router changes and no cloud compute bill.
Architecture
flowchart LR
A[Cloud Orchestrator] -->|HTTPS| B[Relay PoP]
B -->|WireGuard| C[Operator]
C --> D[Local LLM / Ollama]
The cloud orchestrator sends HTTP requests to your *.a.airdr.es name.
DNS resolves to the relay VIP. The relay forwards traffic through a
WireGuard tunnel to the operator on your machine, which hands it to your
local inference server.
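The first hop in that path, DNS resolution, can be inspected from any client. The following is a minimal sketch using only the Python standard library. The hostname `your-name.a.airdr.es` is the placeholder used in this guide, and `resolve_addresses` is a hypothetical helper name, not part of any airdress tooling.

```python
import socket
from typing import Callable, List


def resolve_addresses(hostname: str, port: int = 443,
                      resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted, de-duplicated IP addresses that `hostname` resolves to.

    `resolver` is injectable so the function can be exercised without network access.
    """
    infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Example call (requires network access and a name you have actually claimed):
#   resolve_addresses("your-name.a.airdr.es")
```

The addresses returned should belong to the relay, not to your own machine, since the orchestrator never connects to your network directly.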
Walkthrough
-
Install the operator
    curl -fsSL https://get.airdress.co/operator | sh
Confirm the binary is available:
    airdress-operator --version
-
Start your local inference server
Start Ollama (or any inference server) on its default port:
    ollama serve
Confirm it is responding locally:
    curl http://127.0.0.1:11434/api/tags
You should see a JSON response listing available models. Leave this terminal running.
-
Start the operator
In a second terminal, start the operator:
    airdress-operator serve --bind 0.0.0.0:8080
Wait for the tunnel to establish:
    INFO wireguard handshake complete relay=ewr latency=18ms
    INFO operator ready bind=0.0.0.0:8080 wg_port=51820
-
Verify the inference server responds through the tunnel
From any machine with internet access, send a request to your airdress name:
    curl https://your-name.a.airdr.es/api/tags
You should see the same JSON response you saw locally. The request traveled through the relay PoP, down the WireGuard tunnel, and reached Ollama on your machine.
-
Configure your cloud orchestrator
Point your orchestrator at the airdress URL as its inference endpoint. For example, with a LangChain remote LLM:
    from langchain_community.llms import Ollama

    llm = Ollama(
        base_url="https://your-name.a.airdr.es",
        model="llama3",
    )
The orchestrator can now call your local model over the public internet, no VPN or port forwarding required.
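The two curl checks in the walkthrough can also be scripted, which is handy before handing the URL to an orchestrator. This is a minimal sketch, assuming the server is Ollama and that `/api/tags` returns a JSON object with a `models` array whose entries carry a `name` field. The function names are hypothetical, and `your-name.a.airdr.es` is the placeholder from this guide.

```python
import json
import urllib.request
from typing import List


def model_names(tags_payload: dict) -> List[str]:
    """Extract sorted model names from an Ollama /api/tags response body."""
    return sorted(m["name"] for m in tags_payload.get("models", []))


def fetch_tags(base_url: str, timeout: float = 10.0) -> dict:
    """GET <base_url>/api/tags and decode the JSON body."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def tunnel_matches_local(local_url: str, public_url: str) -> bool:
    """True when the local and public endpoints report the same models."""
    return model_names(fetch_tags(local_url)) == model_names(fetch_tags(public_url))


# Example call (needs Ollama and the operator running):
#   tunnel_matches_local("http://127.0.0.1:11434", "https://your-name.a.airdr.es")
```

Comparing model names instead of raw response bytes keeps the check stable even if the two responses differ in field order or whitespace.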
Production considerations
- Add authentication. The operator supports OIDC, or you can place a reverse proxy (Caddy, nginx) in front of the inference server to enforce API keys or bearer tokens.
- Rate limiting. Configure proxy.rate_limit_rps in the operator config file to prevent runaway orchestrator loops from saturating your GPU.
- Monitoring. Run airdress-operator status to check tunnel health, handshake freshness, and bytes transferred.
- One name per agent. If you run multiple models or agent runtimes, claim a dedicated airdress name for each. This isolates traffic and simplifies monitoring.
- Bandwidth. All traffic flows through the relay. This suits API calls (prompts and completions) but not bulk model downloads. Pull model weights directly, not through the tunnel.
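As a complement to the operator-side rate limit above, an orchestrator can also throttle itself before it sends requests. The following is a minimal token-bucket sketch in plain Python. It is not the operator's implementation and does not read `proxy.rate_limit_rps`; the class name and parameters are illustrative assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `rate` calls per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = float(capacity)
        self._clock = clock            # injectable so tests can control time
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when the caller should back off."""
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

An agent loop would call `try_acquire()` before each inference request and sleep or skip the step when it returns `False`, so a runaway loop stalls on the client instead of queuing work on the GPU.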