
AI Agent Reachability

AI agents and large language models often run locally. GPU resources are expensive in the cloud, sensitive data should not leave your network, and iteration is faster when the model is on your own hardware. But cloud orchestrators and tool-use frameworks like LangChain, CrewAI, and AutoGen need an HTTP endpoint they can call back to, and a local inference server behind NAT or a firewall is not reachable from the internet.

Traditional workarounds are all painful. Port forwarding is fragile, requires router access, and breaks under carrier-grade NAT. Deploying to the cloud means paying for GPU instances and accepting latency you do not need. VPNs add configuration complexity on every client. Airdress gives your local inference server a stable public URL in minutes, with no router changes and no cloud compute bill.

flowchart LR
    A[Cloud Orchestrator] -->|HTTPS| B[Relay PoP]
    B -->|WireGuard| C[Operator]
    C --> D[Local LLM / Ollama]

The cloud orchestrator sends HTTPS requests to your *.a.airdr.es name. DNS resolves that name to the relay's virtual IP (VIP). The relay forwards the traffic through a WireGuard tunnel to the operator on your machine, which hands it to your local inference server.

  1. Install the operator

    curl -fsSL https://get.airdress.co/operator | sh

    Confirm the binary is available:

    airdress-operator --version
  2. Start your local inference server

    Start Ollama (or any inference server) on its default port:

    ollama serve

    Confirm it is responding locally:

    curl http://127.0.0.1:11434/api/tags

    You should see a JSON response listing available models. Leave this terminal running.

  3. Start the operator

    In a second terminal, start the operator:

    airdress-operator serve --bind 0.0.0.0:8080

    Wait for the tunnel to establish:

    INFO wireguard handshake complete relay=ewr latency=18ms
    INFO operator ready bind=0.0.0.0:8080 wg_port=51820
  4. Verify the inference server responds through the tunnel

    From any machine with internet access, send a request to your airdress name:

    curl https://your-name.a.airdr.es/api/tags

    You should see the same JSON response you saw locally. The request traveled through the relay PoP, down the WireGuard tunnel, and reached Ollama on your machine.

  5. Configure your cloud orchestrator

    Point your orchestrator at the airdress URL as its inference endpoint. For example, with a LangChain remote LLM:

    from langchain_community.llms import Ollama

    llm = Ollama(
        base_url="https://your-name.a.airdr.es",
        model="llama3",
    )

    The orchestrator can now call your local model over the public internet, with no VPN or port forwarding required.
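
The manual checks in steps 2 and 4 can also be scripted. The following is a minimal sketch, not part of the Airdress tooling: it assumes the placeholder name your-name.a.airdr.es from the steps above and Ollama's /api/tags response shape (a JSON object with a "models" list whose entries carry a "name"). The helper names are hypothetical.

```python
import json
import urllib.request

# Both URLs come from the steps above; your-name is a placeholder.
LOCAL_URL = "http://127.0.0.1:11434/api/tags"
TUNNEL_URL = "https://your-name.a.airdr.es/api/tags"


def model_names(payload: bytes) -> set[str]:
    """Extract model names from an Ollama /api/tags response body."""
    data = json.loads(payload)
    return {model["name"] for model in data.get("models", [])}


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def tunnel_matches_local(fetcher=fetch) -> bool:
    """Return True if the tunnel lists the same models as the local server.

    The fetcher argument is injectable so the comparison can be
    exercised without network access.
    """
    return model_names(fetcher(LOCAL_URL)) == model_names(fetcher(TUNNEL_URL))
```

Calling tunnel_matches_local() with no arguments performs the real requests. A False result, or an exception from fetch, suggests the tunnel is not reaching the same Ollama instance you tested locally.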

  • Add authentication. The operator supports OIDC, or you can place a reverse proxy (Caddy, nginx) in front of the inference server to enforce API keys or bearer tokens.
  • Rate limiting. Configure proxy.rate_limit_rps in the operator config file to prevent runaway orchestrator loops from saturating your GPU.
  • Monitoring. Run airdress-operator status to check tunnel health, handshake freshness, and bytes transferred.
  • One name per agent. If you run multiple models or agent runtimes, claim a dedicated airdress name for each. This isolates traffic and simplifies monitoring.
  • Bandwidth. All traffic flows through the relay. This is well suited to API calls (prompts and completions) but not to bulk model downloads. Pull model weights directly onto the machine that runs the inference server, not through the tunnel.
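
If you put a reverse proxy in front of the inference server to enforce bearer tokens, clients must send the token with every request. The sketch below shows one way a client might build such a request against Ollama's /api/generate endpoint. The Bearer scheme, the token value, and the helper name are assumptions for illustration: Ollama itself does not check the header, so the scheme depends on how your proxy is configured.

```python
import json
import urllib.request


def build_generate_request(
    base_url: str, token: str, prompt: str, model: str = "llama3"
) -> urllib.request.Request:
    """Build a POST to Ollama's /api/generate with a bearer token attached.

    Assumes a reverse proxy in front of Ollama validates the
    Authorization header; the token format is defined by that proxy.
    """
    body = json.dumps(
        # "stream": False asks Ollama for one JSON object instead of a stream.
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/generate",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request. Keeping construction separate from sending makes it easy to inspect the headers before anything crosses the tunnel.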