It’s possible to use generative AI without handing your data over to the companies or states that harvest it.
The world of free software is full of applications for evaluating and using generative AI. After extensive testing, I present here the cloud-init file for deploying your own LocalAI instance in under 5 minutes.
Why LocalAI?
LocalAI is a free software application designed to offer a local, self-hosted alternative to AI service providers. The application features an API compatible with that of OpenAI. The idea is to be able to replace calls to OpenAI from any existing application in the blink of an eye: simply change the domain to which the API points.
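To make this concrete, here is a minimal sketch of such a call, assuming a LocalAI instance reachable at ia.example.com and a model configured under the name gpt-4 (the AIO Docker images used later in this article ship with such an alias):

# Hypothetical request; the body is exactly what an OpenAI client would send
curl https://ia.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'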
All OpenAI functionalities are replicated: text completion, image generation (DALL-E), audio transcription (Whisper), chat with an AI assistant, including function calls, embeddings, …
However, it’s not OpenAI’s, Anthropic’s or Google’s models that perform these tasks: everything runs on open models, such as Mistral or Meta’s Llama, and the catalog of available models is immense.
LocalAI also provides a web interface for managing models and testing each feature. But don’t expect a ChatGPT-style interface: that’s not the point here.
Choosing your machine
LocalAI works on any machine, from the Raspberry Pi to a server with a €20,000 graphics card.
However, you shouldn’t expect acceptable performance on small machines, and bear in mind that larger models, comparable to those behind ChatGPT, Claude, …, require a lot of memory. To get an idea of the specifications required, see this article.
When you don’t have hardware to dedicate to AI, the cloud can be a good way to evaluate performance and use cases.
I was able to test OVH’s various GPU machines, available from €0.70/hr to €2.75/hr, with:
- NVIDIA L4
- NVIDIA L40S
- NVIDIA A100
- NVIDIA H100
Depending on the graphics card, a different version of CUDA will be used. The above cards are all compatible with CUDA 12, but for other cards it may be necessary to adapt the version in the cloud-init script.
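Once the machine is up, nvidia-smi is the quickest way to check what the installed driver supports; its header reports the driver version and the highest CUDA version it can handle:

# Prints the driver version and the maximum supported CUDA version
nvidia-smi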
I recommend having at least 100 GB of disk space on the machine, so that you can evaluate many models without having to prune them regularly.
The cloud-init script
cloud-init is a standard method for automatically configuring a new machine, and almost all cloud providers support it. The configuration is usually passed to the machine as metadata when it first boots.
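As an illustration, with an OpenStack-based provider (OVH’s Public Cloud is one), the file is passed as user data when the server is created; the image, flavor and key names below are placeholders to adapt:

# Hypothetical server creation handing the cloud-init file to the machine
openstack server create \
  --image "Ubuntu 24.04" \
  --flavor l4-90 \
  --key-name my-key \
  --user-data localai.yaml \
  localai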
The script below is written for a machine booted with Ubuntu 24.04. Once it has run, all that remains is to point your domain’s DNS record at the public IP of the new machine.
Machine without graphics card
Configuring a machine without a graphics card doesn’t present much of a challenge: simply launch the LocalAI Docker container:
#cloud-config
users:
  - default

packages:
  - docker.io

write_files:
  - content: |
      {
        acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
      }

      ia.example.com {
        @my-ips not remote_ip 127.0.0.1/8
        basicauth @my-ips {
          localai "$2a$14$hfEBPQMe9dV9VaoZbHbOaOoseMaqrFC9nST/7n7oeNWkhEKmyaxNi"
        }
        reverse_proxy localai:8080 {
          flush_interval -1
        }
      }
    path: /etc/caddy/Caddyfile

runcmd:
  # Allow traffic in IPv4
  - sed -i '/-A INPUT -j REJECT/i-A INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT\n-A INPUT -p tcp -m state --state NEW -m tcp --dport 443 -j ACCEPT' /etc/iptables/rules.v4
  - iptables -I INPUT 5 -p tcp -m state --state NEW -m tcp --dport 443 -j ACCEPT
  - iptables -I INPUT 5 -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT
  # Create docker network
  - docker network create local
  # Launch web server
  - docker run -d --restart unless-stopped --network local -v /etc/caddy:/etc/caddy -p 80:80 -p 443:443 --name caddy caddy:latest
  # Launch container
  - docker run -d --restart unless-stopped --network local -v "/var/lib/localai/models:/build/models:cached" --name localai localai/localai:latest-aio-cpu
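Once cloud-init has finished and DNS resolves, a quick sanity check from your workstation confirms that Caddy, the certificate and LocalAI are wired together; the password is whatever you hashed into the Caddyfile, and -k is only needed while the staging CA is in use:

# List available models through Caddy; -k accepts the untrusted staging certificate
curl -k -u localai:your-password https://ia.example.com/v1/models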
Machine with graphics card
With a graphics card, the exercise is more complex: you first have to install the kernel modules that drive the card, then the components Docker needs in order to assign the card to a container.
#cloud-config
users:
  - default

packages:
  - docker.io
  - nvidia-dkms-535-server
  - nvidia-utils-535-server

write_files:
  - content: |
      {
        acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
      }

      ia.example.com {
        @my-ips not remote_ip 127.0.0.1/8
        basicauth @my-ips {
          localai "$2a$14$hfEBPQMe9dV9VaoZbHbOaOoseMaqrFC9nST/7n7oeNWkhEKmyaxNi"
        }
        reverse_proxy localai:8080 {
          flush_interval -1
        }
      }
    path: /etc/caddy/Caddyfile

runcmd:
  # Download and install GPU controller for Docker
  - curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' > /etc/apt/sources.list.d/nvidia-container-toolkit.list
  - apt update && apt install -y nvidia-container-toolkit
  - nvidia-ctk runtime configure --runtime=docker
  - systemctl restart docker
  - sleep 5
  # Allow traffic in IPv4
  - sed -i '/-A INPUT -j REJECT/i-A INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT\n-A INPUT -p tcp -m state --state NEW -m tcp --dport 443 -j ACCEPT' /etc/iptables/rules.v4
  - iptables -I INPUT 5 -p tcp -m state --state NEW -m tcp --dport 443 -j ACCEPT
  - iptables -I INPUT 5 -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT
  # Create docker network
  - docker network create local
  # Launch web server
  - docker run -d --restart unless-stopped --network local -v /etc/caddy:/etc/caddy -p 80:80 -p 443:443 --name caddy caddy:latest
  # Launch container
  - docker run -d --restart unless-stopped --gpus all --network local -e DEBUG=true -v "/var/lib/localai/models:/build/models:cached" -p "8080:8080" --name localai --pull always localai/localai:latest-aio-gpu-nvidia-cuda-12
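Before investigating LocalAI itself, it’s worth confirming that Docker really sees the card; a classic smoke test, assuming any CUDA 12 base image, is to run nvidia-smi inside a container:

# Should print the same GPU table as nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi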
Script details
To keep things simple and make the service directly usable, I’ve used the caddy web server to expose and protect the LocalAI service. caddy automatically requests a certificate for the domain it serves, here ia.example.com, so remember to adapt this part of the configuration and to declare the server’s IP in your domain’s DNS zone.
TLS certificate
Cloud environments make it easy to create and destroy machines, and since nothing persists here, be careful: each time the machine is recreated, new certificates are requested, which can quickly exhaust Let’s Encrypt’s limit on certificates issued for a domain.
To keep things safe during evaluation, the script I’m proposing requests certificates from Let’s Encrypt’s staging instance. When you’ve finished testing, simply delete or comment out the acme_ca line in Caddy’s configuration.
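For production, the global options block of the Caddyfile then simply becomes:

{
  # acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
}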
Access restriction
Online services such as ChatGPT, Claude, … require authentication using an API key. There’s no such concept in LocalAI: any API key will be considered valid.
If your machine is exposed directly to the Internet, you need to protect access to LocalAI so that the API cannot be used by just anyone.
You need to adapt Caddy’s configuration to authorize your IPs in the @my-ips matcher, or, if you don’t have fixed IPs, create users for Basic authentication.
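To create such a user, caddy itself can generate the bcrypt hash expected by basicauth; for example, using the same Docker image as in the script:

# Generate a bcrypt hash for a new Basic auth user
docker run --rm caddy:latest caddy hash-password --plaintext 'my-secret-password'

Paste the resulting hash next to the username in the basicauth block.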