---
title: "LXD: Containers for Human Beings"
subtitle: "Docker's great and all, but I prefer the workflow of interacting with VMs"
date: 2023-08-11T16:30:00-04:00
categories:
  - Technology
tags:
  - Sysadmin
  - Containers
  - VMs
  - Docker
  - LXD
draft: true
toc: true
rss_only: false
cover: ./cover.png
---

This is a blog post version of a talk I presented at both Ubuntu Summit 2022 and
SouthEast LinuxFest 2023. The first was not recorded, but the second was and is
on [SELF's PeerTube instance.][selfpeertube] I apologise for the terrible audio,
but there's unfortunately nothing I can do about that. If you're already
intimately familiar with the core concepts of VMs or containers, I would suggest
skipping those respective sections. If you're vaguely familiar with either, I
would recommend reading them because I do go a little bit in-depth.

[selfpeertube]: https://peertube.linuxrocks.online/w/hjiTPHVwGz4hy9n3cUL1mq?start=1m

{{< adm type="warn" >}}

**Note:** Canonical has decided to [pull LXD out][lxd] from under the Linux
Containers entity and instead continue development under the Canonical brand.
The majority of the LXD creators and developers have congregated around a fork
called [Incus.][inc] I'll be keeping a close eye on the project and intend to
migrate as soon as there's an installable release.

[lxd]: https://linuxcontainers.org/lxd/
[inc]: https://linuxcontainers.org/incus/

{{< /adm >}}

## The benefits of VMs and containers

- **Isolation:** you don't want to allow an attacker to infiltrate your email
  server through your web application; the two should be completely separate
  from each other, and VMs/containers provide strong isolation guarantees.
- **Flexibility:** <abbr title="Virtual Machines">VMs</abbr> and containers only
  use the resources they've been given. If you tell the VM it has 200 MB of
  RAM, it's going to make do with 200 MB of RAM and the kernel's <abbr
  title="Out Of Memory">OOM</abbr> killer is going to have a fun time 🤠
- **Portability:** once set up and configured, VMs and containers can mostly be
  treated as closed boxes; as long as the surrounding environment of the new
  host is similar to the previous in terms of communication (proxies, web
  servers, etc.), they can just be picked up and dropped between various hosts
  as necessary.
- **Density:** applications are usually much lighter than the systems they're
  running on, so it makes sense to run many applications on one system. VMs and
  containers facilitate that without sacrificing security.
- **Cleanliness:** VMs and containers are applications in black boxes. When
  you're done with the box, you can just throw it away and most everything
  related to the application is gone.

## Virtual machines

As the name suggests, Virtual Machines are all virtual; a hypervisor creates
virtual disks for storage, virtual <abbr title="Central Processing
Units">CPUs</abbr>, virtual <abbr title="Network Interface Cards">NICs</abbr>,
virtual <abbr title="Random Access Memory">RAM</abbr>, etc. On top of the
virtualised hardware, you have your kernel. This is what facilitates
communication between the operating system and the (virtual) hardware. Above
that is the operating system and all your applications.

At this point, the stack is quite large; VMs aren't exactly lightweight, and
this impacts how densely you can pack the host.

I mentioned a "hypervisor" a minute ago. I've explained what hypervisors in
general do, but there are actually two different kinds of hypervisor. They're
creatively named **Type 1** and **Type 2**.

### Type 1 hypervisors

These run directly in the host kernel without an intermediary OS. A good example
would be [KVM,][kvm] a **VM** hypervisor that runs in the **K**ernel. Type 1
hypervisors can communicate directly with the host's hardware to allocate RAM,
issue instructions to the CPU, etc.

[kvm]: https://www.linux-kvm.org
[vb]: https://www.virtualbox.org/

```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
hk: Host kernel
hk.h: Type 1 hypervisor
hk.h.k1: Guest kernel
hk.h.k2: Guest kernel
hk.h.k3: Guest kernel
hk.h.k1.os1: Guest OS
hk.h.k2.os2: Guest OS
hk.h.k3.os3: Guest OS
hk.h.k1.os1.app1: Many apps
hk.h.k2.os2.app2: Many apps
hk.h.k3.os3.app3: Many apps
```

### Type 2 hypervisors

These run in userspace as an application, like [VirtualBox.][vb] Type 2
hypervisors have to first go through the operating system, adding an additional
layer to the stack.

```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
hk: Host kernel
hk.os: Host OS
hk.os.h: Type 2 hypervisor
hk.os.h.k1: Guest kernel
hk.os.h.k2: Guest kernel
hk.os.h.k3: Guest kernel
hk.os.h.k1.os1: Guest OS
hk.os.h.k2.os2: Guest OS
hk.os.h.k3.os3: Guest OS
hk.os.h.k1.os1.app1: Many apps
hk.os.h.k2.os2.app2: Many apps
hk.os.h.k3.os3.app3: Many apps
```

## Containers

VMs use virtualisation to achieve isolation. Containers use **namespaces** and
**cgroups**, technologies pioneered in the Linux kernel. By now, though, there
are [equivalents for Windows] and possibly other platforms.

[equivalents for Windows]: https://learn.microsoft.com/en-us/virtualization/community/team-blog/2017/20170127-introducing-the-host-compute-service-hcs

**[Linux namespaces]** partition kernel resources like process IDs, hostnames,
user IDs, directory hierarchies, network access, etc. This prevents one
collection of processes from seeing or gaining access to data regarding another
collection of processes.

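As a small illustration of what namespaces do, here's a quick demo you can run
with the `unshare` utility from util-linux; it drops you into a shell with its
own PID namespace, isolated from the host's process tree:

```sh
# Create new PID and mount namespaces, fork a shell inside them, and
# mount a fresh /proc so tools like `ps` only see the namespaced processes.
sudo unshare --fork --pid --mount-proc bash

# Inside the new namespace, this lists only bash and ps itself;
# the host's processes are invisible.
ps aux
```
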
**[Cgroups]** limit, track, and isolate the hardware resource use of a
collection of processes. If you tell a cgroup that it's only allowed to spawn
500 child processes and someone executes a fork bomb, the fork bomb will expand
until it hits that limit. The kernel will prevent it from spawning further
children and you'll have to resolve the issue the same way you would with VMs:
delete and re-create it, restore from a good backup, etc. You can also limit CPU
use, the number of CPU cores it can access, RAM, disk use, and so on.

[Linux namespaces]: https://en.wikipedia.org/wiki/Linux_namespaces
[Cgroups]: https://en.wikipedia.org/wiki/Cgroups

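To see the `pids` controller from the paragraph above in action, you can create
a cgroup by hand on a cgroup v2 system and cap how many processes it may
contain; a minimal sketch:

```sh
# Create a new cgroup under the v2 hierarchy (the mount point may vary).
sudo mkdir /sys/fs/cgroup/demo

# Allow at most 500 processes/threads in this group.
echo 500 | sudo tee /sys/fs/cgroup/demo/pids.max

# Move the current shell into the group; a fork bomb run from this shell
# now stalls at the limit instead of taking down the whole host.
echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs
```
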
### Application containers

The most well-known example of application container tech is probably
[Docker.][docker] The goal here is to run a single application as minimally as
possible inside each container. In the case of a single, statically-linked Go
binary, a minimal Docker container might contain nothing more than the binary.
If it's a Python application, you're more likely to use an [Alpine Linux image]
and add your Python dependencies on top of that. If a database is required, that
goes in a separate container. If you've got a web server to handle TLS
termination and proxy your application, that's a third container. One cohesive
system might require many Docker containers to function as intended; a rough
sketch of such a setup follows the diagram below.

[docker]: https://docker.com/
[Alpine Linux image]: https://hub.docker.com/_/alpine

```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
Host kernel.Container runtime.c1: Container
Host kernel.Container runtime.c2: Container
Host kernel.Container runtime.c3: Container

Host kernel.Container runtime.c1.One app
Host kernel.Container runtime.c2.Few apps
Host kernel.Container runtime.c3.Full OS.Many apps
```

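As a concrete (and entirely hypothetical) example of the three-container setup
described above, the Docker CLI commands might look something like this, with
`my-python-app` standing in for your own application image:

```sh
# Shared network so the containers can reach each other by name.
docker network create app-net

# Database in its own container.
docker run -d --name db --network app-net postgres:16

# The application itself (hypothetical image name).
docker run -d --name app --network app-net my-python-app:latest

# Reverse proxy handling TLS termination in a third container.
docker run -d --name proxy --network app-net -p 443:443 caddy:latest
```
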
### System containers

One of the most well-known examples of system container tech is the subject of
this post: LXD! Rather than containing a single application or a very small set
of them, system containers are designed to house entire operating systems, like
[Debian] or [Rocky Linux,][rocky] along with everything required for your
application. Using our examples from above, a single statically-linked Go binary
might run in a full Debian container, just like the Python application might.
The database and web server might go in _that same_ container.

[Debian]: https://www.debian.org/
[rocky]: https://rockylinux.org/

You treat each container more like you would a VM, but you get the performance
benefit of _not_ virtualising everything. Containers are _much_ lighter than any
virtual machine.

```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
hk: Host kernel
hk.c1: Container
hk.c2: Container
hk.c3: Container
hk.c1.os1: Full OS
hk.c2.os2: Full OS
hk.c3.os3: Full OS
hk.c1.os1.app1: Many apps
hk.c2.os2.app2: Many apps
hk.c3.os3.app3: Many apps
```

## When to use which

These are personal opinions. Please evaluate each technology and determine for
yourself whether it's a suitable fit for your environment.

### VMs

As far as I'm aware, VMs are your only option when you want to work with
esoteric hardware or hardware you don't physically have on-hand. You can tell
your VM that it's running with RAM that's 20 years old, a still-in-development
RISC-V CPU, and a 420p monitor. That's not possible with containers. VMs are
also your only option when you want to work with foreign operating systems:
running Linux on Windows, Windows on Linux, or OpenBSD on a Mac all require
virtualisation. Another reason to stick with VMs is for compliance purposes.
Containers are still very new and some regulatory bodies require virtualisation
because it's a decades-old and battle-tested isolation technique.

{{< adm type="note" >}}
See Drew DeVault's blog post [_In praise of qemu_][qemu] for a great use of VMs.

[qemu]: https://drewdevault.com/2022/09/02/2022-09-02-In-praise-of-qemu.html

{{< /adm >}}

### Application containers

Application containers are particularly popular for [microservices] and
[reproducible builds,][repb] though I personally think [NixOS] is a better fit
for the latter. App containers are also your only option if you want to use
cloud platforms with extreme scaling capabilities like Google Cloud's App Engine
standard environment or AWS's Fargate.

[microservices]: https://en.wikipedia.org/wiki/Microservices
[repb]: https://en.wikipedia.org/wiki/Reproducible_builds
[NixOS]: https://nixos.org/

Application containers also tend to be necessary when the application you want
to self-host is _only_ distributed as a Docker image and the maintainers
adamantly refuse to support any other deployment method. This is a _massive_ pet
peeve of mine; yes, Docker can make running self-hosted applications easier for
inexperienced individuals,[^1] but an application orchestration system _does
not_ fit every single environment. By refusing to provide proper "manual"
deployment instructions, maintainers of these projects alienate an entire class
of potential users and it pisses me off.

Just document your shit.

### System containers

Personally, I use system containers for everything else. I prefer the simplicity
of being able to shell into a system and work with it almost exactly as I would
a VM, just without the overhead of virtualisation.

## Crash course on LXD

### Installation

{{< adm type="note" >}}

**Note:** the instructions below say to install LXD using [Snap.][snap] I
personally dislike Snap, but LXD is a Canonical product and they're doing their
best to promote it as much as possible. One of the first things the Incus
project did was [rip out Snap support,][rsnap] so it will eventually be
installable as a proper native package.

[snap]: https://en.wikipedia.org/wiki/Snap_(software)
[rsnap]: https://github.com/lxc/incus/compare/9579f65cd0f215ecd847e8c1cea2ebe96c56be4a...3f64077a80e028bb92b491d42037124e9734d4c7

{{< /adm >}}

1. Install Snap following [Canonical's tutorial](https://earl.run/ZvUK)
   - LXD is natively packaged for Arch and Alpine, but configuration can be a
     massive headache.
2. `sudo snap install lxd`
3. `lxd init`
   - Defaults are fine for the most part; you may want to increase the size of
     the storage pool.
4. `lxc launch images:debian/12 container-name` (you can verify the launch as
   shown below)
5. `lxc shell container-name`

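To verify that the container from step 4 actually launched, run `lxc list` on
the host; the output should look roughly like this (names and addresses will
differ):

```sh
lxc list
# +----------------+---------+-----------------+------+-----------+-----------+
# |      NAME      |  STATE  |      IPV4       | IPV6 |   TYPE    | SNAPSHOTS |
# +----------------+---------+-----------------+------+-----------+-----------+
# | container-name | RUNNING | 10.0.0.2 (eth0) |      | CONTAINER | 0         |
# +----------------+---------+-----------------+------+-----------+-----------+
```
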
### Usage

As an example of how to use LXD in a real situation, we'll set up [my URL
shortener.][earl] You'll need a VPS with LXD installed and a (sub)domain pointed
to the VPS.

Run `lxc launch images:debian/12 earl` followed by `lxc shell earl` and `apt
install curl`. Also `apt install` a text editor, like `vim` or `nano`, depending
on what you're comfortable with. Head to the **Installation** section of [earl's
SourceHut page][earl] and expand the **List of latest binaries**. Copy the link
to the binary appropriate for your platform, head back to your terminal, type
`curl -LO`, and paste the link you copied. This will download the binary to your
system. Run `mv <filename> earl` to rename it, `chmod +x earl` to make it
executable, then `./earl` to execute it. It will create a file called
`config.yaml` that you need to edit before proceeding. Change the `accessToken`
to something else and replace the `listen` value, `127.0.0.1`, with `0.0.0.0`.
This exposes the application to the host system so we can reverse proxy it.

[earl]: https://earl.run/source

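Condensed into one place, and with `<binary-url>` and `<filename>` standing in
for the link and file name from earl's SourceHut page, the steps above look like
this:

```sh
# On the host: create the container and open a shell inside it.
lxc launch images:debian/12 earl
lxc shell earl

# Inside the container: install curl and an editor.
apt install curl vim

# Download the binary, rename it, and make it executable.
curl -LO <binary-url>
mv <filename> earl
chmod +x earl

# First run generates config.yaml; edit accessToken and set listen to 0.0.0.0.
./earl
```
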
The next step is daemonising it so it runs as soon as the system boots. Create a
file at `/etc/systemd/system/earl.service` and paste the following code snippet
into it.

```ini
[Unit]
Description=personal link shortener
After=network.target

[Service]
User=root
Group=root
WorkingDirectory=/root/
ExecStart=/root/earl -c config.yaml

[Install]
WantedBy=multi-user.target
```

Save, then run `systemctl daemon-reload` followed by `systemctl enable --now
earl`. You should be able to `curl localhost:8275` and see some HTML.

Now we need a reverse proxy on the host. Exit the container with `exit` or
`Ctrl+D`, and if you have a preferred web server, install it. If you don't have
a preferred web server yet, I recommend [installing Caddy.][caddy] All that's
left is running `lxc list`, making note of the `earl` container's `IPv4`
address, and reverse proxying it. If you're using Caddy, edit
`/etc/caddy/Caddyfile` and replace everything that's there with the following.

[caddy]: https://caddyserver.com/docs/install

```text
<(sub)domain> {
	encode zstd gzip
	reverse_proxy <container IP address>:8275
}
```

Run `systemctl restart caddy` and head to whatever domain or subdomain you
entered. You should see the home page with just the text `earl` on it. If you go
to `/login`, you'll be able to enter whatever access token you set earlier and
log in.

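If you'd rather check from the terminal first, a quick curl against your domain
should return that same page:

```sh
# Replace example.com with your actual (sub)domain.
curl -s https://example.com/ | head
```
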
### Executing a fork bomb

I've seen some people say that executing a fork bomb from inside a container is
equivalent to executing it on the host. The fork bomb will blow up the whole
system and render every application and container you're running inoperable.

That's partially true because LXD _by default_ doesn't put a limit on how many
processes a particular container can spawn. You can limit that number yourself
by running

```text
lxc profile set default limits.processes <num-processes>
```

Any container you create under the `default` profile will have a total process
limit of `<num-processes>`. I can't tell you what a good process limit is
though; you'll need to do some testing and experimentation on your own.

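If you'd rather experiment on a single container before changing the whole
profile, the same key can be set per-container; the number below is just an
example, not a recommendation:

```sh
# Cap the earl container at 500 processes instead of changing the profile.
lxc config set earl limits.processes 500

# Confirm the limit took effect.
lxc config get earl limits.processes
```
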
Note that this doesn't _save_ you from fork bombs; all it does is prevent an
affected container from affecting _other_ containers. If someone executes a fork
bomb in a container, it'll be the same as if they executed it in a virtual
machine; assuming it's a one-off, you'll need to fix it by rebooting the
container. If it was set to run at startup, you'll need to recreate the
container, restore from a backup, revert to a snapshot, etc.

[^1]:
    Until they need to do _anything_ more complex than pull a newer image. Then
    it's twice as painful as the "manual" method might have been.