---
title: "LXD: Containers for Human Beings"
subtitle: "Docker's great and all, but I prefer the workflow of interacting with VMs"
date: 2023-09-17T18:04:00-04:00
categories:
  - Technology
tags:
  - Sysadmin
  - Containers
  - VMs
  - Docker
  - LXD
draft: true
toc: true
rss_only: false
cover: ./cover.png
---

This is a blog post version of a talk I presented at both Ubuntu Summit 2022 and
SouthEast LinuxFest 2023. The first was not recorded, but the second was; the
recording is on [SELF's PeerTube instance.][selfpeertube] I apologise for the
terrible audio, but there's unfortunately nothing I can do about that. If you're
already intimately familiar with the core concepts of VMs or containers, I would
suggest skipping those respective sections. If you're only vaguely familiar with
either, I would recommend reading them because I do go a little bit in-depth.

[selfpeertube]: https://peertube.linuxrocks.online/w/hjiTPHVwGz4hy9n3cUL1mq?start=1m

{{< adm type="warn" >}}

**Note:** Canonical has decided to [pull LXD out][lxd] from under the Linux
Containers entity and instead continue development under the Canonical brand.
The majority of the LXD creators and developers have congregated around a fork
called [Incus.][inc] I'll be keeping a close eye on the project and intend to
migrate as soon as there's an installable release.

[lxd]: https://linuxcontainers.org/lxd/
[inc]: https://linuxcontainers.org/incus/

{{< /adm >}}

Questions, comments, and corrections are welcome! Feel free to use the
self-hosted comment system at the bottom, send me an email or an IM, reply to
the fediverse post, etc. Edits and corrections, if there are any, will be noted
just below this paragraph.

## The benefits of VMs and containers

- **Isolation:** you don't want to allow an attacker to infiltrate your email
  server through your web application; the two should be completely separate
  from each other, and VMs/containers provide strong isolation guarantees.
- **Flexibility:** <abbr title="Virtual Machines">VMs</abbr> and containers only
  use the resources they've been given. If you tell the VM it has 200 MB of RAM,
  it's going to make do with 200 MB of RAM, and the kernel's
  <abbr title="Out Of Memory">OOM</abbr> killer is going to have a fun time 🤠
- **Portability:** once set up and configured, VMs and containers can mostly be
  treated as closed boxes; as long as the surrounding environment of the new
  host is similar to the previous one in terms of communication (proxies, web
  servers, etc.), they can just be picked up and dropped between various hosts
  as necessary.
- **Density:** applications are usually much lighter than the systems they're
  running on, so it makes sense to run many applications on one system. VMs and
  containers facilitate that without sacrificing security.
- **Cleanliness:** VMs and containers are applications in black boxes. When
  you're done with the box, you can just throw it away and almost everything
  related to the application is gone.

## Virtual machines

As the name suggests, Virtual Machines are all virtual; a hypervisor creates
virtual disks for storage, virtual <abbr title="Central Processing Units">CPUs</abbr>,
virtual <abbr title="Network Interface Cards">NICs</abbr>,
virtual <abbr title="Random Access Memory">RAM</abbr>, etc. On top of the
virtualised hardware, you have your kernel. This is what facilitates
communication between the operating system and the (virtual) hardware. Above
that is the operating system and all your applications.

At this point, the stack is quite large; VMs aren't exactly lightweight, and
this impacts how densely you can pack the host.

I mentioned a "hypervisor" a minute ago. I've explained what hypervisors in
general do, but there are actually two different kinds of hypervisor. They're
creatively named **Type 1** and **Type 2**.

### Type 1 hypervisors

These run directly in the host kernel without an intermediary OS. A good example
would be [KVM,][kvm] a **VM** hypervisor that runs in the **K**ernel. Type 1
hypervisors can communicate directly with the host's hardware to allocate RAM,
issue instructions to the CPU, etc.
[kvm]: https://www.linux-kvm.org
[vb]: https://www.virtualbox.org/

```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
hk: Host kernel
hk.h: Type 1 hypervisor
hk.h.k1: Guest kernel
hk.h.k2: Guest kernel
hk.h.k3: Guest kernel
hk.h.k1.os1: Guest OS
hk.h.k2.os2: Guest OS
hk.h.k3.os3: Guest OS
hk.h.k1.os1.app1: Many apps
hk.h.k2.os2.app2: Many apps
hk.h.k3.os3.app3: Many apps
```

### Type 2 hypervisors

These run in userspace as an application, like [VirtualBox.][vb] Type 2
hypervisors have to first go through the operating system, adding an additional
layer to the stack.

```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
hk: Host kernel
hk.os: Host OS
hk.os.h: Type 2 hypervisor
hk.os.h.k1: Guest kernel
hk.os.h.k2: Guest kernel
hk.os.h.k3: Guest kernel
hk.os.h.k1.os1: Guest OS
hk.os.h.k2.os2: Guest OS
hk.os.h.k3.os3: Guest OS
hk.os.h.k1.os1.app1: Many apps
hk.os.h.k2.os2.app2: Many apps
hk.os.h.k3.os3.app3: Many apps
```

## Containers

VMs use virtualisation to achieve isolation. Containers use **namespaces** and
**cgroups**, technologies pioneered in the Linux kernel. By now, though, there
are [equivalents for Windows] and possibly other platforms.

[equivalents for Windows]: https://learn.microsoft.com/en-us/virtualization/community/team-blog/2017/20170127-introducing-the-host-compute-service-hcs

**[Linux namespaces]** partition kernel resources like process IDs, hostnames,
user IDs, directory hierarchies, network access, etc. This prevents one
collection of processes from seeing or gaining access to data regarding another
collection of processes.
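
If you want to poke at namespaces without any container tooling at all,
util-linux's `unshare` makes for a nice little demo. This is just an
illustrative sketch, not something LXD asks you to do:

```text
# Start a shell in new PID and mount namespaces (requires root).
sudo unshare --fork --pid --mount-proc bash
# Inside that shell, only its own process tree is visible:
ps aux    # shows just bash and ps, not the host's processes
```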

**[Cgroups]** limit, track, and isolate the hardware resource use of a
collection of processes. If you tell a cgroup that it's only allowed to spawn
500 child processes and someone executes a fork bomb, the fork bomb will expand
until it hits that limit. The kernel will prevent it from spawning further
children and you'll have to resolve the issue the same way you would with VMs:
delete and re-create the container, restore from a good backup, etc. You can
also limit CPU use, the number of CPU cores it can access, RAM, disk use, and so
on.
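
Under the hood, this is all just files in the cgroup filesystem. Here's a
minimal manual sketch, assuming a cgroup v2 host with the hierarchy mounted at
`/sys/fs/cgroup`:

```text
# Create a cgroup, cap it at 500 processes, and move the current shell into it.
sudo mkdir /sys/fs/cgroup/demo
echo 500 | sudo tee /sys/fs/cgroup/demo/pids.max
echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs
# A fork bomb launched from this shell now stalls at the 500-process limit.
```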

[Linux namespaces]: https://en.wikipedia.org/wiki/Linux_namespaces
[Cgroups]: https://en.wikipedia.org/wiki/Cgroups

### Application containers

The most well-known example of application container tech is probably
[Docker.][docker] The goal here is to run a single application as minimally as
possible inside each container. In the case of a single, statically-linked Go
binary, a minimal Docker container might contain nothing more than the binary.
If it's a Python application, you're more likely to use an [Alpine Linux image]
and add your Python dependencies on top of that. If a database is required, that
goes in a separate container. If you've got a web server to handle TLS
termination and proxy your application, that's a third container. One cohesive
system might require many Docker containers to function as intended.
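
To make that concrete, a deployment like the one above might look roughly like
the sketch below. The `my-python-app` image is hypothetical, and a real setup
would more likely use Compose plus an actual proxy configuration:

```text
# One container per concern, joined by a user-defined network.
docker network create mystack
docker run -d --name db    --network mystack -e POSTGRES_PASSWORD=change-me postgres:16-alpine
docker run -d --name app   --network mystack my-python-app:latest
docker run -d --name proxy --network mystack -p 80:80 -p 443:443 caddy:2-alpine
```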

[docker]: https://docker.com/
[Alpine Linux image]: https://hub.docker.com/_/alpine

```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
Host kernel.Container runtime.c1: Container
Host kernel.Container runtime.c2: Container
Host kernel.Container runtime.c3: Container

Host kernel.Container runtime.c1.One app
Host kernel.Container runtime.c2.Few apps
Host kernel.Container runtime.c3.Full OS.Many apps
```

### System containers

One of the most well-known examples of system container tech is the subject of
this post: LXD! Rather than containing a single application or a very small set
of them, system containers are designed to house entire operating systems, like
[Debian] or [Rocky Linux,][rocky] along with everything required for your
application. Using our examples from above, a single statically-linked Go binary
might run in a full Debian container, just like the Python application might.
The database and webserver might go in _that same_ container.

[Debian]: https://www.debian.org/
[rocky]: https://rockylinux.org/

You treat each container more like you would a VM, but you get the performance
benefit of _not_ virtualising everything. Containers tend to be _much_ lighter
than most VMs.[^1]

```kroki {type=d2,d2theme=flagship-terrastruct,d2sketch=true}
hk: Host kernel
hk.c1: Container
hk.c2: Container
hk.c3: Container
hk.c1.os1: Full OS
hk.c2.os2: Full OS
hk.c3.os3: Full OS
hk.c1.os1.app1: Many apps
hk.c2.os2.app2: Many apps
hk.c3.os3.app3: Many apps
```

## When to use which

These are personal opinions. Please evaluate each technology and determine for
yourself whether it's a suitable fit for your environment.

### VMs

As far as I'm aware, VMs are your only option when you want to work with
esoteric hardware or hardware you don't physically have on hand. You can tell
your VM that it's running with RAM that's 20 years old, a still-in-development
RISC-V CPU, and a 420p monitor. That's not possible with containers. VMs are
also your only option when you want to work with foreign operating systems:
running Linux on Windows, Windows on Linux, or OpenBSD on a Mac all require
virtualisation. Another reason to stick with VMs is for compliance purposes.
Containers are still very new, and some regulatory bodies require virtualisation
because it's a decades-old and battle-tested isolation technique.

{{< adm type="note" >}}

See Drew DeVault's blog post [_In praise of qemu_][qemu] for a great use of VMs.

[qemu]: https://drewdevault.com/2022/09/02/2022-09-02-In-praise-of-qemu.html

{{< /adm >}}

### Application containers

Application containers are particularly popular for [microservices] and
[reproducible builds,][repb] though I personally think [NixOS] is a better fit
for the latter. App containers are also your only option if you want to use
cloud platforms with extreme scaling capabilities like Google Cloud's App Engine
standard environment or AWS's Fargate.

[microservices]: https://en.wikipedia.org/wiki/Microservices
[repb]: https://en.wikipedia.org/wiki/Reproducible_builds
[NixOS]: https://nixos.org/

Application containers also tend to be necessary when the application you want
to self-host is _only_ distributed as a Docker image and the maintainers
adamantly refuse to support any other deployment method. This is a _massive_ pet
peeve of mine; yes, Docker can make running self-hosted applications easier for
inexperienced individuals,[^2] but an application orchestration system _does
not_ fit in every single environment. By refusing to provide proper "manual"
deployment instructions, maintainers of these projects alienate an entire class
of potential users and it pisses me off.

Just document your shit.

### System containers

Personally, I prefer the workflow of system containers and use them for
everything else. Because they contain entire operating systems, you can interact
with them much like you would a VM or even your PC: you shell in, `apt install`
whatever you need, set up the application, expose it over the network (for
example, on `0.0.0.0:8080`), proxy it on the container host, and that's it! This
process can be trivially automated with shell scripts, Ansible roles, Chef,
Puppet, whatever you like. Back the system up using [tarsnap], [rsync.net],
[Backblaze,][bb] Google Drive, [restic,][restic] or whatever else you prefer. If
you use ZFS for your LXD storage pool, maybe go with [syncoid and sanoid.][ss]

[tarsnap]: https://www.tarsnap.com/
[rsync.net]: https://rsync.net/
[bb]: https://www.backblaze.com/
[restic]: https://restic.net/
[ss]: https://github.com/jimsalterjrs/sanoid
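
LXD also has built-in snapshot and export commands that can feed whichever
backup scheme you settle on. A quick sketch, with `mycontainer` standing in for
whatever your container is called:

```text
# Take a point-in-time snapshot, then export the whole container as a tarball.
lxc snapshot mycontainer before-upgrade
lxc export mycontainer mycontainer-backup.tar.gz
# Restore elsewhere with `lxc import mycontainer-backup.tar.gz`.
```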

My point is that using system containers doesn't mean throwing out the last few
decades of systems knowledge and wisdom.

## Crash course to LXD

Quick instructions for installing LXD and setting up your first application.

### Installation

{{< adm type="note" >}}

**Note:** the instructions below say to install LXD using [Snap.][snap] I
personally dislike Snap, but LXD is a Canonical product and they're doing their
best to promote it as much as possible. One of the first things the Incus
project did was [rip out Snap support,][rsnap] so it will eventually be
installable as a proper native package.

[snap]: https://en.wikipedia.org/wiki/Snap_(software)
[rsnap]: https://github.com/lxc/incus/compare/9579f65cd0f215ecd847e8c1cea2ebe96c56be4a...3f64077a80e028bb92b491d42037124e9734d4c7

{{< /adm >}}

1. Install Snap by following [Canonical's tutorial](https://earl.run/ZvUK)
   - LXD is natively packaged for Arch and Alpine, but configuration can be a
     massive headache.
2. `sudo snap install lxd`
   - If you'd rather not run the remaining `lxd`/`lxc` commands as root, see the
     group tip just after this list.
3. `lxd init`
   - Defaults are fine for the most part; you may want to increase the size of
     the storage pool.
4. `lxc launch images:debian/12 container-name`
5. `lxc shell container-name`
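
Talking to the LXD daemon normally requires root or membership in the `lxd`
group, so if you'd rather not prefix every command with `sudo`, something like
the following should do it. Keep in mind that `lxd` group membership is
effectively root-equivalent on the host.

```text
# Add your user to the lxd group so lxc/lxd commands work without sudo.
sudo usermod -aG lxd "$USER"
newgrp lxd    # or log out and back in for the new group to apply
```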

### Usage

As an example of how to use LXD in a real situation, we'll set up [my URL
shortener.][earl] You'll need a VPS with LXD installed and a (sub)domain pointed
to the VPS.

Run `lxc launch images:debian/12 earl` followed by `lxc shell earl` and `apt
install curl`. Also `apt install` a text editor, like `vim` or `nano`, depending
on what you're comfortable with. Head to the **Installation** section of [earl's
SourceHut page][earl] and expand the **List of latest binaries**. Copy the link
to the binary appropriate for your platform, head back to your terminal, type
`curl -LO`, and paste the link you copied. This will download the binary to your
system. Run `mv <filename> earl` to rename it, `chmod +x earl` to make it
executable, then `./earl` to execute it. It will create a file called
`config.yaml` that you need to edit before proceeding. Change the `accessToken`
to something else and replace the `listen` value, `127.0.0.1`, with `0.0.0.0`.
This exposes the application outside the container so we can reverse proxy it
from the host.
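
Collected in one place, the steps inside the container look roughly like this;
`<binary-download-url>` and `<filename>` are placeholders for whichever binary
you copied the link to:

```text
lxc launch images:debian/12 earl
lxc shell earl

# Inside the container:
apt install curl nano
curl -LO <binary-download-url>
mv <filename> earl
chmod +x earl
./earl             # creates config.yaml on first run
nano config.yaml   # set accessToken, change listen from 127.0.0.1 to 0.0.0.0
```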

[earl]: https://earl.run/source

The next step is daemonising it so it runs as soon as the system boots. Create a
file at `/etc/systemd/system/earl.service` and paste the following snippet into
it.

```ini
[Unit]
Description=personal link shortener
After=network.target

[Service]
User=root
Group=root
WorkingDirectory=/root/
ExecStart=/root/earl -c config.yaml

[Install]
WantedBy=multi-user.target
```

Save, then run `systemctl daemon-reload` followed by `systemctl enable --now
earl`. You should be able to `curl localhost:8275` and see some HTML.

Now we need a reverse proxy on the host. Exit the container with `exit` or
`Ctrl+D`, and if you have a preferred webserver, install it. If you don't have a
preferred webserver yet, I recommend [installing Caddy.][caddy] All that's left
is running `lxc list`, making note of the `earl` container's `IPv4` address, and
reverse proxying it. If you're using Caddy, edit `/etc/caddy/Caddyfile` and
replace everything that's there with the following.

[caddy]: https://caddyserver.com/docs/install

```text
<(sub)domain> {
	encode zstd gzip
	reverse_proxy <container IP address>:8275
}
```

Run `systemctl restart caddy` and head to whatever domain or subdomain you
entered. You should see the home page with just the text `earl` on it. If you go
to `/login`, you'll be able to enter whatever access token you set earlier and
log in.

### Further tips

One of the things you might want to do post-installation is mess around with
profiles. There's a `default` profile in LXD that you can show with `lxc profile
show default`.

```text
$ lxc profile show default
config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: default
used_by: []
```

Not all config options are listed here though; you'll need to read [the
documentation] for a full enumeration.

[the documentation]: https://documentation.ubuntu.com/lxd/en/latest/config-options/
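
For example, the `limits.memory` and `limits.cpu` keys work the same way and can
be set either on a profile or on a single container; a quick sketch, reusing the
`earl` container from above:

```text
# Cap every container under the default profile at 1 GiB of RAM and 2 CPU cores.
lxc profile set default limits.memory 1GiB
lxc profile set default limits.cpu 2

# Or override a single container instead of the whole profile.
lxc config set earl limits.memory 512MiB
```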

I've seen some people say that executing a fork bomb from inside a container is
equivalent to executing it on the host: the fork bomb will blow up the whole
system and render every application and container you're running inoperable.
That's partially true, because LXD _by default_ doesn't put a limit on how many
processes a particular container can spawn. You can limit that number yourself
by running the following.

```text
lxc profile set default limits.processes <num-processes>
```

Any container you create under the `default` profile will have a total process
limit of `<num-processes>`. I can't tell you what a good process limit is
though; you'll need to do some testing and experimentation on your own.

As stated in [the containers section,](#containers) this doesn't _save_ you from
fork bombs. It just helps prevent a fork bomb from affecting the host OS or
other containers.

[^1]:
    There's a [technical
    publication](https://dl.acm.org/doi/10.1145/3132747.3132763) indicating that
    specialised VMs with unikernels can be far lighter and more secure than
    containers.

[^2]:
    Until they need to do _anything_ more complex than pull a newer image. Then
    it's twice as painful as the "manual" method might have been.