Overview
| Comment: | Mentioned containerd+nerdctl in place of runc in the containers doc. A tightened-up version of the prior runc and crun sections are now collected below the Podman section. This gives a better flow: each successive option is smaller than the last, excepting only nspawn, which is a bit bigger than crun. (We leave nspawn last because we can't get it to work!) |
|---|---|
| SHA3-256: | 457c14a490c76ef168505d9e17456fe4 |
| User & Date: | wyoung 2022-09-07 09:11:31.093 |
Context
| Date | Time | Check-in |
|---|---|---|
| 2022-09-09 | 17:22 | Sync up custom makefile for MinGW. ... (check-in: a3ed29ea34 user: mistachkin tags: trunk) |
| 2022-09-07 | 09:11 | Mentioned containerd+nerdctl in place of runc in the containers doc. A tightened-up version of the prior runc and crun sections are now collected below the Podman section. This gives a better flow: each successive option is smaller than the last, excepting only nspawn, which is a bit bigger than crun. (We leave nspawn last because we can't get it to work!) ... (check-in: 457c14a490 user: wyoung tags: trunk) |
| 2022-09-07 | 07:35 | Updated the "nojail" patch for our Dockerfile to track the recent changes: rename back from Dockerfile.in and the layer refactoring. It does essentially the same thing as before. ... (check-in: 19abf0ac13 user: wyoung tags: trunk) |
Changes
Changes to www/containers.md.
| ︙ | ︙ | |||
this idea to the rest of your site.)
[DD]: https://www.docker.com/products/docker-desktop/
[DE]: https://docs.docker.com/engine/
[DNT]: ./server/debian/nginx.md
### 6.1 <a id="nerdctl" name="containerd"></a>Stripping Docker Engine Down
The core of Docker Engine is its [`containerd`][ctrd] daemon and the
[`runc`][runc] container runner. Add to this the out-of-core CLI program
[`nerdctl`][nerdctl] and you have enough of the engine to run Fossil
containers. The big things you’re missing are:
* **BuildKit**: The container build engine, which doesn’t matter if
you’re building elsewhere and using a container registry as an
intermediary between that build host and the deployment host.
* **SwarmKit**: A powerful yet simple orchestrator for Docker that you
probably aren’t using with Fossil anyway.
In exchange, you get a runtime that’s about half the size of Docker
Engine. The commands are essentially the same as above, but you say
“`nerdctl`” instead of “`docker`”. You might alias one to the other,
because you’re still going to be using Docker to build and ship your
container images.
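
One way to arrange that alias is sketched below. The helper name `container_cli` is made up for this example and is not part of either tool; it simply prints whichever CLI it finds first on the `PATH`, preferring `nerdctl`.

```shell
# Hypothetical helper: print the first container CLI found on PATH,
# preferring nerdctl on slimmed-down hosts and falling back to docker.
container_cli() {
    for cli in nerdctl docker; do
        if command -v "$cli" >/dev/null 2>&1; then
            printf '%s\n' "$cli"
            return 0
        fi
    done
    echo "no container CLI found" >&2
    return 1
}
```

On a deployment host, something like `alias docker="$(container_cli)"` in your shell startup file would then let the commands above work unchanged.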
[ctrd]: https://containerd.io/
[nerdctl]: https://github.com/containerd/nerdctl
[runc]: https://github.com/opencontainers/runc
### 6.2 <a id="podman"></a>Podman
A lighter-weight alternative to either of the prior options that doesn’t
give up the image builder is [Podman]. Initially created by
Red Hat and thus popular on that family of OSes, it will run on
any flavor of Linux. It can even be made to run [on macOS via Homebrew][pmmac]
or [on Windows via WSL2][pmwin].
On Ubuntu 22.04, it’s about a quarter the size of Docker Engine, or half
that of the “full” distribution of `nerdctl` and all its dependencies.
Although Podman [bills itself][whatis] as a drop-in replacement for the
`docker` command and everything that sits behind it, some of the tool’s
design decisions affect how our Fossil containers run, as compared to
using Docker. The most important of these is that, by default, Podman
wants to run your container “rootless,” meaning that it runs as a
regular user. This is generally better for security, but [we dealt with
|
| ︙ | ︙ | |||
around in. That shouldn’t be enough to let them break out of the
container entirely, but they’ll have powerful tools like `wget`, and
they’ll be connected to the network the container runs on. Once the bad
guy is inside the house, he doesn’t necessarily have to go after the
residents directly to cause problems for them.
#### 6.2.2 <a id="podman-rootful"></a>Fossil in a Rootful Podman Container
##### Simple Method
Fortunately, it’s easy enough to have it both ways. Simply run your
`podman` commands as root:
```
$ sudo podman build -t fossil --cap-add MKNOD .
$ sudo podman create \
--name fossil \
|
| ︙ | ︙ | |||
We have to do the build under `sudo` in part because we’re doing rootly
things with the file system image layers we’re building up. Just because
it’s done inside a container runtime’s build environment doesn’t mean we
can get away without root privileges to do things like create the
`/jail/dev/null` node.
The other reason we need “`sudo podman build`” is because it puts the result
into root’s Podman image registry, where the next steps look for it.
That in turn explains why we need “`sudo podman create`:” because it’s
creating a container based on an image that was created by root. If you
ran that step without `sudo`, it wouldn’t be able to find the image.
If Docker is looking better and better to you as a result of all this,
realize that it’s doing the same thing. It just hides it better by
| ︙ | ︙ | |||
```
$ sudo podman create \
--any-options-you-like \
docker.io/mydockername/fossil
```
This round-trip through the public image registry has another side
benefit: your local system might be a lot faster than your remote one,
as when the remote is a small VPS. Even with the overhead of schlepping
container images across the Internet, it can be a net win in terms of
build time.
### 6.3 <a id="barebones"></a>Bare-Bones OCI Bundle Runners
If even the Podman stack is too big for you, you still have options for
running containers that are considerably slimmer, at a high cost in
administrative complexity and lost features.
Part of the OCI standard is the notion of a “bundle,” being a consistent
way to present a pre-built and configured container to the runtime.
Essentially, it consists of a directory containing a `config.json` file
and a `rootfs/` subdirectory containing the root filesystem image. Many
tools can produce these for you. We’ll show only one method in the first
section below, then reuse that in the following sections.
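
As a quick sanity check on that layout, a minimal sketch follows. The function name `is_oci_bundle` is invented for illustration, and it tests only for the two pieces named above, not for a valid `config.json`.

```shell
# Return success if the given directory looks like an OCI bundle:
# a config.json file alongside a rootfs/ directory.
is_oci_bundle() {
    [ -f "$1/config.json" ] && [ -d "$1/rootfs" ]
}
```

You might call it as `is_oci_bundle /var/lib/machines/fossil || echo "not a bundle"` before handing the directory to a runner.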
#### 6.3.1 <a id="runc"></a>`runc`
We mentioned `runc` [above](#nerdctl), but it’s possible to use it
standalone, without `containerd` or its CLI frontend `nerdctl`. You also
lose the build engine, intelligent image layer sharing, image registry
connections, and much more. The plus side is that `runc` alone is
18 MiB.
Using it without all the support tooling isn’t complicated, but it *is*
cryptic enough to want a shell script. Let’s say we want to build on our
big desktop machine but ship the resulting container to a small remote
host. This should serve:
----
```shell
#!/bin/bash -ex
c=fossil
b=/var/lib/machines/$c
h=my-host.example.com
m=/run/containerd/io.containerd.runtime.v2.task/moby
t=$(mktemp -d /tmp/$c-bundle.XXXXXX)

if [ -d "$t" ]
then
    # config.json exists only while the container is running
    docker container start  $c
    docker container export $c > $t/rootfs.tar
    id=$(docker inspect --format="{{.Id}}" $c)

    # Rewrite Docker's config.json for runc; each stage feeds the next
    sudo cat $m/$id/config.json |
        jq '.root.path = "'$b/rootfs'"' |
        jq '.linux.cgroupsPath = ""' |
        jq 'del(.linux.sysctl)' |
        jq 'del(.linux.namespaces[] | select(.type == "network"))' |
        jq 'del(.mounts[] | select(.destination == "/etc/hostname"))' |
        jq 'del(.mounts[] | select(.destination == "/etc/resolv.conf"))' |
        jq 'del(.mounts[] | select(.destination == "/etc/hosts"))' |
        jq 'del(.hooks)' > $t/config.json

    # Copies into ~/tmp on the remote host, which must already exist
    scp -r $t $h:tmp
    ssh -t $h "{
        mv ./$t/config.json $b &&
        sudo tar -C $b/rootfs -xf ./$t/rootfs.tar &&
        rm -r ./$t
    }"
    rm -r $t
fi
```
----
The first several lines list configurables:
* **`c`**: the name of the Docker container you’re bundling up for use
with `runc`
* **`b`**: the path of the exported container, called the “bundle” in
OCI jargon; we’re using the [`nspawn`](#nspawn) convention, a
reasonable choice under the [Linux FHS rules][LFHS]
* **`h`**: the remote host name
* **`m`**: the local directory holding the running machines, configurable
because:
* the path name is longer than we want to use inline
* it’s been known to change from one version of Docker to the next
* you might be building and testing with [Podman](#podman), so it
has to be “`/run/user/$UID/crun`” instead
* **`t`**: the temporary bundle directory we populate locally, then
`scp` to the remote machine, where it’s unpacked
[LFHS]: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard
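
If you switch between the two build tools, the choice of `m` could be wrapped up as in the sketch below. The function name `runtime_state_dir` is hypothetical, and both paths are simply the ones given above; they may differ on your system or runtime version.

```shell
# Hypothetical helper: print the directory where the local runtime keeps
# each running container's config.json. Both paths come from the text
# above and may differ on your system or Docker version.
runtime_state_dir() {
    case "$1" in
        docker) echo "/run/containerd/io.containerd.runtime.v2.task/moby" ;;
        podman) echo "/run/user/$(id -u)/crun" ;;
        *)      echo "unknown runtime: $1" >&2; return 1 ;;
    esac
}
```

The script would then set `m=$(runtime_state_dir podman)` rather than hard-coding the Docker path.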
##### Why All That `sudo` Stuff?
This script uses `sudo` for two different purposes:
1. To read the local `config.json` file out of the `containerd` managed
directory, which is owned by `root` on Docker systems. Additionally,
that input file is only available while the container is started, so
we must ensure that before extracting it.
2. To unpack the bundle onto the remote machine. If you try to get
clever and unpack it locally, then `rsync` it to the remote host to
avoid re-copying files that haven’t changed since the last update,
you’ll find that it fails when it tries to copy device nodes, to
create files owned only by the remote root user, and so forth. If the
container bundle is small, it’s simpler to re-copy and unpack it
fresh each time.
I point all this out because the script might ask for your password
twice: once for the local `sudo` command, and once for the remote.
##### Why All That `jq` Stuff?
We’re using [jq] for two separate purposes:
1. To automatically transmogrify Docker’s container configuration so it
will work with `runc`:
* point it where we unpacked the container’s exported rootfs
* accede to its wish to [manage cgroups by itself][ecg]
* remove the `sysctl` calls that will break after…
* …we remove the network namespace to allow Fossil’s TCP listening
port to be available on the host; `runc` doesn’t offer the
equivalent of `docker create --publish`, and we can’t be
bothered to set up a manual mapping from the host port into the
container
* remove file bindings that point into the local runtime managed
directories; one of the things we give up by using a bare
container runner is automatic management of these files
* remove the hooks for essentially the same reason
2. To make the Docker-managed machine-readable `config.json` more
human-readable, in case there are other things you want changed in
this version of the container. Exposing the `config.json` file like
this means you don’t have to rebuild the container merely to change
a value like a mount point, the kernel capability set, and so forth.
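
The same edits can also be expressed as a single `jq` program, which may be easier to extend. This is a sketch under the assumption that the input has the same shape as Docker’s `config.json`; the function name `adapt_config` is invented for the example.

```shell
# Sketch: apply the same edits as the script's jq pipeline in one pass.
# Reads Docker's config.json on stdin; $1 is the bundle directory.
adapt_config() {
    jq --arg root "$1/rootfs" '
        .root.path = $root
        | .linux.cgroupsPath = ""
        | del(.linux.sysctl)
        | del(.linux.namespaces[] | select(.type == "network"))
        | del(.mounts[] | select(
              .destination == "/etc/hostname" or
              .destination == "/etc/resolv.conf" or
              .destination == "/etc/hosts"))
        | del(.hooks)
    '
}
```

Usage would look like `sudo cat $m/$id/config.json | adapt_config $b > $t/config.json`. Passing the path with `--arg` avoids splicing a shell variable into the middle of the `jq` program text.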
##### Running the Bundle
With the container exported to a bundle like this, you can start it as:
```
$ cd /path/to/bundle
$ c=fossil-runc ← …or anything else you prefer
$ sudo runc create $c
$ sudo runc start $c
$ sudo runc exec $c -t sh -l
~ $ ls museum
repo.fossil
~ $ ps -eaf
PID USER TIME COMMAND
1 fossil 0:00 bin/fossil server --create …
~ $ exit
$ sudo runc kill $c
$ sudo runc delete $c
```
If you’re doing this on the export host, the first command is “`cd $b`”
if we’re using the variables from the shell script above. Alternately,
the `runc` subcommands that need to read the bundle files take a
`--bundle/-b` flag to let you avoid switching directories.
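
A minimal sketch of that flag in use follows. The function name `start_bundle` and the `SUDO` and `RUNC` override variables are conventions of this example only, not anything `runc` defines.

```shell
# Sketch: create and start a container from a bundle without cd'ing
# into it. Set RUNC or SUDO in the environment to override the commands
# (both variable names are this example's convention, not runc's).
start_bundle() {
    bundle=$1
    name=$2
    ${SUDO-sudo} "${RUNC:-runc}" create --bundle "$bundle" "$name" &&
    ${SUDO-sudo} "${RUNC:-runc}" start "$name"
}
```

For instance, `start_bundle /var/lib/machines/fossil fossil-runc` does the same job as the “`cd`”, “`runc create`” and “`runc start`” steps above.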
The rest should be straightforward: create and start the container as
root so the `chroot(2)` call inside the container will succeed, then get
into it with a login shell and poke around to prove to ourselves that
everything is working properly. It is. Yay!
The remaining commands show shutting the container down and destroying
it, simply to show how these commands change relative to using the
Docker Engine commands. It’s “kill,” not “stop,” and it’s “delete,” not
“rm.”
[ecg]: https://github.com/opencontainers/runc/pull/3131
[jq]: https://stedolan.github.io/jq/
##### Lack of Layer Sharing
The bundle export process collapses Docker’s union filesystem down to a
single layer. Atop that, it makes all files mutable.
All of this is fine for tiny remote hosts with a single container, or at
least one where none of the containers share base layers. Where it
becomes a problem is when you have multiple Fossil containers on a
single host, since they all derive from the same base image.
The full-featured container runtimes above will intelligently share
these immutable base layers among the containers, storing only the
differences in each individual container. More, when pulling images from
a registry host, they’ll transfer only the layers you don’t have copies
of locally, so you don’t have to burn bandwidth sending copies of Alpine
and BusyBox each time, even though they’re unlikely to change from one
build to the next.
#### 6.3.2 <a id="crun"></a>`crun`
In the same way that [Docker Engine is based on `runc`](#runc), Podman’s
engine is based on [`crun`][crun], a lighter-weight alternative to
`runc`. It’s only 1.4 MiB on the system I tested it on, yet it will run
the same container bundles as in my `runc` examples above. We saved
more than that by compressing the container’s Fossil executable with
UPX, making the runtime virtually free in this case. The only question
is whether you can put up with its limitations, which are the same as
for `runc`.
[crun]: https://github.com/containers/crun
#### 6.3.3 <a id="nspawn"></a>`systemd-nspawn`
As of `systemd` version 242, its optional `nspawn` piece
[reportedly](https://www.phoronix.com/news/Systemd-Nspawn-OCI-Runtime)
got the ability to run OCI bundles directly. You might
have it installed already, but if not, it’s only about 2 MiB. It’s
in the `systemd-container` package as of Ubuntu 22.04 LTS:
```
$ sudo apt install systemd-container
```
|
| ︙ | ︙ | |||
--machine=fossil \
--network-veth \
--port=127.0.0.1:127.0.0.1:9999:8080
$ sudo machinectl list
No machines.
```
This is why I wrote “reportedly” above: I couldn’t get it to work on two different
Linux distributions, and I can’t see why. I’m leaving this here to give
someone else a leg up, with the hope that they will work out what’s
needed to get the container running and registered with `machinectl`.
As of this writing, the tool expects an OCI container version of
“1.0.0”. I had to edit this at the top of my `config.json` file to get
the first command to read the bundle. The fact that it errored out when
I had “`1.0.2-dev`” in there proves it’s reading the file, but it
doesn’t seem able to make sense of what it finds there, and it doesn’t
give any diagnostics to say why.
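
For anyone picking this up, the version edit can be scripted rather than done by hand. The sketch below assumes the value lives in the top-level `ociVersion` key, as the OCI runtime specification defines it; the function name `set_oci_version` is made up for this example.

```shell
# Sketch: rewrite the bundle's declared OCI version to the value the
# text above says systemd-nspawn accepts. Edits config.json in the given
# bundle directory via a temporary file, keeping its ownership and mode.
set_oci_version() {
    cfg=$1/config.json
    tmp=$(mktemp) || return 1
    jq --arg v "${2:-1.0.0}" '.ociVersion = $v' "$cfg" > "$tmp" &&
        cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Calling `set_oci_version /var/lib/machines/fossil` changes only that one key; it does nothing about whatever else keeps the container from registering with `machinectl`.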
<div style="height:50em" id="this-space-intentionally-left-blank"></div>
|