Release 1.1.0

This commit is contained in:
Jip-Hop 2024-03-02 18:41:49 +01:00 committed by GitHub
commit f4f7311a12
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
12 changed files with 1990 additions and 532 deletions

View File

@ -10,16 +10,19 @@ Persistent Linux 'jails' on TrueNAS SCALE to install software (docker-compose, p
TrueNAS SCALE can create persistent Linux 'jails' with systemd-nspawn. This script helps with the following: TrueNAS SCALE can create persistent Linux 'jails' with systemd-nspawn. This script helps with the following:
- Installing the systemd-container package (which includes systemd-nspawn)
- Setting up the jail so it won't be lost when you update SCALE - Setting up the jail so it won't be lost when you update SCALE
- Choosing a distro (Debian 12 strongly recommended, but Ubuntu, Arch Linux or Rocky Linux seem good choices too) - Choosing a distro (Debian 12 strongly recommended, but Ubuntu, Arch Linux or Rocky Linux seem good choices too)
- Optional: configuring the jail so you can run Docker inside it - Optional: configuring the jail so you can run Docker inside it
- Optional: GPU passthrough (including [nvidia GPU](README.md#nvidia-gpu) with the drivers bind mounted from the host) - Optional: GPU passthrough (including [nvidia GPU](README.md#nvidia-gpu) with the drivers bind mounted from the host)
- Starting the jail with your config applied - Starting the jail with your config applied
## Security
Despite what the word 'jail' implies, jailmaker's intended use case is to create one or more additional filesystems to run alongside SCALE with minimal isolation. By default the root user in the jail with uid 0 is mapped to the host's uid 0. This has [obvious security implications](https://linuxcontainers.org/lxc/security/#privileged-containers). If this is not acceptable to you, you may lock down the jails by [limiting capabilities](https://manpages.debian.org/bookworm/systemd-container/systemd-nspawn.1.en.html#Security_Options) and/or using [user namespacing](https://manpages.debian.org/bookworm/systemd-container/systemd-nspawn.1.en.html#User_Namespacing_Options) or use a VM instead.
## Installation ## Installation
Create a new dataset called `jailmaker` with the default settings (from TrueNAS web interface). Then login as the root user and download `jlmkr.py`. [Installation steps with screenshots](https://www.truenas.com/docs/scale/scaletutorials/apps/sandboxes/) are provided on the TrueNAS website. Start by creating a new dataset called `jailmaker` with the default settings (from TrueNAS web interface). Then login as the root user and download `jlmkr.py`.
```shell ```shell
cd /mnt/mypool/jailmaker cd /mnt/mypool/jailmaker
@ -28,34 +31,49 @@ chmod +x jlmkr.py
./jlmkr.py install ./jlmkr.py install
``` ```
The `jlmkr.py` script (and the jails + config it creates) are now stored on the `jailmaker` dataset and will survive updates of TrueNAS SCALE. Additionally a symlink has been created so you can call `jlmkr` from anywhere. The `jlmkr.py` script (and the jails + config it creates) are now stored on the `jailmaker` dataset and will survive updates of TrueNAS SCALE. A symlink has been created so you can call `jlmkr` from anywhere (unless the boot pool is readonly, which is the default since SCALE 24.04). Additionally shell aliases have been setup, so you can still call `jlmkr` in an interactive shell (even if the symlink couldn't be created).
After an update of TrueNAS SCALE the symlink will be lost and `systemd-nspawn` (the core package which makes `jailmaker` work) may be gone too. Not to worry, just run `./jlmkr.py install` again or use [the `./jlmkr.py startup` command](#startup-jails-on-boot). After an update of TrueNAS SCALE the symlink will be lost (but the shell aliases will remain). To restore the symlink, just run `./jlmkr.py install` again or use [the `./jlmkr.py startup` command](#startup-jails-on-boot).
## Usage ## Usage
### Create Jail ### Create Jail
Creating a jail is interactive. You'll be presented with questions which guide you through the process. Creating jail with the default settings is as simple as:
```shell ```shell
jlmkr create myjail jlmkr create myjail
``` ```
After answering a few questions you should have your first jail up and running! You may also specify a path to a config template, for a quick and consistent jail creation process.
```shell
jlmkr create --config /path/to/config/template myjail
```
Or you can override the default config by using flags. See `jlmkr create --help` for the available options. Anything passed after the jail name will be passed to `systemd-nspawn` when starting the jail. See the `systemd-nspawn` manual for available options, specifically [Mount Options](https://manpages.debian.org/bookworm/systemd-container/systemd-nspawn.1.en.html#Mount_Options) and [Networking Options](https://manpages.debian.org/bookworm/systemd-container/systemd-nspawn.1.en.html#Networking_Options) are frequently used.
```shell
jlmkr create --distro=ubuntu --release=jammy myjail --bind-ro=/mnt
```
If you omit the jail name, the create process is interactive. You'll be presented with questions which guide you through the process.
```shell
jlmkr create
```
After answering some questions you should have your first jail up and running!
### Startup Jails on Boot ### Startup Jails on Boot
```shell ```shell
# Best to call startup directly (not through the jlmkr symlink) # Call startup using the absolute path to jlmkr.py
# The jlmkr shell alias doesn't work in Init/Shutdown Scripts
/mnt/mypool/jailmaker/jlmkr.py startup /mnt/mypool/jailmaker/jlmkr.py startup
# Can be called from the symlink too...
# But this may not be available after a TrueNAS SCALE update
jlmkr startup
``` ```
In order to start jails automatically after TrueNAS boots, run `/mnt/mypool/jailmaker/jlmkr.py startup` as Post Init Script with Type `Command` from the TrueNAS web interface. This will automatically fix the installation of `systemd-nspawn` and setup the `jlmkr` symlink, as well as start all the jails with `startup=1` in the config file. Running the `startup` command Post Init is recommended to keep `jailmaker` working after a TrueNAS SCALE update. In order to start jails automatically after TrueNAS boots, run `/mnt/mypool/jailmaker/jlmkr.py startup` as Post Init Script with Type `Command` from the TrueNAS web interface. This creates the `jlmkr` symlink (if possible), as well as start all the jails with `startup=1` in the config file.
### Start Jail ### Start Jail
@ -103,6 +121,12 @@ jlmkr remove myjail
jlmkr stop myjail jlmkr stop myjail
``` ```
### Restart Jail
```shell
jlmkr restart myjail
```
### Jail Shell ### Jail Shell
```shell ```shell
@ -123,7 +147,7 @@ jlmkr log myjail
### Additional Commands ### Additional Commands
Expert users may use the following additional commands to manage jails directly: `machinectl`, `systemd-nspawn`, `systemd-run`, `systemctl` and `journalctl`. The `jlmkr` script uses these commands under the hood and implements a subset of their capabilities. If you use them directly you will bypass any safety checks or configuration done by `jlmkr` and not everything will work in the context of TrueNAS SCALE. Expert users may use the following additional commands to manage jails directly: `machinectl`, `systemd-nspawn`, `systemd-run`, `systemctl` and `journalctl`. The `jlmkr` script uses these commands under the hood and implements a subset of their functions. If you use them directly you will bypass any safety checks or configuration done by `jlmkr` and not everything will work in the context of TrueNAS SCALE.
## Networking ## Networking
@ -133,15 +157,7 @@ See [Advanced Networking](./NETWORKING.md) for more.
## Docker ## Docker
The `jailmaker` script won't install Docker for you, but it can setup the jail with the capabilities required to run docker. You can manually install Docker inside the jail using the [official installation guide](https://docs.docker.com/engine/install/#server) or use [convenience script](https://get.docker.com). The `jailmaker` script won't install Docker for you, but it can setup the jail with the capabilities required to run docker. You can manually install Docker inside the jail using the [official installation guide](https://docs.docker.com/engine/install/#server) or use [convenience script](https://get.docker.com). Additionally you may use the [docker config template](./templates/docker/README.md).
## Nvidia GPU
To make passthrough of the nvidia GPU work, you need to schedule a Pre Init command. The reason is that TrueNAS SCALE by default doesn't load the nvidia kernel modules (and `jailmaker` doesn't do that either). [This screenshot](https://user-images.githubusercontent.com/1704047/222915803-d6dd51b0-c4dd-4189-84be-a04d38cca0b3.png) shows what the configuration should look like.
```
[ ! -f /dev/nvidia-uvm ] && modprobe nvidia-current-uvm && /usr/bin/nvidia-modprobe -c0 -u
```
## Documentation ## Documentation

View File

@ -1,15 +1,20 @@
# TrueNAS Compatibility # TrueNAS Compatibility
TrueNAS Core ❌ | | |
TrueNAS 22.12 ✅ |---|---|
TrueNAS 23.10-BETA1 ✅ |TrueNAS Core|❌|
TrueNAS 23.10-RC1 ✅ |TrueNAS 22.12|✅|
|TrueNAS 23.10|✅|
|TrueNAS 24.04 nightly|✅|
# Distro Compatibility # Distro Compatibility
Debian 11 Bullseye ✅ | | |
Debian 12 Bookworm ✅ |---|---|
Arch 🟨 |Debian 11 Bullseye|✅|
Ubuntu 🟨 |Debian 12 Bookworm|✅|
Alpine ❌ |Ubuntu Jammy|✅|
|Fedora 39|✅|
|Arch|🟨|
|Alpine|❌|
✅ = Personally tested and working ✅ = Personally tested and working
🟨 = Haven't personally tested 🟨 = Haven't personally tested

1843
jlmkr.py

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,5 @@
# Debian Docker Jail Template
## Setup
Check out the [config](./config) template file. You may provide it when asked during `jlmkr create` or, if you have the template file stored on your NAS, you may provide it directly by running `jlmkr create --start --config /mnt/tank/path/to/docker/config mydockerjail`.

61
templates/docker/config Normal file
View File

@ -0,0 +1,61 @@
startup=0
gpu_passthrough_intel=1
gpu_passthrough_nvidia=0
# Use macvlan networking to provide an isolated network namespace,
# so docker can manage firewall rules
# Alternatively use --network-bridge=br1 instead of --network-macvlan
# Ensure to change eno1/br1 to the interface name you want to use
# You may want to add additional options here, e.g. bind mounts
systemd_nspawn_user_args=--network-macvlan=eno1
--resolv-conf=bind-host
--system-call-filter='add_key keyctl bpf'
# Script to run on the HOST before starting the jail
# Load kernel module and config kernel settings required for docker
pre_start_hook=#!/usr/bin/bash
set -euo pipefail
echo 'PRE_START_HOOK'
echo 1 > /proc/sys/net/ipv4/ip_forward
modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
# Only used while creating the jail
distro=debian
release=bookworm
# Install docker inside the jail:
# https://docs.docker.com/engine/install/debian/#install-using-the-repository
# NOTE: this script will run in the host networking namespace and ignores
# all systemd_nspawn_user_args such as bind mounts
initial_setup=#!/usr/bin/bash
set -euo pipefail
apt-get update && apt-get -y install ca-certificates curl
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
tee /etc/apt/sources.list.d/docker.list > /dev/null
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# You generally will not need to change the options below
systemd_run_default_args=--property=KillMode=mixed
--property=Type=notify
--property=RestartForceExitStatus=133
--property=SuccessExitStatus=133
--property=Delegate=yes
--property=TasksMax=infinity
--collect
--setenv=SYSTEMD_NSPAWN_LOCK=0
systemd_nspawn_default_args=--keep-unit
--quiet
--boot
--bind-ro=/sys/module
--inaccessible=/sys/module/apparmor

82
templates/incus/README.md Normal file
View File

@ -0,0 +1,82 @@
# Debian Incus Jail Template (LXD / LXC / KVM)
## Disclaimer
**Experimental. Using Incus in this setup hasn't been extensively tested and has [known issues](#known-issues).**
## Setup
Check out the [config](./config) template file. You may provide it when asked during `jlmkr create` or, if you have the template file stored on your NAS, you may provide it directly by running `jlmkr create --start --config /mnt/tank/path/to/incus/config myincusjail`.
Unfortunately incus doesn't want to install from the `initial_setup` script inside the config file. So we manually finish the setup by running the following after creating and starting the jail:
```bash
jlmkr exec myincusjail bash -c 'apt-get -y install incus incus-ui-canonical &&
incus admin init'
```
Follow [First steps with Incus](https://linuxcontainers.org/incus/docs/main/tutorial/first_steps/).
Then visit the Incus GUI inside the browser https://0.0.0.0:8443. To find out which IP address to use instead of 0.0.0.0, check the IP address for your jail with `jlmkr list`.
## Known Issues
Using Incus in the jail will cause the following error when starting a VM from the TrueNAS SCALE web GUI:
```
[EFAULT] internal error: process exited while connecting to monitor: Could not access KVM kernel module: Permission denied 2024-02-16T14:40:14.886658Z qemu-system-x86_64: -accel kvm: failed to initialize kvm: Permission denied
```
A reboot will resolve the issue (until you start the Incus jail again).
## Create Ubuntu Desktop VM
Incus web GUI should be running on port 8443. Create new instance, call it `desktop`, and choose the `Ubuntu jammy desktop virtual-machine ubuntu/22.04/desktop` image.
## Bind mount / virtiofs
To access files from the TrueNAS host directly in a VM created with incus, we can use virtiofs.
```bash
incus config device add desktop test disk source=/home/test/ path=/mnt/test
```
The command above (when ran as root user inside the incus jail) adds a new virtiofs mount of a test directory inside the jail to a VM named desktop. The `/home/test` dir resides in the jail, but you can first bind mount any directory from the TrueNAS host inside the incus jail and then forward this to the VM using virtiofs. This could be an alternative to NFS mounts.
### Benchmarks
#### Inside LXD ubuntu desktop VM with virtiofs mount
root@desktop:/mnt/test# mount | grep test
incus_test on /mnt/test type virtiofs (rw,relatime)
root@desktop:/mnt/test# time iozone -a
[...]
real 2m22.389s
user 0m2.222s
sys 0m59.275s
#### In a jailmaker jail on the host:
root@incus:/home/test# time iozone -a
[...]
real 0m59.486s
user 0m1.468s
sys 0m25.458s
#### Inside LXD ubuntu desktop VM with virtiofs mount
root@desktop:/mnt/test# dd if=/dev/random of=./test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 36.321 s, 29.6 MB/s
#### In a jailmaker jail on the host:
root@incus:/home/test# dd if=/dev/random of=./test2.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.03723 s, 153 MB/s
## Create Ubuntu container
To be able to create unprivileged (rootless) containers with incus inside the jail, you need to increase the amount of UIDs available inside the jail. Please refer to the [Podman instructions](../podman/README.md) for more information. If you don't increase the UIDs you can only create privileged containers. You'd have to change `Privileged` to `Allow` in `Security policies` in this case.
## References
- [Running QEMU/KVM Virtual Machines in Unprivileged LXD Containers](https://dshcherb.github.io/2017/12/04/qemu-kvm-virtual-machines-in-unprivileged-lxd.html)

73
templates/incus/config Normal file
View File

@ -0,0 +1,73 @@
# WARNING: EXPERIMENTAL CONFIG TEMPLATE!
startup=0
gpu_passthrough_intel=1
gpu_passthrough_nvidia=0
# Use macvlan networking to provide an isolated network namespace,
# so incus can manage firewall rules
# Alternatively use --network-bridge=br1 instead of --network-macvlan
# Ensure to change eno1/br1 to the interface name you want to use
# You may want to add additional options here, e.g. bind mounts
# TODO: don't use --capability=all but specify only the required capabilities
# TODO: or add and use privileged flag?
systemd_nspawn_user_args=--network-macvlan=eno1
--resolv-conf=bind-host
--capability=all
--bind=/dev/fuse
--bind=/dev/kvm
--bind=/dev/vsock
--bind=/dev/vhost-vsock
# Script to run on the HOST before starting the jail
# Load kernel module and config kernel settings required for incus
pre_start_hook=#!/usr/bin/bash
set -euo pipefail
echo 'PRE_START_HOOK'
echo 1 > /proc/sys/net/ipv4/ip_forward
modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
modprobe vhost_vsock
# Only used while creating the jail
distro=debian
release=bookworm
# Install incus according to:
# https://github.com/zabbly/incus#installation
# NOTE: this script will run in the host networking namespace and ignores
# all systemd_nspawn_user_args such as bind mounts
initial_setup=#!/usr/bin/bash
set -euo pipefail
apt-get update && apt-get -y install curl
mkdir -p /etc/apt/keyrings/
curl -fsSL https://pkgs.zabbly.com/key.asc -o /etc/apt/keyrings/zabbly.asc
sh -c 'cat <<EOF > /etc/apt/sources.list.d/zabbly-incus-stable.sources
Enabled: yes
Types: deb
URIs: https://pkgs.zabbly.com/incus/stable
Suites: $(. /etc/os-release && echo ${VERSION_CODENAME})
Components: main
Architectures: $(dpkg --print-architecture)
Signed-By: /etc/apt/keyrings/zabbly.asc
EOF'
apt-get update
# You generally will not need to change the options below
systemd_run_default_args=--property=KillMode=mixed
--property=Type=notify
--property=RestartForceExitStatus=133
--property=SuccessExitStatus=133
--property=Delegate=yes
--property=TasksMax=infinity
--collect
--setenv=SYSTEMD_NSPAWN_LOCK=0
# TODO: add below if required:
# --property=DevicePolicy=auto
systemd_nspawn_default_args=--keep-unit
--quiet
--boot
--bind-ro=/sys/module
--inaccessible=/sys/module/apparmor

83
templates/lxd/README.md Normal file
View File

@ -0,0 +1,83 @@
# Ubuntu LXD Jail Template
## Disclaimer
**Experimental. Using LXD in this setup hasn't been extensively tested and has [known issues](#known-issues).**
## Setup
Check out the [config](./config) template file. You may provide it when asked during `jlmkr create` or, if you have the template file stored on your NAS, you may provide it directly by running `jlmkr create --start --config /mnt/tank/path/to/lxd/config mylxdjail`.
Unfortunately snapd doesn't want to install from the `initial_setup` script inside the config file. So we manually finish the setup by running the following after creating and starting the jail:
```bash
# Repeat listing the jail until you see it has an IPv4 address
jlmkr list
# Install packages
jlmkr exec mylxdjail bash -c 'apt-get update &&
apt-get install -y --no-install-recommends snapd &&
snap install lxd'
```
Choose the `dir` storage backend during `lxd init` and answer `yes` to "Would you like the LXD server to be available over the network?"
```bash
jlmkr exec mylxdjail bash -c 'lxd init &&
snap set lxd ui.enable=true &&
systemctl reload snap.lxd.daemon'
```
Then visit the `lxd` GUI inside the browser https://0.0.0.0:8443. To find out which IP address to use instead of 0.0.0.0, check the IP address for your jail with `jlmkr list`.
## Known Issues
### Instance creation failed
[LXD no longer has access to the LinuxContainers image server](https://discuss.linuxcontainers.org/t/important-notice-for-lxd-users-image-server/18479).
```
Failed getting remote image info: Failed getting image: The requested image couldn't be found for fingerprint "ubuntu/focal/desktop"
```
### SCALE Virtual Machines
Using LXD in the jail will cause the following error when starting a VM from the TrueNAS SCALE web GUI:
```
[EFAULT] internal error: process exited while connecting to monitor: Could not access KVM kernel module: Permission denied 2024-02-16T14:40:14.886658Z qemu-system-x86_64: -accel kvm: failed to initialize kvm: Permission denied
```
A reboot will resolve the issue (until you start the LXD jail again).
### ZFS Issues
If you create a new dataset on your pool (e.g. `tank`) called `lxd` from the TrueNAS SCALE web GUI and tell LXD to use it during `lxd init`, then you will run into issues. Firstly you'd have to run `apt-get install -y --no-install-recommends zfsutils-linux` inside the jail to install the ZFS userspace utils and you've have to add `--bind=/dev/zfs` to the `systemd_nspawn_user_args` in the jail config. By mounting `/dev/zfs` into this jail, **it will have total control of the storage on the host!**
But then SCALE doesn't seem to like the ZFS datasets created by LXD. I get the following errors when browsing the sub-datasets:
```
[EINVAL] legacy: path must be absolute
```
```
[EFAULT] Failed retreiving USER quotas for tank/lxd/virtual-machines
```
As long as you don't operate on these datasets in the SCALE GUI this may not be a real problem...
However, creating an LXD VM doesn't work with the ZFS storage backend (creating a container works though):
```
Failed creating instance from image: Could not locate a zvol for tank/lxd/images/1555b13f0e89bfcf516bd0090eee6f73a0db5f4d0d36c38cae94316de82bf817.block
```
Could this be the same issue as [Instance creation failed](#instance-creation-failed)?
## More info
Refer to the [Incus README](../incus/README.md) as a lot of it applies to LXD too.
## References
- [Running QEMU/KVM Virtual Machines in Unprivileged LXD Containers](https://dshcherb.github.io/2017/12/04/qemu-kvm-virtual-machines-in-unprivileged-lxd.html)

59
templates/lxd/config Normal file
View File

@ -0,0 +1,59 @@
# WARNING: EXPERIMENTAL CONFIG TEMPLATE!
startup=0
gpu_passthrough_intel=1
gpu_passthrough_nvidia=0
# Use macvlan networking to provide an isolated network namespace,
# so lxd can manage firewall rules
# Alternatively use --network-bridge=br1 instead of --network-macvlan
# Ensure to change eno1/br1 to the interface name you want to use
# You may want to add additional options here, e.g. bind mounts
# TODO: don't use --capability=all but specify only the required capabilities
# TODO: or add and use privileged flag?
systemd_nspawn_user_args=--network-bridge=br1
--resolv-conf=bind-host
--capability=all
--bind=/dev/fuse
--bind=/dev/kvm
--bind=/dev/vsock
--bind=/dev/vhost-vsock
# Script to run on the HOST before starting the jail
# Load kernel module and config kernel settings required for lxd
pre_start_hook=#!/usr/bin/bash
set -euo pipefail
echo 'PRE_START_HOOK'
echo 1 > /proc/sys/net/ipv4/ip_forward
modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
modprobe vhost_vsock
# Only used while creating the jail
distro=ubuntu
release=jammy
# NOTE: this script will run in the host networking namespace and ignores
# all systemd_nspawn_user_args such as bind mounts
initial_setup=#!/usr/bin/bash
set -euo pipefail
# https://discuss.linuxcontainers.org/t/snap-inside-privileged-lxd-container/13691/8
ln -sf /bin/true /usr/local/bin/udevadm
# You generally will not need to change the options below
systemd_run_default_args=--property=KillMode=mixed
--property=Type=notify
--property=RestartForceExitStatus=133
--property=SuccessExitStatus=133
--property=Delegate=yes
--property=TasksMax=infinity
--collect
--setenv=SYSTEMD_NSPAWN_LOCK=0
# TODO: add below if required:
# --property=DevicePolicy=auto
systemd_nspawn_default_args=--keep-unit
--quiet
--boot
--bind-ro=/sys/module
--inaccessible=/sys/module/apparmor

123
templates/podman/README.md Normal file
View File

@ -0,0 +1,123 @@
# Fedora Podman Jail Template
## Setup
Check out the [config](./config) template file. You may provide it when asked during `jlmkr create` or, if you have the template file stored on your NAS, you may provide it directly by running `jlmkr create --start --config /mnt/tank/path/to/podman/config mypodmanjail`.
## Rootless
### Disclaimer
**Experimental. Using podman in this setup hasn't been extensively tested.**
### Installation
Prerequisites: created a jail using the [config](./config) template file.
Run `jlmkr edit mypodmanjail` and add `--private-users=524288:65536 --private-users-ownership=chown` to `systemd_nspawn_user_args`. We start at UID 524288, as this is the [systemd range used for containers](https://github.com/systemd/systemd/blob/main/docs/UIDS-GIDS.md#summary).
The `--private-users-ownership=chown` option will ensure the rootfs ownership is corrected.
After the jail has started run `jlmkr stop mypodmanjail && jlmkr edit mypodmanjail`, remove `--private-users-ownership=chown` and increase the UID range to `131072` to double the number of UIDs available in the jail. We need more than 65536 UIDs available in the jail, since rootless podman also needs to be able to map UIDs. If I leave the `--private-users-ownership=chown` option I get the following error:
> systemd-nspawn[678877]: Automatic UID/GID adjusting is only supported for UID/GID ranges starting at multiples of 2^16 with a range of 2^16
The flags look like this now:
```
systemd_nspawn_user_args=--network-macvlan=eno1
--resolv-conf=bind-host
--system-call-filter='add_key keyctl bpf'
--private-users=524288:131072
```
Start the jail with `jlmkr start mypodmanjail` and open a shell session inside the jail (as the remapped root user) with `jlmkr shell mypodmanjail`.
Then inside the jail setup the new rootless user:
```bash
# Create new user
adduser rootless
# Set password for user
passwd rootless
# Clear the subuids and subgids which have been assigned by default when creating the new user
usermod --del-subuids 0-4294967295 --del-subgids 0-4294967295 rootless
# Set a specific range, so it fits inside the number of available UIDs
usermod --add-subuids 65536-131071 --add-subgids 65536-131071 rootless
# Check the assigned range
cat /etc/subuid
# Check the available range
cat /proc/self/uid_map
exit
```
From the TrueNAS host, open a shell as the rootless user inside the jail.
```bash
jlmkr shell --uid 1000 mypodmanjail
```
Run rootless podman as user 1000.
```bash
id
podman run hello-world
podman info
```
The output of podman info should contain:
```
graphDriverName: overlay
graphOptions: {}
graphRoot: /home/rootless/.local/share/containers/storage
[...]
graphStatus:
Backing Filesystem: zfs
Native Overlay Diff: "true"
Supports d_type: "true"
Supports shifting: "false"
Supports volatile: "true"
Using metacopy: "false"
```
### Binding to Privileged Ports:
Add `sysctl net.ipv4.ip_unprivileged_port_start=23` to the `pre_start_hook` inside the config to lower the range of privileged ports. This will still prevent an unprivileged process from impersonating the sshd daemon. Since this lowers the range globally on the TrueNAS host, a better solution would be to specifically add the capability to bind to privileged ports.
## Cockpit Management
Install and enable cockpit:
```bash
jlmkr exec mypodmanjail bash -c "dnf -y install cockpit cockpit-podman && \
systemctl enable --now cockpit.socket && \
ip a &&
ip route | awk '/default/ { print \$9 }'"
```
Check the IP address of the jail and access the Cockpit web interface at https://0.0.0.0:9090 where 0.0.0.0 is the IP address you just found using `ip a`.
If you've setup the `rootless` user, you may login with the password you've created earlier. Otherwise you'd have to add an admin user first:
```bash
jlmkr exec podmantest bash -c 'adduser admin
passwd admin
usermod -aG wheel admin'
```
Click on `Podman containers`. In case it shows `Podman service is not active` then click `Start podman`. You can now manage your (rootless) podman containers in the (rootless) jailmaker jail using the Cockpit web GUI.
## Additional Resources:
Resources mentioning `add_key keyctl bpf`
- https://bbs.archlinux.org/viewtopic.php?id=252840
- https://wiki.archlinux.org/title/systemd-nspawn
- https://discourse.nixos.org/t/podman-docker-in-nixos-container-ideally-in-unprivileged-one/22909/12
Resources mentioning `@keyring`
- https://github.com/systemd/systemd/issues/17606
- https://github.com/systemd/systemd/blob/1c62c4fe0b54fb419b875cb2bae82a261518a745/src/shared/seccomp-util.c#L604
`@keyring` also includes `request_key` but doesn't include `bpf`

54
templates/podman/config Normal file
View File

@ -0,0 +1,54 @@
startup=0
gpu_passthrough_intel=0
gpu_passthrough_nvidia=0
# Use macvlan networking to provide an isolated network namespace,
# so podman can manage firewall rules
# Alternatively use --network-bridge=br1 instead of --network-macvlan
# Ensure to change eno1/br1 to the interface name you want to use
# You may want to add additional options here, e.g. bind mounts
systemd_nspawn_user_args=--network-macvlan=eno1
--resolv-conf=bind-host
--system-call-filter='add_key keyctl bpf'
# Script to run on the HOST before starting the jail
# Load kernel module and config kernel settings required for podman
pre_start_hook=#!/usr/bin/bash
set -euo pipefail
echo 'PRE_START_HOOK'
echo 1 > /proc/sys/net/ipv4/ip_forward
modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
# Only used while creating the jail
distro=fedora
release=39
# Install podman inside the jail
# NOTE: this script will run in the host networking namespace and ignores
# all systemd_nspawn_user_args such as bind mounts
initial_setup=#!/usr/bin/bash
set -euo pipefail
dnf -y install podman
# Add the required capabilities to the `newuidmap` and `newgidmap` binaries
# https://github.com/containers/podman/issues/2788#issuecomment-1016301663
# https://github.com/containers/podman/issues/12637#issuecomment-996524341
setcap cap_setuid+eip /usr/bin/newuidmap
setcap cap_setgid+eip /usr/bin/newgidmap
# You generally will not need to change the options below
systemd_run_default_args=--property=KillMode=mixed
--property=Type=notify
--property=RestartForceExitStatus=133
--property=SuccessExitStatus=133
--property=Delegate=yes
--property=TasksMax=infinity
--collect
--setenv=SYSTEMD_NSPAWN_LOCK=0
systemd_nspawn_default_args=--keep-unit
--quiet
--boot
--bind-ro=/sys/module
--inaccessible=/sys/module/apparmor