Release 1.1.0 (commit f4f7311a12)

README.md
TrueNAS SCALE can create persistent Linux 'jails' with systemd-nspawn. This script helps with the following:

- Installing the systemd-container package (which includes systemd-nspawn)
- Setting up the jail so it won't be lost when you update SCALE
- Choosing a distro (Debian 12 strongly recommended, but Ubuntu, Arch Linux or Rocky Linux seem like good choices too)
- Optional: configuring the jail so you can run Docker inside it
- Optional: GPU passthrough (including [nvidia GPU](README.md#nvidia-gpu) with the drivers bind mounted from the host)
- Starting the jail with your config applied

## Security

Despite what the word 'jail' implies, jailmaker's intended use case is to create one or more additional filesystems that run alongside SCALE with minimal isolation. By default, the root user in the jail (uid 0) is mapped to the host's uid 0. This has [obvious security implications](https://linuxcontainers.org/lxc/security/#privileged-containers). If this is not acceptable to you, you may lock down the jails by [limiting capabilities](https://manpages.debian.org/bookworm/systemd-container/systemd-nspawn.1.en.html#Security_Options) and/or [user namespacing](https://manpages.debian.org/bookworm/systemd-container/systemd-nspawn.1.en.html#User_Namespacing_Options), or use a VM instead.
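One way to apply the user namespacing route is per jail, via extra `systemd-nspawn` flags in that jail's config. A minimal sketch (the flag choice here is an illustration, not a jailmaker default):

```
systemd_nspawn_user_args=--private-users=pick
--private-users-ownership=auto
```

With `--private-users=pick`, systemd-nspawn selects an unused UID range for the jail, so uid 0 inside the jail no longer equals uid 0 on the host; see the linked User Namespacing documentation for the trade-offs.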
## Installation

[Installation steps with screenshots](https://www.truenas.com/docs/scale/scaletutorials/apps/sandboxes/) are provided on the TrueNAS website. Start by creating a new dataset called `jailmaker` with the default settings (from the TrueNAS web interface). Then log in as the root user and download `jlmkr.py`.

```shell
cd /mnt/mypool/jailmaker
[...]
chmod +x jlmkr.py
./jlmkr.py install
```
The `jlmkr.py` script (and the jails + config it creates) are now stored on the `jailmaker` dataset and will survive updates of TrueNAS SCALE. A symlink has been created so you can call `jlmkr` from anywhere (unless the boot pool is readonly, which is the default since SCALE 24.04). Additionally, shell aliases have been set up, so you can still call `jlmkr` in an interactive shell (even if the symlink couldn't be created).

After an update of TrueNAS SCALE the symlink will be lost (but the shell aliases will remain). To restore the symlink, just run `./jlmkr.py install` again or use [the `./jlmkr.py startup` command](#startup-jails-on-boot).

## Usage
### Create Jail

Creating a jail with the default settings is as simple as:

```shell
jlmkr create myjail
```

You may also specify a path to a config template for a quick and consistent jail creation process.

```shell
jlmkr create --config /path/to/config/template myjail
```

Or you can override the default config by using flags. See `jlmkr create --help` for the available options. Anything passed after the jail name will be passed to `systemd-nspawn` when starting the jail. See the `systemd-nspawn` manual for the available options; [Mount Options](https://manpages.debian.org/bookworm/systemd-container/systemd-nspawn.1.en.html#Mount_Options) and [Networking Options](https://manpages.debian.org/bookworm/systemd-container/systemd-nspawn.1.en.html#Networking_Options) in particular are frequently used.

```shell
jlmkr create --distro=ubuntu --release=jammy myjail --bind-ro=/mnt
```

If you omit the jail name, the create process is interactive. You'll be presented with questions which guide you through the process.

```shell
jlmkr create
```

After answering some questions you should have your first jail up and running!
### Startup Jails on Boot

```shell
# Best to call startup using the absolute path to jlmkr.py:
# the jlmkr shell alias doesn't work in Init/Shutdown Scripts
/mnt/mypool/jailmaker/jlmkr.py startup

# Startup can be called through the symlink too...
# but the symlink may not be available after a TrueNAS SCALE update
jlmkr startup
```

In order to start jails automatically after TrueNAS boots, run `/mnt/mypool/jailmaker/jlmkr.py startup` as a Post Init Script with Type `Command` from the TrueNAS web interface. This creates the `jlmkr` symlink (if possible) and starts all the jails with `startup=1` in the config file.
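The `startup` command only auto-starts jails that opt in. As a sketch, marking a jail for automatic startup means setting the flag in that jail's config, e.g. via `jlmkr edit myjail`:

```
startup=1
```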
### Start Jail
```shell
[...]
jlmkr remove myjail
jlmkr stop myjail
```

### Restart Jail

```shell
jlmkr restart myjail
```
### Jail Shell

```shell
[...]
jlmkr log myjail
```

### Additional Commands

Expert users may use the following additional commands to manage jails directly: `machinectl`, `systemd-nspawn`, `systemd-run`, `systemctl` and `journalctl`. The `jlmkr` script uses these commands under the hood and implements a subset of their functions. If you use them directly you will bypass any safety checks or configuration done by `jlmkr`, and not everything will work in the context of TrueNAS SCALE.

## Networking

See [Advanced Networking](./NETWORKING.md) for more.

## Docker
The `jailmaker` script won't install Docker for you, but it can set up the jail with the capabilities required to run docker. You can manually install Docker inside the jail using the [official installation guide](https://docs.docker.com/engine/install/#server) or use the [convenience script](https://get.docker.com). Additionally, you may use the [docker config template](./templates/docker/README.md).

## Documentation
# TrueNAS Compatibility

| | |
|---|---|
|TrueNAS Core|❌|
|TrueNAS 22.12|✅|
|TrueNAS 23.10|✅|
|TrueNAS 24.04 nightly|✅|

# Distro Compatibility

| | |
|---|---|
|Debian 11 Bullseye|✅|
|Debian 12 Bookworm|✅|
|Ubuntu Jammy|✅|
|Fedora 39|✅|
|Arch|🟨|
|Alpine|❌|

✅ = Personally tested and working

🟨 = Haven't personally tested
# Debian Docker Jail Template

## Setup

Check out the [config](./config) template file. You may provide it when asked during `jlmkr create` or, if you have the template file stored on your NAS, you may provide it directly by running `jlmkr create --start --config /mnt/tank/path/to/docker/config mydockerjail`.
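Bind mounts for container data are a common addition to this template. As a sketch, extra `systemd-nspawn` flags can be appended to the template's `systemd_nspawn_user_args` (the host and jail paths here are examples, not part of the template):

```
systemd_nspawn_user_args=--network-macvlan=eno1
--resolv-conf=bind-host
--system-call-filter='add_key keyctl bpf'
--bind=/mnt/tank/appdata:/opt/appdata
```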
startup=0
gpu_passthrough_intel=1
gpu_passthrough_nvidia=0

# Use macvlan networking to provide an isolated network namespace,
# so docker can manage firewall rules
# Alternatively use --network-bridge=br1 instead of --network-macvlan
# Ensure to change eno1/br1 to the interface name you want to use
# You may want to add additional options here, e.g. bind mounts
systemd_nspawn_user_args=--network-macvlan=eno1
--resolv-conf=bind-host
--system-call-filter='add_key keyctl bpf'

# Script to run on the HOST before starting the jail
# Load kernel module and config kernel settings required for docker
pre_start_hook=#!/usr/bin/bash
set -euo pipefail
echo 'PRE_START_HOOK'
echo 1 > /proc/sys/net/ipv4/ip_forward
modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables

# Only used while creating the jail
distro=debian
release=bookworm

# Install docker inside the jail:
# https://docs.docker.com/engine/install/debian/#install-using-the-repository
# NOTE: this script will run in the host networking namespace and ignores
# all systemd_nspawn_user_args such as bind mounts
initial_setup=#!/usr/bin/bash
set -euo pipefail

apt-get update && apt-get -y install ca-certificates curl
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc

echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
tee /etc/apt/sources.list.d/docker.list > /dev/null
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# You generally will not need to change the options below
systemd_run_default_args=--property=KillMode=mixed
--property=Type=notify
--property=RestartForceExitStatus=133
--property=SuccessExitStatus=133
--property=Delegate=yes
--property=TasksMax=infinity
--collect
--setenv=SYSTEMD_NSPAWN_LOCK=0

systemd_nspawn_default_args=--keep-unit
--quiet
--boot
--bind-ro=/sys/module
--inaccessible=/sys/module/apparmor
# Debian Incus Jail Template (LXD / LXC / KVM)

## Disclaimer

**Experimental. Using Incus in this setup hasn't been extensively tested and has [known issues](#known-issues).**

## Setup

Check out the [config](./config) template file. You may provide it when asked during `jlmkr create` or, if you have the template file stored on your NAS, you may provide it directly by running `jlmkr create --start --config /mnt/tank/path/to/incus/config myincusjail`.

Unfortunately incus doesn't want to install from the `initial_setup` script inside the config file, so we manually finish the setup by running the following after creating and starting the jail:

```bash
jlmkr exec myincusjail bash -c 'apt-get -y install incus incus-ui-canonical &&
incus admin init'
```

Follow [First steps with Incus](https://linuxcontainers.org/incus/docs/main/tutorial/first_steps/).

Then visit the Incus web GUI in your browser at https://0.0.0.0:8443. To find out which IP address to use instead of 0.0.0.0, check the IP address of your jail with `jlmkr list`.

## Known Issues

Using Incus in the jail will cause the following error when starting a VM from the TrueNAS SCALE web GUI:

```
[EFAULT] internal error: process exited while connecting to monitor: Could not access KVM kernel module: Permission denied 2024-02-16T14:40:14.886658Z qemu-system-x86_64: -accel kvm: failed to initialize kvm: Permission denied
```

A reboot will resolve the issue (until you start the Incus jail again).

## Create Ubuntu Desktop VM

The Incus web GUI should be running on port 8443. Create a new instance, call it `desktop`, and choose the `Ubuntu jammy desktop virtual-machine ubuntu/22.04/desktop` image.

## Bind mount / virtiofs

To access files from the TrueNAS host directly in a VM created with incus, we can use virtiofs.

```bash
incus config device add desktop test disk source=/home/test/ path=/mnt/test
```

The command above (when run as the root user inside the incus jail) adds a new virtiofs mount of a test directory inside the jail to a VM named desktop. The `/home/test` dir resides in the jail, but you can first bind mount any directory from the TrueNAS host inside the incus jail and then forward this to the VM using virtiofs. This could be an alternative to NFS mounts.
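The bind-then-forward chain can be sketched in the jail's jailmaker config; the host path below is an example, not part of the template:

```
systemd_nspawn_user_args=--network-macvlan=eno1
--bind=/mnt/tank/media:/home/test
```

After restarting the jail, the `incus config device add` command above then exposes the host's `/mnt/tank/media` to the VM via virtiofs.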
### Benchmarks

#### Inside LXD ubuntu desktop VM with virtiofs mount

```
root@desktop:/mnt/test# mount | grep test
incus_test on /mnt/test type virtiofs (rw,relatime)
root@desktop:/mnt/test# time iozone -a
[...]
real    2m22.389s
user    0m2.222s
sys     0m59.275s
```

#### In a jailmaker jail on the host

```
root@incus:/home/test# time iozone -a
[...]
real    0m59.486s
user    0m1.468s
sys     0m25.458s
```

#### Inside LXD ubuntu desktop VM with virtiofs mount

```
root@desktop:/mnt/test# dd if=/dev/random of=./test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 36.321 s, 29.6 MB/s
```

#### In a jailmaker jail on the host

```
root@incus:/home/test# dd if=/dev/random of=./test2.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.03723 s, 153 MB/s
```

## Create Ubuntu container

To be able to create unprivileged (rootless) containers with incus inside the jail, you need to increase the number of UIDs available inside the jail. Please refer to the [Podman instructions](../podman/README.md) for more information. If you don't increase the UIDs you can only create privileged containers; you'd have to change `Privileged` to `Allow` in `Security policies` in this case.

## References

- [Running QEMU/KVM Virtual Machines in Unprivileged LXD Containers](https://dshcherb.github.io/2017/12/04/qemu-kvm-virtual-machines-in-unprivileged-lxd.html)
# WARNING: EXPERIMENTAL CONFIG TEMPLATE!
startup=0
gpu_passthrough_intel=1
gpu_passthrough_nvidia=0

# Use macvlan networking to provide an isolated network namespace,
# so incus can manage firewall rules
# Alternatively use --network-bridge=br1 instead of --network-macvlan
# Ensure to change eno1/br1 to the interface name you want to use
# You may want to add additional options here, e.g. bind mounts
# TODO: don't use --capability=all but specify only the required capabilities
# TODO: or add and use privileged flag?
systemd_nspawn_user_args=--network-macvlan=eno1
--resolv-conf=bind-host
--capability=all
--bind=/dev/fuse
--bind=/dev/kvm
--bind=/dev/vsock
--bind=/dev/vhost-vsock

# Script to run on the HOST before starting the jail
# Load kernel module and config kernel settings required for incus
pre_start_hook=#!/usr/bin/bash
set -euo pipefail
echo 'PRE_START_HOOK'
echo 1 > /proc/sys/net/ipv4/ip_forward
modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
modprobe vhost_vsock

# Only used while creating the jail
distro=debian
release=bookworm

# Install incus according to:
# https://github.com/zabbly/incus#installation
# NOTE: this script will run in the host networking namespace and ignores
# all systemd_nspawn_user_args such as bind mounts
initial_setup=#!/usr/bin/bash
set -euo pipefail
apt-get update && apt-get -y install curl
mkdir -p /etc/apt/keyrings/
curl -fsSL https://pkgs.zabbly.com/key.asc -o /etc/apt/keyrings/zabbly.asc
sh -c 'cat <<EOF > /etc/apt/sources.list.d/zabbly-incus-stable.sources
Enabled: yes
Types: deb
URIs: https://pkgs.zabbly.com/incus/stable
Suites: $(. /etc/os-release && echo ${VERSION_CODENAME})
Components: main
Architectures: $(dpkg --print-architecture)
Signed-By: /etc/apt/keyrings/zabbly.asc

EOF'
apt-get update

# You generally will not need to change the options below
systemd_run_default_args=--property=KillMode=mixed
--property=Type=notify
--property=RestartForceExitStatus=133
--property=SuccessExitStatus=133
--property=Delegate=yes
--property=TasksMax=infinity
--collect
--setenv=SYSTEMD_NSPAWN_LOCK=0
# TODO: add below if required:
# --property=DevicePolicy=auto

systemd_nspawn_default_args=--keep-unit
--quiet
--boot
--bind-ro=/sys/module
--inaccessible=/sys/module/apparmor
# Ubuntu LXD Jail Template

## Disclaimer

**Experimental. Using LXD in this setup hasn't been extensively tested and has [known issues](#known-issues).**

## Setup

Check out the [config](./config) template file. You may provide it when asked during `jlmkr create` or, if you have the template file stored on your NAS, you may provide it directly by running `jlmkr create --start --config /mnt/tank/path/to/lxd/config mylxdjail`.

Unfortunately snapd doesn't want to install from the `initial_setup` script inside the config file, so we manually finish the setup by running the following after creating and starting the jail:

```bash
# Repeat listing the jail until you see it has an IPv4 address
jlmkr list

# Install packages
jlmkr exec mylxdjail bash -c 'apt-get update &&
apt-get install -y --no-install-recommends snapd &&
snap install lxd'
```

Choose the `dir` storage backend during `lxd init` and answer `yes` to "Would you like the LXD server to be available over the network?"

```bash
jlmkr exec mylxdjail bash -c 'lxd init &&
snap set lxd ui.enable=true &&
systemctl reload snap.lxd.daemon'
```

Then visit the LXD web GUI in your browser at https://0.0.0.0:8443. To find out which IP address to use instead of 0.0.0.0, check the IP address of your jail with `jlmkr list`.

## Known Issues

### Instance creation failed

[LXD no longer has access to the LinuxContainers image server](https://discuss.linuxcontainers.org/t/important-notice-for-lxd-users-image-server/18479).

```
Failed getting remote image info: Failed getting image: The requested image couldn't be found for fingerprint "ubuntu/focal/desktop"
```

### SCALE Virtual Machines

Using LXD in the jail will cause the following error when starting a VM from the TrueNAS SCALE web GUI:

```
[EFAULT] internal error: process exited while connecting to monitor: Could not access KVM kernel module: Permission denied 2024-02-16T14:40:14.886658Z qemu-system-x86_64: -accel kvm: failed to initialize kvm: Permission denied
```

A reboot will resolve the issue (until you start the LXD jail again).

### ZFS Issues

If you create a new dataset on your pool (e.g. `tank`) called `lxd` from the TrueNAS SCALE web GUI and tell LXD to use it during `lxd init`, then you will run into issues. Firstly you'd have to run `apt-get install -y --no-install-recommends zfsutils-linux` inside the jail to install the ZFS userspace utils, and you'd have to add `--bind=/dev/zfs` to the `systemd_nspawn_user_args` in the jail config. By mounting `/dev/zfs` into this jail, **it will have total control of the storage on the host!**

But then SCALE doesn't seem to like the ZFS datasets created by LXD. I get the following errors when browsing the sub-datasets:

```
[EINVAL] legacy: path must be absolute
```

```
[EFAULT] Failed retreiving USER quotas for tank/lxd/virtual-machines
```

As long as you don't operate on these datasets in the SCALE GUI this may not be a real problem...

However, creating an LXD VM doesn't work with the ZFS storage backend (creating a container works though):

```
Failed creating instance from image: Could not locate a zvol for tank/lxd/images/1555b13f0e89bfcf516bd0090eee6f73a0db5f4d0d36c38cae94316de82bf817.block
```

Could this be the same issue as [Instance creation failed](#instance-creation-failed)?

## More info

Refer to the [Incus README](../incus/README.md) as a lot of it applies to LXD too.

## References

- [Running QEMU/KVM Virtual Machines in Unprivileged LXD Containers](https://dshcherb.github.io/2017/12/04/qemu-kvm-virtual-machines-in-unprivileged-lxd.html)
# WARNING: EXPERIMENTAL CONFIG TEMPLATE!
startup=0
gpu_passthrough_intel=1
gpu_passthrough_nvidia=0

# Use macvlan networking to provide an isolated network namespace,
# so lxd can manage firewall rules
# Alternatively use --network-bridge=br1 instead of --network-macvlan
# Ensure to change eno1/br1 to the interface name you want to use
# You may want to add additional options here, e.g. bind mounts
# TODO: don't use --capability=all but specify only the required capabilities
# TODO: or add and use privileged flag?
systemd_nspawn_user_args=--network-bridge=br1
--resolv-conf=bind-host
--capability=all
--bind=/dev/fuse
--bind=/dev/kvm
--bind=/dev/vsock
--bind=/dev/vhost-vsock

# Script to run on the HOST before starting the jail
# Load kernel module and config kernel settings required for lxd
pre_start_hook=#!/usr/bin/bash
set -euo pipefail
echo 'PRE_START_HOOK'
echo 1 > /proc/sys/net/ipv4/ip_forward
modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
modprobe vhost_vsock

# Only used while creating the jail
distro=ubuntu
release=jammy

# NOTE: this script will run in the host networking namespace and ignores
# all systemd_nspawn_user_args such as bind mounts
initial_setup=#!/usr/bin/bash
set -euo pipefail
# https://discuss.linuxcontainers.org/t/snap-inside-privileged-lxd-container/13691/8
ln -sf /bin/true /usr/local/bin/udevadm

# You generally will not need to change the options below
systemd_run_default_args=--property=KillMode=mixed
--property=Type=notify
--property=RestartForceExitStatus=133
--property=SuccessExitStatus=133
--property=Delegate=yes
--property=TasksMax=infinity
--collect
--setenv=SYSTEMD_NSPAWN_LOCK=0
# TODO: add below if required:
# --property=DevicePolicy=auto

systemd_nspawn_default_args=--keep-unit
--quiet
--boot
--bind-ro=/sys/module
--inaccessible=/sys/module/apparmor
# Fedora Podman Jail Template

## Setup

Check out the [config](./config) template file. You may provide it when asked during `jlmkr create` or, if you have the template file stored on your NAS, you may provide it directly by running `jlmkr create --start --config /mnt/tank/path/to/podman/config mypodmanjail`.

## Rootless

### Disclaimer

**Experimental. Using podman in this setup hasn't been extensively tested.**

### Installation

Prerequisite: a jail created using the [config](./config) template file.

Run `jlmkr edit mypodmanjail` and add `--private-users=524288:65536 --private-users-ownership=chown` to `systemd_nspawn_user_args`. We start at UID 524288, as this is the [systemd range used for containers](https://github.com/systemd/systemd/blob/main/docs/UIDS-GIDS.md#summary).

The `--private-users-ownership=chown` option will ensure the rootfs ownership is corrected.

After the jail has started, run `jlmkr stop mypodmanjail && jlmkr edit mypodmanjail`, remove `--private-users-ownership=chown` and increase the UID range to `131072` to double the number of UIDs available in the jail. We need more than 65536 UIDs available in the jail, since rootless podman also needs to be able to map UIDs. If I leave the `--private-users-ownership=chown` option in place, I get the following error:

> systemd-nspawn[678877]: Automatic UID/GID adjusting is only supported for UID/GID ranges starting at multiples of 2^16 with a range of 2^16

The flags look like this now:

```
systemd_nspawn_user_args=--network-macvlan=eno1
--resolv-conf=bind-host
--system-call-filter='add_key keyctl bpf'
--private-users=524288:131072
```
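The numbers above can be sanity-checked: the range start must be a multiple of 2^16, and the `rootless` user's subordinate range (assigned as 65536-131071 later in this guide) must fit inside the 131072 UIDs mapped into the jail. A quick check using the values from this guide:

```shell
start=524288   # host UID where the jail's UID 0 is mapped
count=131072   # number of UIDs available inside the jail
sub_start=65536; sub_end=131071   # subuid range for the rootless user

# the start must be a multiple of 2^16 (65536)
test $((start % 65536)) -eq 0 && echo "start is 2^16-aligned"
# the subuid range must stay within the jail's UID space
test "$sub_end" -lt "$count" && echo "subuid range fits"
```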
Start the jail with `jlmkr start mypodmanjail` and open a shell session inside the jail (as the remapped root user) with `jlmkr shell mypodmanjail`.

Then inside the jail set up the new rootless user:

```bash
# Create new user
adduser rootless
# Set password for user
passwd rootless

# Clear the subuids and subgids which have been assigned by default when creating the new user
usermod --del-subuids 0-4294967295 --del-subgids 0-4294967295 rootless
# Set a specific range, so it fits inside the number of available UIDs
usermod --add-subuids 65536-131071 --add-subgids 65536-131071 rootless

# Check the assigned range
cat /etc/subuid
# Check the available range
cat /proc/self/uid_map

exit
```

From the TrueNAS host, open a shell as the rootless user inside the jail.

```bash
jlmkr shell --uid 1000 mypodmanjail
```

Run rootless podman as user 1000.

```bash
id
podman run hello-world
podman info
```

The output of `podman info` should contain:

```
graphDriverName: overlay
graphOptions: {}
graphRoot: /home/rootless/.local/share/containers/storage
[...]
graphStatus:
  Backing Filesystem: zfs
  Native Overlay Diff: "true"
  Supports d_type: "true"
  Supports shifting: "false"
  Supports volatile: "true"
  Using metacopy: "false"
```

### Binding to Privileged Ports

Add `sysctl net.ipv4.ip_unprivileged_port_start=23` to the `pre_start_hook` inside the config to lower the range of privileged ports. This will still prevent an unprivileged process from impersonating the sshd daemon. Since this lowers the range globally on the TrueNAS host, a better solution would be to specifically add the capability to bind to privileged ports.
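As a sketch, the amended hook in the jail config would look like this (the `sysctl` line is the only addition to the template's hook):

```
pre_start_hook=#!/usr/bin/bash
set -euo pipefail
echo 'PRE_START_HOOK'
sysctl net.ipv4.ip_unprivileged_port_start=23
echo 1 > /proc/sys/net/ipv4/ip_forward
modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
```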
## Cockpit Management

Install and enable cockpit:

```bash
jlmkr exec mypodmanjail bash -c "dnf -y install cockpit cockpit-podman && \
systemctl enable --now cockpit.socket && \
ip a &&
ip route | awk '/default/ { print \$9 }'"
```
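The `awk` filter in the block above pulls the ninth field out of the default route line, which on a route line shaped like the sample below is the address after `src`, i.e. the jail's own IP. A demo against a captured sample line (the addresses are made up):

```shell
route_line='default via 192.168.1.1 dev eno1 proto dhcp src 192.168.1.50 metric 100'
# field 9 is the address after 'src', i.e. the jail's own IP
echo "$route_line" | awk '/default/ { print $9 }'
# prints 192.168.1.50
```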
Check the IP address of the jail and access the Cockpit web interface at https://0.0.0.0:9090, where 0.0.0.0 is the IP address you just found using `ip a`.

If you've set up the `rootless` user, you may login with the password you've created earlier. Otherwise you'd have to add an admin user first:

```bash
jlmkr exec mypodmanjail bash -c 'adduser admin
passwd admin
usermod -aG wheel admin'
```

Click on `Podman containers`. In case it shows `Podman service is not active`, click `Start podman`. You can now manage your (rootless) podman containers in the (rootless) jailmaker jail using the Cockpit web GUI.

## Additional Resources

Resources mentioning `add_key keyctl bpf`:
- https://bbs.archlinux.org/viewtopic.php?id=252840
- https://wiki.archlinux.org/title/systemd-nspawn
- https://discourse.nixos.org/t/podman-docker-in-nixos-container-ideally-in-unprivileged-one/22909/12

Resources mentioning `@keyring`:
- https://github.com/systemd/systemd/issues/17606
- https://github.com/systemd/systemd/blob/1c62c4fe0b54fb419b875cb2bae82a261518a745/src/shared/seccomp-util.c#L604

`@keyring` also includes `request_key` but doesn't include `bpf`.
startup=0
gpu_passthrough_intel=0
gpu_passthrough_nvidia=0

# Use macvlan networking to provide an isolated network namespace,
# so podman can manage firewall rules
# Alternatively use --network-bridge=br1 instead of --network-macvlan
# Ensure to change eno1/br1 to the interface name you want to use
# You may want to add additional options here, e.g. bind mounts
systemd_nspawn_user_args=--network-macvlan=eno1
--resolv-conf=bind-host
--system-call-filter='add_key keyctl bpf'

# Script to run on the HOST before starting the jail
# Load kernel module and config kernel settings required for podman
pre_start_hook=#!/usr/bin/bash
set -euo pipefail
echo 'PRE_START_HOOK'
echo 1 > /proc/sys/net/ipv4/ip_forward
modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables

# Only used while creating the jail
distro=fedora
release=39

# Install podman inside the jail
# NOTE: this script will run in the host networking namespace and ignores
# all systemd_nspawn_user_args such as bind mounts
initial_setup=#!/usr/bin/bash
set -euo pipefail
dnf -y install podman
# Add the required capabilities to the `newuidmap` and `newgidmap` binaries
# https://github.com/containers/podman/issues/2788#issuecomment-1016301663
# https://github.com/containers/podman/issues/12637#issuecomment-996524341
setcap cap_setuid+eip /usr/bin/newuidmap
setcap cap_setgid+eip /usr/bin/newgidmap

# You generally will not need to change the options below
systemd_run_default_args=--property=KillMode=mixed
--property=Type=notify
--property=RestartForceExitStatus=133
--property=SuccessExitStatus=133
--property=Delegate=yes
--property=TasksMax=infinity
--collect
--setenv=SYSTEMD_NSPAWN_LOCK=0

systemd_nspawn_default_args=--keep-unit
--quiet
--boot
--bind-ro=/sys/module
--inaccessible=/sys/module/apparmor