Create from config template with initial_setup

Jip-Hop 2024-02-11 18:30:47 +01:00
parent f16a91f81d
commit 8571caa431
10 changed files with 656 additions and 260 deletions


@ -45,7 +45,13 @@ Creating a jail is interactive. You'll be presented with questions which guide y
jlmkr create myjail
```
After answering a few questions you should have your first jail up and running!
After answering some questions you should have your first jail up and running!
You may also specify the path to a config template for a quick and consistent jail creation process.
```shell
jlmkr create myjail /path/to/config/template
```
### Startup Jails on Boot


@ -1,120 +0,0 @@
# Rootless podman in rootless Fedora jail
## Disclaimer
**These notes are a work in progress. Using podman in this setup hasn't been extensively tested.**
## Installation
Prerequisites: jailmaker installed and bridge networking set up.
Run `jlmkr create rootless` to create a new jail. During jail creation choose fedora 39, so we get the most recent version of podman available. Don't enable docker compatibility; we're going to enable only the required options manually.
Add `--network-bridge=br1 --resolv-conf=bind-host --system-call-filter='add_key keyctl bpf' --private-users=524288:65536 --private-users-ownership=chown` when asked for additional systemd-nspawn flags during jail creation.
We start at UID 524288, as this is the [systemd range used for containers](https://github.com/systemd/systemd/blob/main/docs/UIDS-GIDS.md#summary).
The `--private-users-ownership=chown` option will ensure the rootfs ownership is corrected.
After the jail has started, run `jlmkr stop rootless && jlmkr edit rootless`, remove `--private-users-ownership=chown` and increase the UID range to `131072` to double the number of UIDs available in the jail. We need more than 65536 UIDs available in the jail, since rootless podman also needs to be able to map UIDs. If I leave the `--private-users-ownership=chown` option, I get the following error:
> systemd-nspawn[678877]: Automatic UID/GID adjusting is only supported for UID/GID ranges starting at multiples of 2^16 with a range of 2^16
The flags look like this now:
```
systemd_nspawn_user_args=--network-bridge=br1 --resolv-conf=bind-host --system-call-filter='add_key keyctl bpf' --private-users=524288:131072
```
Start the jail with `jlmkr start rootless` and open a shell session inside the jail (as the remapped root user) with `jlmkr shell rootless`.
Then, inside the jail, start the network services (wait to get an IP address via DHCP) and install podman:
```bash
# systemd-networkd should already be enabled when using jlmkr.py from the develop branch
systemctl --now enable systemd-networkd
# Add the required capabilities to the `newuidmap` and `newgidmap` binaries.
# https://github.com/containers/podman/issues/2788#issuecomment-1016301663
# https://github.com/containers/podman/issues/2788#issuecomment-479972943
# https://github.com/containers/podman/issues/12637#issuecomment-996524341
setcap cap_setuid+eip /usr/bin/newuidmap
setcap cap_setgid+eip /usr/bin/newgidmap
# Create new user
adduser rootless
# Set password for user
passwd rootless
# Clear the subuids and subgids which have been assigned by default when creating the new user
usermod --del-subuids 0-4294967295 --del-subgids 0-4294967295 rootless
# Set a specific range, so it fits inside the number of available UIDs
usermod --add-subuids 65536-131071 --add-subgids 65536-131071 rootless
# Check the assigned range
cat /etc/subuid
# Check the available range
cat /proc/self/uid_map
dnf -y install podman
exit
```
From the TrueNAS host, open a shell as the rootless user inside the jail.
```bash
jlmkr shell --uid 1000 rootless
```
Run rootless podman as user 1000.
```bash
id
podman run hello-world
podman info
```
The output of podman info should contain:
```
graphDriverName: overlay
graphOptions: {}
graphRoot: /home/rootless/.local/share/containers/storage
[...]
graphStatus:
Backing Filesystem: zfs
Native Overlay Diff: "true"
Supports d_type: "true"
Supports shifting: "false"
Supports volatile: "true"
Using metacopy: "false"
```
## Cockpit management
Inside the rootless jail run (as root user):
```bash
dnf install cockpit cockpit-podman
systemctl enable --now cockpit.socket
ip a
```
Check the IP address of the jail and access the Cockpit web interface at https://0.0.0.0:9090, replacing 0.0.0.0 with the IP address you just found using `ip a`.
Then log in as user `rootless` with the password you created earlier. Click on `Podman containers`. In case it shows `Podman service is not active`, click `Start podman`. You can now manage your rootless podman containers in the rootless jailmaker jail using the Cockpit web GUI.
## TODO:
On the TrueNAS host run:
sudo sysctl net.ipv4.ip_unprivileged_port_start=23
> Starting the unprivileged port range at 23 keeps port 22 privileged, which prevents a process run by your user from impersonating the sshd daemon.
Actually make it persistent.
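One common way to make it persistent on a generic Linux host is a drop-in under `/etc/sysctl.d` (a sketch; TrueNAS SCALE may manage sysctls through its own UI, so treat this as an assumption):
```bash
# Persist the lowered privileged-port range across reboots (file name is an example)
echo 'net.ipv4.ip_unprivileged_port_start=23' | sudo tee /etc/sysctl.d/90-unprivileged-ports.conf
sudo sysctl --system
```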
## Additional resources:
Resources mentioning `add_key keyctl bpf`
- https://bbs.archlinux.org/viewtopic.php?id=252840
- https://wiki.archlinux.org/title/systemd-nspawn
- https://discourse.nixos.org/t/podman-docker-in-nixos-container-ideally-in-unprivileged-one/22909/12
Resources mentioning `@keyring`
- https://github.com/systemd/systemd/issues/17606
- https://github.com/systemd/systemd/blob/1c62c4fe0b54fb419b875cb2bae82a261518a745/src/shared/seccomp-util.c#L604
`@keyring` also includes `request_key` but doesn't include `bpf`

jlmkr.py

@ -17,6 +17,7 @@ import shutil
import stat
import subprocess
import sys
import tempfile
import time
import urllib.request
from collections import defaultdict
@ -273,19 +274,21 @@ def stop_jail(jail_name):
return subprocess.run(["machinectl", "poweroff", jail_name]).returncode
def parse_config(jail_config_path):
def parse_config_string(config_string):
config = configparser.ConfigParser()
try:
# Workaround to read config file without section headers
config.read_string("[DEFAULT]\n" + Path(jail_config_path).read_text())
config.read_string("[DEFAULT]\n" + config_string)
config = dict(config["DEFAULT"])
return config
def parse_config_file(jail_config_path):
try:
return parse_config_string(Path(jail_config_path).read_text())
except FileNotFoundError:
eprint(f"Unable to find config file: {jail_config_path}.")
return
config = dict(config["DEFAULT"])
return config
def add_hook(jail_path, systemd_run_additional_args, hook_command, hook_type):
if not hook_command:
@ -321,13 +324,53 @@ def start_jail(jail_name):
jail_path = get_jail_path(jail_name)
jail_config_path = get_jail_config_path(jail_name)
jail_rootfs_path = get_jail_rootfs_path(jail_name)
config = parse_config(jail_config_path)
config = parse_config_file(jail_config_path)
if not config:
eprint("Aborting...")
return 1
# Handle initial setup
initial_setup = config.get("initial_setup")
# Alternative method to set up on first boot:
# https://www.undrground.org/2021/01/25/adding-a-single-run-task-via-systemd/
# If there's no machine-id, then this is the first time the jail is started
if initial_setup and not os.path.exists(
os.path.join(jail_rootfs_path, "etc/machine-id")
):
# If the command starts with a shebang, write it to a script file and run that; otherwise run it directly
if initial_setup.startswith("#!"):
# Write a script file and call that
initial_setup_file = os.path.abspath(
os.path.join(jail_path, ".initial_setup")
)
print(initial_setup, file=open(initial_setup_file, "w"))
stat_chmod(initial_setup_file, 0o700)
cmd = [
"systemd-nspawn",
"-q",
"-D",
jail_rootfs_path,
f"--bind-ro={initial_setup_file}:/root/initial_startup",
"/root/initial_startup",
]
else:
cmd = ["systemd-nspawn", "-q", "-D", jail_rootfs_path, initial_setup]
returncode = subprocess.run(cmd).returncode
if returncode != 0:
eprint("Failed to run initial setup:")
eprint(initial_setup)
eprint()
eprint("Abort starting jail.")
return returncode
# Cleanup the initial_setup_file
Path(initial_setup_file).unlink(missing_ok=True)
systemd_run_additional_args = [
f"--unit={SYMLINK_NAME}-{jail_name}",
f"--working-directory=./{jail_path}",
@ -665,11 +708,22 @@ def check_jail_name_available(jail_name, warn=True):
return False
def create_jail(jail_name, distro="debian", release="bookworm"):
def ask_jail_name(jail_name=""):
while True:
print()
jail_name = input_with_default("Enter jail name: ", jail_name).strip()
if check_jail_name_valid(jail_name):
if check_jail_name_available(jail_name):
return jail_name
def create_jail(jail_name="", config_path=None, distro="debian", release="bookworm"):
"""
Create jail with given name.
"""
config_string = ""
print(DISCLAIMER)
if os.path.basename(os.getcwd()) != "jailmaker":
@ -705,6 +759,49 @@ def create_jail(jail_name, distro="debian", release="bookworm"):
os.makedirs(JAILS_DIR_PATH, exist_ok=True)
stat_chmod(JAILS_DIR_PATH, 0o700)
#################
# Config handling
#################
if config_path:
try:
config_string = Path(config_path).read_text()
except FileNotFoundError:
eprint(f"Unable to find file: {config_path}.")
return 1
else:
print()
if agree("Do you wish to create a jail from a config template?", "n"):
print(
dedent(
"""
A text editor will open so you can provide the config template.
- please copy your config
- paste it into the text editor
- save and close the text editor
"""
)
)
input("Press Enter to open the text editor.")
with tempfile.NamedTemporaryFile() as f:
subprocess.call([TEXT_EDITOR, f.name])
f.seek(0)
config_string = f.read().decode()
if config_string:
config = parse_config_string(config_string)
# Ask for jail name if not provided
if not (
jail_name
and check_jail_name_valid(jail_name)
and check_jail_name_available(jail_name)
):
jail_name = ask_jail_name(jail_name)
jail_path = get_jail_path(jail_name)
distro, release = config.get("initial_rootfs_image").split()
else:
print()
if not agree(f"Install the recommended image ({distro} {release})?", "y"):
print(
@ -745,18 +842,9 @@ def create_jail(jail_name, distro="debian", release="bookworm"):
release = input("Release: ")
while True:
print()
jail_name = input_with_default("Enter jail name: ", jail_name).strip()
if check_jail_name_valid(jail_name):
if check_jail_name_available(jail_name):
break
jail_name = ask_jail_name(jail_name)
jail_path = get_jail_path(jail_name)
# Cleanup in except, but only once the jail_path is final
# Otherwise we may cleanup the wrong directory
try:
print(
dedent(
f"""
@ -875,8 +963,106 @@ def create_jail(jail_name, distro="debian", release="bookworm"):
)
)
# Use mostly default settings for systemd-nspawn but with systemd-run instead of a service file:
# https://github.com/systemd/systemd/blob/main/units/systemd-nspawn%40.service.in
# Use TasksMax=infinity since this is what docker does:
# https://github.com/docker/engine/blob/master/contrib/init/systemd/docker.service
# Use SYSTEMD_NSPAWN_LOCK=0: otherwise the jail won't start after a shutdown (but why?)
# Would give "directory tree currently busy" error and I'd have to run
# `rm /run/systemd/nspawn/locks/*` and remove the .lck file from jail_path
# Disabling locking isn't a big deal as systemd-nspawn will prevent starting a container
# with the same name anyway: as long as we're starting jails using this script,
# it won't be possible to start the same jail twice
systemd_run_default_args = [
"--property=KillMode=mixed",
"--property=Type=notify",
"--property=RestartForceExitStatus=133",
"--property=SuccessExitStatus=133",
"--property=Delegate=yes",
"--property=TasksMax=infinity",
"--collect",
"--setenv=SYSTEMD_NSPAWN_LOCK=0",
]
# Always add --bind-ro=/sys/module to make lsmod happy
# https://manpages.debian.org/bookworm/manpages/sysfs.5.en.html
systemd_nspawn_default_args = [
"--keep-unit",
"--quiet",
"--boot",
"--bind-ro=/sys/module",
"--inaccessible=/sys/module/apparmor",
]
systemd_nspawn_user_args_multiline = "\n\t".join(
shlex.split(systemd_nspawn_user_args)
)
systemd_run_default_args_multiline = "\n\t".join(systemd_run_default_args)
systemd_nspawn_default_args_multiline = "\n\t".join(systemd_nspawn_default_args)
config_string = cleandoc(
f"""
startup={startup}
docker_compatible={docker_compatible}
gpu_passthrough_intel={gpu_passthrough_intel}
gpu_passthrough_nvidia={gpu_passthrough_nvidia}
"""
)
config_string += (
f"\n\nsystemd_nspawn_user_args={systemd_nspawn_user_args_multiline}\n\n"
)
config_string += cleandoc(
"""
# # Specify command/script to run on the HOST before starting the jail
# # For example to load kernel modules and config kernel settings
# pre_start_hook=#!/usr/bin/bash
# echo 'PRE_START_HOOK'
# echo 1 > /proc/sys/net/ipv4/ip_forward
# modprobe br_netfilter
# echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
# echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
#
# # Specify a command/script to run on the HOST after stopping the jail
# post_stop_hook=echo 'POST_STOP_HOOK'
# Specify command/script to run IN THE JAIL before starting it for the first time
# Useful to install packages on top of the base rootfs
# NOTE: this script will run in the host networking namespace and ignores
# all systemd_nspawn_user_args such as bind mounts
initial_setup=#!/usr/bin/bash
set -euo pipefail
apt-get update && apt-get -y install curl
curl -fsSL https://get.docker.com | sh
"""
)
config_string += "\n".join(
[
"",
"",
"# You generally will not need to change the options below",
f"systemd_run_default_args={systemd_run_default_args_multiline}",
"",
f"systemd_nspawn_default_args={systemd_nspawn_default_args_multiline}",
"",
"# Used by jlmkr create",
f"initial_rootfs_image={distro} {release}",
]
)
print()
##############
# Create start
##############
# Cleanup in except, but only once the jail_path is final
# Otherwise we may cleanup the wrong directory
try:
jail_config_path = get_jail_config_path(jail_name)
jail_rootfs_path = get_jail_rootfs_path(jail_name)
@ -1014,84 +1200,7 @@ def create_jail(jail_name, distro="debian", release="bookworm"):
file=open(os.path.join(preset_path, "00-jailmaker.preset"), "w"),
)
# Use mostly default settings for systemd-nspawn but with systemd-run instead of a service file:
# https://github.com/systemd/systemd/blob/main/units/systemd-nspawn%40.service.in
# Use TasksMax=infinity since this is what docker does:
# https://github.com/docker/engine/blob/master/contrib/init/systemd/docker.service
# Use SYSTEMD_NSPAWN_LOCK=0: otherwise the jail won't start after a shutdown (but why?)
# Would give "directory tree currently busy" error and I'd have to run
# `rm /run/systemd/nspawn/locks/*` and remove the .lck file from jail_path
# Disabling locking isn't a big deal as systemd-nspawn will prevent starting a container
# with the same name anyway: as long as we're starting jails using this script,
# it won't be possible to start the same jail twice
systemd_run_default_args = [
"--property=KillMode=mixed",
"--property=Type=notify",
"--property=RestartForceExitStatus=133",
"--property=SuccessExitStatus=133",
"--property=Delegate=yes",
"--property=TasksMax=infinity",
"--collect",
"--setenv=SYSTEMD_NSPAWN_LOCK=0",
]
# Always add --bind-ro=/sys/module to make lsmod happy
# https://manpages.debian.org/bookworm/manpages/sysfs.5.en.html
systemd_nspawn_default_args = [
"--keep-unit",
"--quiet",
"--boot",
"--bind-ro=/sys/module",
"--inaccessible=/sys/module/apparmor",
]
systemd_nspawn_user_args_multiline = "\n\t".join(
shlex.split(systemd_nspawn_user_args)
)
systemd_run_default_args_multiline = "\n\t".join(systemd_run_default_args)
systemd_nspawn_default_args_multiline = "\n\t".join(systemd_nspawn_default_args)
config = cleandoc(
f"""
startup={startup}
docker_compatible={docker_compatible}
gpu_passthrough_intel={gpu_passthrough_intel}
gpu_passthrough_nvidia={gpu_passthrough_nvidia}
"""
)
config += (
f"\n\nsystemd_nspawn_user_args={systemd_nspawn_user_args_multiline}\n\n"
)
config += cleandoc(
"""
# Specify command/script to run on the HOST before starting the jail
pre_start_hook=echo 'PRE_START_HOOK'
# Specify a command/script to run on the HOST after stopping the jail
post_stop_hook=#!/usr/bin/bash
echo 'POST STOP HOOK'
"""
)
config += "\n".join(
[
"",
"",
"# You generally will not need to change the options below",
f"systemd_run_default_args={systemd_run_default_args_multiline}",
"",
f"systemd_nspawn_default_args={systemd_nspawn_default_args_multiline}",
"",
"# The below is for reference only, currently not used",
f"initial_rootfs_image={distro} {release}",
]
)
print(config, file=open(jail_config_path, "w"))
print(config_string, file=open(jail_config_path, "w"))
os.chmod(jail_config_path, 0o600)
@ -1100,6 +1209,13 @@ def create_jail(jail_name, distro="debian", release="bookworm"):
cleanup(jail_path)
raise error
# In case you want to create a jail without any user interaction,
# you need to skip this final question
# echo 'y' | jlmkr create test testconfig
# TODO: make jlmkr create work cleanly without user interaction.
# Current echo 'y' workaround may cause problems when the jail name already exists
# You'd end up with a new jail called 'y'
# and the script will crash at the agree statement below
print()
if agree(f"Do you want to start jail {jail_name} right now?", "y"):
return start_jail(jail_name)
@ -1286,7 +1402,7 @@ def list_jails():
# TODO: add additional properties from the jails config file
for jail_name in jails:
config = parse_config(get_jail_config_path(jail_name))
config = parse_config_file(get_jail_config_path(jail_name))
startup = False
if config:
@ -1434,7 +1550,7 @@ def startup_jails():
start_failure = False
for jail_name in get_all_jail_names():
config = parse_config(get_jail_config_path(jail_name))
config = parse_config_file(get_jail_config_path(jail_name))
if config and config.get("startup") == "1":
if start_jail(jail_name) != 0:
start_failure = True
@ -1463,9 +1579,11 @@ def main():
help="install jailmaker dependencies and create symlink",
)
subparsers.add_parser(
create_parser = subparsers.add_parser(
name="create", epilog=DISCLAIMER, help="create a new jail"
).add_argument("name", nargs="?", help="name of the jail")
)
create_parser.add_argument("name", nargs="?", help="name of the jail")
create_parser.add_argument("config", nargs="?", help="path to config file template")
subparsers.add_parser(
name="start", epilog=DISCLAIMER, help="start a previously created jail"
@ -1539,7 +1657,7 @@ def main():
sys.exit(install_jailmaker())
elif args.subcommand == "create":
sys.exit(create_jail(args.name))
sys.exit(create_jail(args.name, args.config))
elif args.subcommand == "start":
sys.exit(start_jail(args.name))
@ -1580,7 +1698,7 @@ def main():
else:
if agree("Create a new jail?", "y"):
print()
sys.exit(create_jail(""))
sys.exit(create_jail())
else:
parser.print_usage()


@ -0,0 +1,3 @@
# Debian Docker Jail Template
Check out the [config](./config) template file. You may provide it when asked during `jlmkr create` or, if you have the template file stored on your NAS, you may provide it directly by running `jlmkr create mydockerjail /mnt/tank/path/to/docker/config`.
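Once the jail is up, a quick way to check that the docker install from `initial_setup` succeeded (a sketch, assuming the jail name from the example above and that the jail has network access):
```bash
jlmkr exec mydockerjail bash -c 'docker run --rm hello-world'
```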

templates/docker/config

@ -0,0 +1,59 @@
startup=0
gpu_passthrough_intel=0
gpu_passthrough_nvidia=0
# Use macvlan networking to provide an isolated network namespace,
# so docker can manage firewall rules
# Alternatively use --network-bridge=br1 instead of --network-macvlan
# Make sure to change eno1/br1 to the interface name you want to use
# You may want to add additional options here, e.g. bind mounts
systemd_nspawn_user_args=--network-macvlan=eno1
--resolv-conf=bind-host
--system-call-filter='add_key keyctl bpf'
# Script to run on the HOST before starting the jail
# Load kernel module and config kernel settings required for docker
pre_start_hook=#!/usr/bin/bash
echo 'PRE_START_HOOK'
echo 1 > /proc/sys/net/ipv4/ip_forward
modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
# Install docker inside the jail:
# https://docs.docker.com/engine/install/debian/#install-using-the-repository
# NOTE: this script will run in the host networking namespace and ignores
# all systemd_nspawn_user_args such as bind mounts
initial_setup=#!/usr/bin/bash
set -euo pipefail
apt-get update && apt-get -y install ca-certificates curl
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
tee /etc/apt/sources.list.d/docker.list > /dev/null
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# You generally will not need to change the options below
systemd_run_default_args=--property=KillMode=mixed
--property=Type=notify
--property=RestartForceExitStatus=133
--property=SuccessExitStatus=133
--property=Delegate=yes
--property=TasksMax=infinity
--collect
--setenv=SYSTEMD_NSPAWN_LOCK=0
systemd_nspawn_default_args=--keep-unit
--quiet
--boot
--bind-ro=/sys/module
--inaccessible=/sys/module/apparmor
# Used by jlmkr create
initial_rootfs_image=debian bookworm

templates/lxd/README.md

@ -0,0 +1,101 @@
# Ubuntu LXD Jail Template
Check out the [config](./config) template file. You may provide it when asked during `jlmkr create` or, if you have the template file stored on your NAS, you may provide it directly by running `jlmkr create mylxdjail /mnt/tank/path/to/lxd/config`.
Unfortunately snapd doesn't want to install from the `initial_setup` script inside the config file, so we finish the setup manually by running the following after creating and starting the jail:
```bash
jlmkr exec mylxdjail bash -c 'apt-get update &&
apt-get install -y --no-install-recommends snapd &&
snap install lxd'
# Answer yes when asked the following:
# Would you like the LXD server to be available over the network? (yes/no) [default=no]: yes
# TODO: fix ZFS
jlmkr exec mylxdjail bash -c 'lxd init &&
snap set lxd ui.enable=true &&
systemctl reload snap.lxd.daemon'
```
Then visit the LXD GUI in the browser at https://0.0.0.0:8443. To find out which IP address to use instead of 0.0.0.0, check the IP address of your jail with `jlmkr list`.
## Disclaimer
**These notes are a work in progress. Using Incus in this setup hasn't been extensively tested.**
## Installation
Create a debian 12 jail and [install incus](https://github.com/zabbly/incus#installation). Also install the `incus-ui-canonical` package to get the web interface. Run `modprobe vhost_vsock` on the TrueNAS host. Ensure the jail's config file looks like the one below:
```
startup=0
docker_compatible=1
gpu_passthrough_intel=1
gpu_passthrough_nvidia=0
systemd_nspawn_user_args=--network-bridge=br1 --resolv-conf=bind-host --bind=/dev/fuse --bind=/dev/kvm --bind=/dev/vsock --bind=/dev/vhost-vsock
# You generally will not need to change the options below
systemd_run_default_args=--property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0
systemd_nspawn_default_args=--keep-unit --quiet --boot --bind-ro=/sys/module --inaccessible=/sys/module/apparmor
```
Check out [First steps with Incus](https://linuxcontainers.org/incus/docs/main/tutorial/first_steps/).
## Create Ubuntu Desktop VM
The Incus web GUI should be running on port 8443. Create a new instance, call it `desktop`, and choose the `Ubuntu jammy desktop virtual-machine ubuntu/22.04/desktop` image.
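If you prefer the CLI, the equivalent step is roughly as follows (a sketch; the image alias is taken from the GUI label above and may differ on your image server):
```bash
incus launch images:ubuntu/22.04/desktop desktop --vm
```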
## Bind mount / virtiofs
To access files from the TrueNAS host directly in a VM created with incus, we can use virtiofs.
```bash
incus config device add desktop test disk source=/home/test/ path=/mnt/test
```
The command above (when run as the root user inside the incus jail) adds a new virtiofs mount of a test directory inside the jail to a VM named `desktop`. The `/home/test` dir resides in the jail, but you can first bind mount any directory from the TrueNAS host into the incus jail and then forward it to the VM using virtiofs. This could be an alternative to NFS mounts.
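To expose a TrueNAS dataset instead, a sketch (paths, dataset, and names are assumptions):
```bash
# In the jail config (jlmkr edit incus): bind mount a host dataset into the jail by adding
# e.g. --bind='/mnt/tank/data:/mnt/data' to systemd_nspawn_user_args, then restart the jail.
# Then, as root inside the incus jail, forward that directory to the desktop VM over virtiofs:
incus config device add desktop data disk source=/mnt/data path=/mnt/data
```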
### Benchmarks
#### Inside LXD ubuntu desktop VM with virtiofs mount
root@desktop:/mnt/test# mount | grep test
incus_test on /mnt/test type virtiofs (rw,relatime)
root@desktop:/mnt/test# time iozone -a
[...]
real 2m22.389s
user 0m2.222s
sys 0m59.275s
#### In a jailmaker jail on the host:
root@incus:/home/test# time iozone -a
[...]
real 0m59.486s
user 0m1.468s
sys 0m25.458s
#### Inside LXD ubuntu desktop VM with virtiofs mount
root@desktop:/mnt/test# dd if=/dev/random of=./test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 36.321 s, 29.6 MB/s
#### In a jailmaker jail on the host:
root@incus:/home/test# dd if=/dev/random of=./test2.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.03723 s, 153 MB/s
## Create Ubuntu container
To be able to create unprivileged (rootless) containers with incus inside the jail, you need to increase the number of UIDs available inside the jail. Please refer to the [Podman instructions](../podman/README.md) for more information. If you don't increase the UIDs you can only create privileged containers; in that case you'd have to change `Privileged` to `Allow` in `Security policies`.
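As a sketch, the edit looks roughly like this (values copied from the Podman instructions referenced above; the full procedure, including the one-time `--private-users-ownership=chown` step on first start, is described there):
```bash
jlmkr stop mylxdjail
jlmkr edit mylxdjail   # add e.g. --private-users=524288:131072 to systemd_nspawn_user_args
jlmkr start mylxdjail
```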
## Canonical LXD install via snap
Installing the lxd snap is an alternative to Incus. But out of the box running `snap install lxd` will cause AppArmor issues when running inside a jailmaker jail on SCALE.
## References
- [Running QEMU/KVM Virtual Machines in Unprivileged LXD Containers](https://dshcherb.github.io/2017/12/04/qemu-kvm-virtual-machines-in-unprivileged-lxd.html)

templates/lxd/config

@ -0,0 +1,55 @@
startup=0
gpu_passthrough_intel=1
gpu_passthrough_nvidia=0
# Use macvlan networking to provide an isolated network namespace,
# so lxd can manage firewall rules
# Alternatively use --network-bridge=br1 instead of --network-macvlan
# Make sure to change eno1/br1 to the interface name you want to use
# You may want to add additional options here, e.g. bind mounts
# TODO: don't use --capability=all but specify only the required capabilities
systemd_nspawn_user_args=--network-macvlan=eno1
--resolv-conf=bind-host
--capability=all
--bind=/dev/fuse
--bind=/dev/kvm
--bind=/dev/vsock
--bind=/dev/vhost-vsock
# Script to run on the HOST before starting the jail
# Load kernel module and config kernel settings required for lxd
pre_start_hook=#!/usr/bin/bash
echo 'PRE_START_HOOK'
echo 1 > /proc/sys/net/ipv4/ip_forward
modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
modprobe vhost_vsock
# NOTE: this script will run in the host networking namespace and ignores
# all systemd_nspawn_user_args such as bind mounts
initial_setup=#!/usr/bin/bash
# https://discuss.linuxcontainers.org/t/snap-inside-privileged-lxd-container/13691/8
ln -s /bin/true /usr/local/bin/udevadm
# You generally will not need to change the options below
systemd_run_default_args=--property=KillMode=mixed
--property=Type=notify
--property=RestartForceExitStatus=133
--property=SuccessExitStatus=133
--property=Delegate=yes
--property=TasksMax=infinity
--collect
--setenv=SYSTEMD_NSPAWN_LOCK=0
# TODO: check if the below 2 are required
# --setenv=SYSTEMD_SECCOMP=0
# --property=DevicePolicy=auto
systemd_nspawn_default_args=--keep-unit
--quiet
--boot
--bind-ro=/sys/module
--inaccessible=/sys/module/apparmor
# Used by jlmkr create
initial_rootfs_image=ubuntu jammy

templates/podman/README.md

@ -0,0 +1,121 @@
# Fedora Podman Jail Template
Check out the [config](./config) template file. You may provide it when asked during `jlmkr create` or, if you have the template file stored on your NAS, you may provide it directly by running `jlmkr create mypodmanjail /mnt/tank/path/to/podman/config`.
## Rootless
### Disclaimer
**These notes are a work in progress. Using podman in this setup hasn't been extensively tested.**
### Installation
Prerequisites: a jail created using the [config](./config) template file.
Run `jlmkr edit mypodmanjail` and add `--private-users=524288:65536 --private-users-ownership=chown` to `systemd_nspawn_user_args`. We start at UID 524288, as this is the [systemd range used for containers](https://github.com/systemd/systemd/blob/main/docs/UIDS-GIDS.md#summary).
The `--private-users-ownership=chown` option will ensure the rootfs ownership is corrected.
After the jail has started, run `jlmkr stop mypodmanjail && jlmkr edit mypodmanjail`, remove `--private-users-ownership=chown` and increase the UID range to `131072` to double the number of UIDs available in the jail. We need more than 65536 UIDs available in the jail, since rootless podman also needs to be able to map UIDs. If I leave the `--private-users-ownership=chown` option, I get the following error:
> systemd-nspawn[678877]: Automatic UID/GID adjusting is only supported for UID/GID ranges starting at multiples of 2^16 with a range of 2^16
The flags look like this now:
```
systemd_nspawn_user_args=--network-macvlan=eno1
--resolv-conf=bind-host
--system-call-filter='add_key keyctl bpf'
--private-users=524288:131072
```
Start the jail with `jlmkr start mypodmanjail` and open a shell session inside the jail (as the remapped root user) with `jlmkr shell mypodmanjail`.
Then, inside the jail, set up the new rootless user:
```bash
# Create new user
adduser rootless
# Set password for user
passwd rootless
# Clear the subuids and subgids which have been assigned by default when creating the new user
usermod --del-subuids 0-4294967295 --del-subgids 0-4294967295 rootless
# Set a specific range, so it fits inside the number of available UIDs
usermod --add-subuids 65536-131071 --add-subgids 65536-131071 rootless
# Check the assigned range
cat /etc/subuid
# Check the available range
cat /proc/self/uid_map
exit
```
From the TrueNAS host, open a shell as the rootless user inside the jail.
```bash
jlmkr shell --uid 1000 mypodmanjail
```
Run rootless podman as user 1000.
```bash
id
podman run hello-world
podman info
```
The output of podman info should contain:
```
graphDriverName: overlay
graphOptions: {}
graphRoot: /home/rootless/.local/share/containers/storage
[...]
graphStatus:
Backing Filesystem: zfs
Native Overlay Diff: "true"
Supports d_type: "true"
Supports shifting: "false"
Supports volatile: "true"
Using metacopy: "false"
```
### Binding to Privileged Ports:
Add `sysctl net.ipv4.ip_unprivileged_port_start=23` to the `pre_start_hook` inside the config to lower the range of privileged ports. Because the range then starts at 23, port 22 stays privileged, so an unprivileged process still can't impersonate the sshd daemon. Since this lowers the range globally on the TrueNAS host, a better solution would be to specifically add the capability to bind to privileged ports.
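For illustration, the hook from the [config](./config) template would then end like this (a sketch; continuation lines of a multi-line value must stay indented):
```
pre_start_hook=#!/usr/bin/bash
    echo 'PRE_START_HOOK'
    echo 1 > /proc/sys/net/ipv4/ip_forward
    modprobe br_netfilter
    echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
    echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
    sysctl net.ipv4.ip_unprivileged_port_start=23
```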
## Cockpit Management
Install and enable cockpit:
```bash
jlmkr exec mypodmanjail bash -c "dnf -y install cockpit cockpit-podman && \
systemctl enable --now cockpit.socket && \
ip a &&
ip route | awk '/default/ { print \$9 }'"
```
Check the IP address of the jail and access the Cockpit web interface at https://0.0.0.0:9090, replacing 0.0.0.0 with the IP address you just found using `ip a`.
If you've set up the `rootless` user, you may log in with the password you created earlier. Otherwise you'd have to add an admin user first:
```bash
jlmkr exec mypodmanjail bash -c 'adduser admin
passwd admin
usermod -aG wheel admin'
```
Click on `Podman containers`. In case it shows `Podman service is not active` then click `Start podman`. You can now manage your (rootless) podman containers in the (rootless) jailmaker jail using the Cockpit web GUI.
## Additional Resources:
Resources mentioning `add_key keyctl bpf`
- https://bbs.archlinux.org/viewtopic.php?id=252840
- https://wiki.archlinux.org/title/systemd-nspawn
- https://discourse.nixos.org/t/podman-docker-in-nixos-container-ideally-in-unprivileged-one/22909/12
Resources mentioning `@keyring`
- https://github.com/systemd/systemd/issues/17606
- https://github.com/systemd/systemd/blob/1c62c4fe0b54fb419b875cb2bae82a261518a745/src/shared/seccomp-util.c#L604
`@keyring` also includes `request_key` but doesn't include `bpf`

templates/podman/config

@ -0,0 +1,53 @@
startup=0
gpu_passthrough_intel=0
gpu_passthrough_nvidia=0
# Use macvlan networking to provide an isolated network namespace,
# so podman can manage firewall rules
# Alternatively use --network-bridge=br1 instead of --network-macvlan
# Make sure to change eno1/br1 to the interface name you want to use
# You may want to add additional options here, e.g. bind mounts
systemd_nspawn_user_args=--network-macvlan=eno1
--resolv-conf=bind-host
--system-call-filter='add_key keyctl bpf'
# Script to run on the HOST before starting the jail
# Load kernel module and config kernel settings required for podman
pre_start_hook=#!/usr/bin/bash
echo 'PRE_START_HOOK'
echo 1 > /proc/sys/net/ipv4/ip_forward
modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
# Install podman inside the jail
# NOTE: this script will run in the host networking namespace and ignores
# all systemd_nspawn_user_args such as bind mounts
initial_setup=#!/usr/bin/bash
set -euo pipefail
dnf -y install podman
# Add the required capabilities to the `newuidmap` and `newgidmap` binaries
# https://github.com/containers/podman/issues/2788#issuecomment-1016301663
# https://github.com/containers/podman/issues/12637#issuecomment-996524341
setcap cap_setuid+eip /usr/bin/newuidmap
setcap cap_setgid+eip /usr/bin/newgidmap
# You generally will not need to change the options below
systemd_run_default_args=--property=KillMode=mixed
--property=Type=notify
--property=RestartForceExitStatus=133
--property=SuccessExitStatus=133
--property=Delegate=yes
--property=TasksMax=infinity
--collect
--setenv=SYSTEMD_NSPAWN_LOCK=0
systemd_nspawn_default_args=--keep-unit
--quiet
--boot
--bind-ro=/sys/module
--inaccessible=/sys/module/apparmor
# Used by jlmkr create
initial_rootfs_image=fedora 39