From 9bd5be8f737674bb3f5d1e5e20a2c246a15c823d Mon Sep 17 00:00:00 2001 From: "Trux-Bot[bot]" Date: Sat, 20 Apr 2024 11:24:46 +0000 Subject: [PATCH] deploy: f4ad1d51f4fa7d53fe1c27eb294d0b94c0a185b3 --- docs/maintenance/backups/index.html | 2 +- docs/network/dns_dhcp/index.html | 1 + docs/search/search_index.json | 2 +- docs/sitemap.xml | 55 +++++++++++++++------------- docs/sitemap.xml.gz | Bin 445 -> 455 bytes 5 files changed, 33 insertions(+), 27 deletions(-) create mode 100644 docs/network/dns_dhcp/index.html diff --git a/docs/maintenance/backups/index.html b/docs/maintenance/backups/index.html index 6d9e2032..a6b971b8 100644 --- a/docs/maintenance/backups/index.html +++ b/docs/maintenance/backups/index.html @@ -1,4 +1,4 @@ - Backups - Truxnell's NixOS homelab

Backups

Nightly backups are facilitated by NixOS's restic module and a helper module I've written.

This does a nightly ZFS snapshot, from which apps and other mutable data are backed up with restic to both a local folder on my NAS and to Cloudflare R2. Backing up from a ZFS snapshot ensures that the restic backup is consistent, as backing up files that are in use (especially a sqlite database) will cause corruption. Here, all restic jobs back up from that same 02.00 snapshot, regardless of when they run that night.

Another benefit of this approach is that it is service agnostic - containers, NixOS services, QEMU, whatever - they all have their files in the same place on the filesystem (in the persistent folder), so they can all be backed up in the same fashion.

The alternative is to shut down services during backup (which could be facilitated with the restic backup pre/post scripts), but ZFS snapshots are a godsend in this area, and I'm already running them for impermanence.

Backing up without snapshots/shutdowns?

This is a pattern I see a fair bit - if you are backing up files raw without stopping your service beforehand, you might want to check that your backups aren't corrupted.

The timeline then is:

time activity
02.00 ZFS deletes the prior snapshot and creates a new one at rpool/safe/persist@restic_nightly_snap
02.05 - 04.05 Restic backs up from the new snapshot's hidden read-only .zfs mount, with random per-service delays, to local and remote locations
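For illustration, a snapshot unit consistent with this timeline could look like the sketch below - a minimal example only, assuming the dataset and unit name shown above; my real module lives in the repo and may differ.

{ pkgs, ... }:
{
  systemd.services.restic_nightly_snapshot = {
    description = "Nightly ZFS snapshot for Restic";
    serviceConfig.Type = "oneshot";
    startAt = "02:00";
    # Rotate the snapshot: drop yesterday's and take a fresh, consistent one
    # for the restic jobs to read from via the hidden .zfs mount.
    script = ''
      ${pkgs.zfs}/bin/zfs destroy rpool/safe/persist@restic_nightly_snap || true
      ${pkgs.zfs}/bin/zfs snapshot rpool/safe/persist@restic_nightly_snap
    '';
  };
}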

Automatic Backups

I have added a sops secret for both my local and remote servers in my restic module /nixos/modules/nixos/services/restic/. This provides the restic password and 'AWS' credentials for the S3-compatible R2 bucket.

Backups are created per-service in each service's module. This is largely done with a lib helper I've written, which creates both the relevant local and remote restic backup entries in my nixosConfiguration. nixos/modules/nixos/lib.nix
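As a rough sketch of what the helper expands to (the secret names, paths and R2 endpoint below are illustrative, not my exact values), each service ends up with a local and a remote entry under services.restic.backups:

services.restic.backups = {
  prowlarr-local = {
    passwordFile = config.sops.secrets."services/restic/password".path;  # from the sops secret above
    paths = [ "/persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr" ];
    repository = "/mnt/nas/backups/restic/prowlarr";  # illustrative local repo
    timerConfig = {
      OnCalendar = "02:05";
      RandomizedDelaySec = "2h";  # the random per-service delay in the timeline
    };
  };
  prowlarr-remote = {
    passwordFile = config.sops.secrets."services/restic/password".path;
    environmentFile = config.sops.secrets."services/restic/env".path;  # 'AWS' credentials for R2
    paths = [ "/persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr" ];
    repository = "s3:https://<account-id>.r2.cloudflarestorage.com/restic/prowlarr";  # illustrative
    timerConfig = {
      OnCalendar = "02:05";
      RandomizedDelaySec = "2h";
    };
  };
};

This is what produces the restic-backups-prowlarr-local and restic-backups-prowlarr-remote units shown further down.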

Why not backup the entire persist in one hit?

Possibly a holdover from my k8s days, but it's incredibly useful to be able to restore per-service, especially if you just want to move an app around or restore a single app. You can always restore multiple repos with a script/taskfile.

NixOS will create a service + timer for each job - below shows the output for a prowlarr local/remote backup.

# Confirming snapshot taken overnight - we can see 2AM
+ Backups - Truxnell's NixOS homelab      

Backups

Nightly backups are facilitated by NixOS's restic module and a helper module I've written.

This does a nightly ZFS snapshot, from which apps and other mutable data are backed up with restic to both a local folder on my NAS and to Cloudflare R2. Backing up from a ZFS snapshot ensures that the restic backup is consistent, as backing up files that are in use (especially a sqlite database) will cause corruption. Here, all restic jobs back up from that same 02.00 snapshot, regardless of when they run that night.

Another benefit of this approach is that it is service agnostic - containers, NixOS services, QEMU, whatever - they all have their files in the same place on the filesystem (in the persistent folder), so they can all be backed up in the same fashion.

The alternative is to shut down services during backup (which could be facilitated with the restic backup pre/post scripts), but ZFS snapshots are a godsend in this area, and I'm already running them for impermanence.

Backing up without snapshots/shutdowns?

This is a pattern I see a fair bit - if you are backing up files raw without stopping your service beforehand, you might want to check that your backups aren't corrupted.

The timeline then is:

time activity
02.00 ZFS deletes the prior snapshot and creates a new one at rpool/safe/persist@restic_nightly_snap
02.05 - 04.05 Restic backs up from the new snapshot's hidden read-only .zfs mount, with random per-service delays, to local and remote locations

Automatic Backups

I have added a sops secret for both my local and remote servers in my restic module /nixos/modules/nixos/services/restic/. This provides the restic password and 'AWS' credentials for the S3-compatible R2 bucket.

Backups are created per-service in each service's module. This is largely done with a lib helper I've written, which creates both the relevant local and remote restic backup entries in my nixosConfiguration. nixos/modules/nixos/lib.nix

Why not backup the entire persist in one hit?

Possibly a holdover from my k8s days, but it's incredibly useful to be able to restore per-service, especially if you just want to move an app around or restore a single app. You can always restore multiple repos with a script/taskfile.

NixOS will create a service + timer for each job - below shows the output for a prowlarr local/remote backup.

# Confirming snapshot taken overnight - we can see 2AM
 truxnell@daedalus ~> systemctl status restic_nightly_snapshot.service
  restic_nightly_snapshot.service - Nightly ZFS snapshot for Restic
      Loaded: loaded (/etc/systemd/system/restic_nightly_snapshot.service; linked; preset: enabled)
diff --git a/docs/network/dns_dhcp/index.html b/docs/network/dns_dhcp/index.html
new file mode 100644
index 00000000..884ee08b
--- /dev/null
+++ b/docs/network/dns_dhcp/index.html
@@ -0,0 +1 @@
+ DNS & DHCP - Truxnell's NixOS homelab      

DNS & DHCP

TLDR

External DNS: Client -> Adguard Home (r->

My DNS has evolved and changed over time, especially with a personal desire to keep my entire internet backbone boring and standard, off a trusted vendor. 'Why can't I connect to my Minecraft server' and 'Are you playing with the internet again' are questions I don't want to have to answer in this house.

Sadly, while I do love my Unifi Dream Machine Pro, its DNS offering is lackluster, and I really prefer split DNS so I don't have to access everything with ip:port.

General

My devices all use the Unifi DHCP server to get addresses, which I much prefer so I maintain all my clients in the single-pane-of-glass the UDMP provides. In the DHCP options, I add the


\ No newline at end of file diff --git a/docs/search/search_index.json b/docs/search/search_index.json index 7cbbf608..c5b906a2 100644 --- a/docs/search/search_index.json +++ b/docs/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\u200b\\-_,:!=\\[\\]()\"`/]+|\\.(?!\\d)|&[lg]t;|(?!\\b)(?=[A-Z][a-z])","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"readme.md","text":"

\ud83d\udc4b Welcome to my NixOS home and homelab configuration. This monorepo is my personal nix/nixos setup for all my devices, specifically my homelab.

This is the end result of a recovering k8s addict - who no longer enjoyed the time and effort I personally found it took to run k8s at home.

"},{"location":"#why","title":"Why?","text":"

Having needed a break from hobbies for some health-related reasons, I found coming back to an unpatched, unattended cluster a chore. Then a cheap SSD in my custom VyOS router blew, leading me to just put my Unifi Dream Machine router back in, which broke the custom DNS I was running for my cluster, which in turn caused the cluster issues.

While fixing the DNS issue, a basic software upgrade of the custom OS I was running k8s on broke my cluster for the 6th time running, coupled with my using an older version of the script tool I used to manage its machine config yaml - which ended up leading to my 6th k8s disaster recovery.

Looking at my boring Ubuntu ZFS NAS, which just ran and ran and ran without needing TLC, and remembering the old days of Ubuntu + Docker Compose being hands-off, I dove into nix, with the idea of getting back to basics with boring, proven tools, plus the power of nix's declarative system.

"},{"location":"#goals","title":"Goals","text":"

One of my goals is to bring what I learnt running k8s at home with some of the best homelabbers into the nix world, and see just how many of those practices I can apply to a nix setup, while focusing on having a solid, reliable setup that I can leave largely unattended for months without issues cropping up.

The goal of this doc is for me to slow down a bit and jot down how and why I am doing what I'm doing in a module, and cover how I have approached the facets of homelabbing, so YOU can understand, steal with pride from my code, and hopefully(?) learn a thing or two.

To teach me a thing or two, contact me or raise an Issue. PRs may or may not be taken as a personal attack - this is my home setup after all.


"},{"location":"motd/","title":"Message of the day","text":"

Why not include a nice message of the day for each server I log into?

The below gives some insight into what the server's running, the status of zpools, usage, etc. While not shown below - thankfully - if a zpool error is found, the status gives a full zpool status -x debrief, which is particularly eye-catching upon login.

I've also squeezed in a 'reboot required' flag for when the server has detected that its running kernel/init/systemd is a different version to what it booted with - useful for knowing when long-running servers require a reboot to pick up new kernel/etc versions.
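For the curious, one way to implement that flag (a sketch, not necessarily how my module does it) is to compare the kernel/initrd the machine booted with against the currently activated system:

let
  rebootRequired = pkgs.writeShellScriptBin "reboot-required" ''
    # /run/booted-system is the generation the machine booted with,
    # /run/current-system is whatever was activated most recently.
    booted=$(readlink /run/booted-system/{initrd,kernel,kernel-modules})
    current=$(readlink /run/current-system/{initrd,kernel,kernel-modules})
    if [ "$booted" != "$current" ]; then
      echo "Reboot required to pick up new kernel/initrd"
    fi
  '';
in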

Message of the day

Code TLDR

/nixos/modules/nixos/system/motd

Write a shell script using nix with a bash motd of your choosing.

let\n  motd = pkgs.writeShellScriptBin \"motd\"\n    ''\n      #! /usr/bin/env bash\n      source /etc/os-release\n      service_status=$(systemctl list-units | grep podman-)\n\n      <- SNIP ->\n      printf \"$BOLDService status$ENDCOLOR\\n\"\n    '';\nin\n

This gets us a shell script we can then drop directly into systemPackages - and after that it's just a short hop to make it part of the shell init.

Note

Replace with your preferred shell!

environment.systemPackages = [\n    motd\n];\nprograms.fish.interactiveShellInit =  ''\n    motd\n'';\n


"},{"location":"tips/","title":"Tips","text":"
  • Don't make conditional imports (nix needs to resolve imports upfront)
  • you can pass config between NixOS and home-manager with config.home-manager.users.<user> and osConfig
  • when adding home-manager to an existing setup, the home-manager service may fail due to trying to overwrite existing files in ~. Deleting these should allow the service to start
  • yaml = json, so using nix + builtins.toJSON a lot (and repl to vscode for testing)
  • checking values:

    "},{"location":"tips/#httpsgithubcomnixosnixpkgsblob90055d5e616bd943795d38808c94dbf0dd35abe8nixosmodulesconfigusers-groupsnixl116","title":"https://github.com/NixOS/nixpkgs/blob/90055d5e616bd943795d38808c94dbf0dd35abe8/nixos/modules/config/users-groups.nix#L116","text":"


    "},{"location":"includes/abbreviations/","title":"Abbreviations","text":"

    [CI]: Continuous Integration [PR]: Pull Request [HASS]: Home-assistant [k8s]: Kubernetes [YAML]: Yet Another Markup Language [JSON]: JavaScript Object Notation [ZFS]: Originally 'Zettabyte File System', a COW filesystem. [COW]: Copy on Write


    "},{"location":"maintenance/backups/","title":"Backups","text":"

Nightly backups are facilitated by NixOS's restic module and a helper module I've written.

This does a nightly ZFS snapshot, from which apps and other mutable data are backed up with restic to both a local folder on my NAS and to Cloudflare R2. Backing up from a ZFS snapshot ensures that the restic backup is consistent, as backing up files that are in use (especially a sqlite database) will cause corruption. Here, all restic jobs back up from that same 02.00 snapshot, regardless of when they run that night.

Another benefit of this approach is that it is service agnostic - containers, NixOS services, QEMU, whatever - they all have their files in the same place on the filesystem (in the persistent folder), so they can all be backed up in the same fashion.

The alternative is to shut down services during backup (which could be facilitated with the restic backup pre/post scripts), but ZFS snapshots are a godsend in this area, and I'm already running them for impermanence.

    Backing up without snapshots/shutdowns?

This is a pattern I see a fair bit - if you are backing up files raw without stopping your service beforehand, you might want to check that your backups aren't corrupted.

    The timeline then is:

time activity 02.00 ZFS deletes the prior snapshot and creates a new one at rpool/safe/persist@restic_nightly_snap 02.05 - 04.05 Restic backs up from the new snapshot's hidden read-only .zfs mount, with random per-service delays, to local and remote locations"},{"location":"maintenance/backups/#automatic-backups","title":"Automatic Backups","text":"

    I have added a sops secret for both my local and remote servers in my restic module /nixos/modules/nixos/services/restic/. This provides the restic password and 'AWS' credentials for the S3-compatible R2 bucket.

Backups are created per-service in each service's module. This is largely done with a lib helper I've written, which creates both the relevant local and remote restic backup entries in my nixosConfiguration. nixos/modules/nixos/lib.nix

    Why not backup the entire persist in one hit?

Possibly a holdover from my k8s days, but it's incredibly useful to be able to restore per-service, especially if you just want to move an app around or restore a single app. You can always restore multiple repos with a script/taskfile.

    NixOS will create a service + timer for each job - below shows the output for a prowlarr local/remote backup.

    # Confirming snapshot taken overnight - we can see 2AM\ntruxnell@daedalus ~> systemctl status restic_nightly_snapshot.service\n\u25cb restic_nightly_snapshot.service - Nightly ZFS snapshot for Restic\n     Loaded: loaded (/etc/systemd/system/restic_nightly_snapshot.service; linked; preset: enabled)\n     Active: inactive (dead) since Wed 2024-04-17 02:00:02 AEST; 5h 34min ago\n   Duration: 61ms\nTriggeredBy: \u25cf restic_nightly_snapshot.timer\n    Process: 606080 ExecStart=/nix/store/vd0pr3la91pi0qhmcn7c80rwrn7jkpx9-unit-script-restic_nightly_snapshot-start/bin/restic_nightly_snapshot-start (code=exited, status=0/SUCCESS)\n   Main PID: 606080 (code=exited, status=0/SUCCESS)\n         IP: 0B in, 0B out\n        CPU: 21ms\n# confirming local snapshot occured - we can see 05:05AM\ntruxnell@daedalus ~ [1]> sudo restic-prowlarr-local snapshots\nrepository 9d9bf357 opened (version 2, compression level auto)\nID        Time                 Host        Tags        Paths\n---------------------------------------------------------------------------------------------------------------------\n293dad23  2024-04-15 19:24:37  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n24938fe8  2024-04-16 12:42:50  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n442d4de3  2024-04-17 05:05:04  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n---------------------------------------------------------------------------------------------------------------------\n3 snapshots\n\n# confirming remote snapshot occured - we can see 4:52AM\ntruxnell@daedalus ~> sudo restic-prowlarr-remote snapshots\nrepository 30b7eef0 opened (version 2, compression level auto)\nID        Time                 Host        Tags        Paths\n---------------------------------------------------------------------------------------------------------------------\ne7d933c4  2024-04-15 22:07:09  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\naa605c6b  2024-04-16 02:39:47  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n68f91a20  2024-04-17 04:52:59  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n---------------------------------------------------------------------------------------------------------------------\n3 snapshots\n

NixOS (as of 23.05, IIRC) now provides shims that give easy access to the restic commands with the same environment variables mounted as the service uses.

    truxnell@daedalus ~ [1]> sudo restic-prowlarr-local snapshots\nrepository 9d9bf357 opened (version 2, compression level auto)\nID        Time                 Host        Tags        Paths\n---------------------------------------------------------------------------------------------------------------------\n293dad23  2024-04-15 19:24:37  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n24938fe8  2024-04-16 12:42:50  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n---------------------------------------------------------------------------------------------------------------------\n2 snapshots\n
    "},{"location":"maintenance/backups/#manually-backing-up","title":"Manually backing up","text":"

They are a systemd timer/service pair, so you can query or trigger a manual run with systemctl start restic-backups-<service>-<destination>. Local and remote work and function exactly the same; querying remote is just a fraction slower to return information.

    truxnell@daedalus ~ > sudo systemctl start restic-backups-prowlarr-local.service\n< no output >\ntruxnell@daedalus ~ [1]> sudo restic-prowlarr-local snapshots\nrepository 9d9bf357 opened (version 2, compression level auto)\nID        Time                 Host        Tags        Paths\n---------------------------------------------------------------------------------------------------------------------\n293dad23  2024-04-15 19:24:37  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n24938fe8  2024-04-16 12:42:50  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n---------------------------------------------------------------------------------------------------------------------\n2 snapshots\ntruxnell@daedalus ~> date\nTue Apr 16 12:43:20 AEST 2024\ntruxnell@daedalus ~>\n
    "},{"location":"maintenance/backups/#restoring-a-backup","title":"Restoring a backup","text":"

Testing a restore (you would use --target / for a real restore). You would just have to pause the service, run the restore, then re-start the service.

    truxnell@daedalus ~ [1]> sudo restic-lidarr-local restore --target /tmp/lidarr/ latest\nrepository a2847581 opened (version 2, compression level auto)\n[0:00] 100.00%  2 / 2 index files loaded\nrestoring <Snapshot b96f4b94 of [/persist/nixos/lidarr] at 2024-04-14 04:19:41.533770692 +1000 AEST by root@daedalus> to /tmp/lidarr/\nSummary: Restored 52581 files/dirs (11.025 GiB) in 1:37\n
    "},{"location":"maintenance/backups/#failed-backup-notifications","title":"Failed backup notifications","text":"

Failed backup notifications are baked in thanks to the global Pushover notification on SystemD unit failure. No config necessary.

Here I tested it by giving the systemd unit file an incorrect path.

    A deliberately failed backup to test notifications, hopefully I don't see a real one."},{"location":"maintenance/backups/#disabled-backup-warnings","title":"Disabled backup warnings","text":"

Using module warnings, I have also put warnings into my NixOS modules for when I have disabled backups on a host that isn't a development machine, just in case I do this or mix up flags on hosts. Roll your eyes; I will probably do it. This will pop up when I do a dry run/deployment - but it will not abort the build.

It is eye-catching, thankfully.


    "},{"location":"maintenance/software_updates/","title":"Software updates","text":"

It's crucial to update software regularly - but a homelab isn't a Google Play Store you can forget about and let do its thing. How do you update your software stack regularly without breaking things?

    "},{"location":"maintenance/software_updates/#continuous-integration","title":"Continuous integration","text":"

Continuous integration (CI) runs using GitHub Actions and Garnix. I have enabled branch protection rules to ensure all my devices build successfully before a PR is allowed to be merged into main. This gives me a level of testing/confidence that updating a device from the main branch will not break anything.

    Lovely sea of green passed checks"},{"location":"maintenance/software_updates/#binary-caching","title":"Binary Caching","text":"

Binary caching is done for me by Garnix, which is an amazing tool; I can then add it as a substituter. Builds run on each push to any branch and the results are cached for me. Even better, I can hook into them as above for CI purposes. No code to show here - you add it as an app to your GitHub repo and it 'Just Works'.

    # Substitutions\nsubstituters = [ \"https://cache.garnix.io\" ];\n\ntrusted-public-keys = [\n  \"nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs=\"\n];\n
    Lovely sea of green passed checks"},{"location":"maintenance/software_updates/#flake-updates","title":"Flake updates","text":"

    Github repo updates are provided by Renovate by Mend. These are auto-merged on a weekly schedule after passing CI. The settings can be found at /main/.github/renovate.json5

The primary CI is a Garnix build, which is already building and caching all my systems. Knowing all of the systems have built and cached goes a long way toward ensuring main is a stable branch.

    "},{"location":"maintenance/software_updates/#docker-container-updates","title":"Docker container updates","text":"

    Container updates are provided by Renovate by Mend. These will either be manually merged after I have checked the upstream projects notes for breaking changes or auto-merged based on settings I have in /.github/renovate/autoMerge.json5.

    Semantic Versioning summary

Semantic Versioning is a format of MAJOR.MINOR.PATCH: MAJOR version when you make incompatible API changes (e.g. 1.7.8 -> 2.0.0), MINOR version when you add functionality in a backward-compatible manner (e.g. 1.7.8 -> 1.8.0), PATCH version when you make backward-compatible bug fixes (e.g. 1.7.8 -> 1.7.9)

The auto-merge file allows me to define a pattern for which packages I want to auto-merge, based on the upgrade type Renovate is suggesting. As many packages adhere to Semantic Versioning, I can decide how I 'feel' about a project and choose to auto-merge specific tags. So, for example, Sonarr has been reliable for me, so I am OK merging all digest, patch and minor updates. I will always review a major update, as it is likely to contain a breaking change.

    Respect pre-1.0.0 software!

Semantic Versioning also specifies that software before 1.0.0 may introduce a breaking change AT ANY TIME. Auto-update pre-1.0 software at your own risk!

The rationale here is twofold. One is obvious - the entire point of doing Nix is reproducibility - what is the point of having flakes and SHA tags to provide the ability

Also, I don't want a trillion PRs sitting in my GitHub repo, but I also will not blindly update everything. There is a balance between updating for security/patching purposes and avoiding breaking changes. I know it's popular to use the :latest tag and an auto-update service like Watchtower - trust me, this is a bad idea.

    I only glanced away from my old homelab for a few months...

    Automatically updating all versions of containers will break things eventually!

This is simply because projects will, from time to time, release breaking changes - totally different database schemas, overhauled config, entire parts of their software stack replaced, etc. If you let your services update totally automatically without checking for these, you will wake up to a completely broken service, like I did many, many years ago when Seafile did a major upgrade.

    Container updates are provided by a custom regex that matches my format for defining images in my nix modules.

        \"regexManagers\": [\n    {\n      fileMatch: [\"^.*\\\\.nix$\"],\n      matchStrings: [\n        'image *= *\"(?<depName>.*?):(?<currentValue>.*?)(@(?<currentDigest>sha256:[a-f0-9]+))?\";',\n      ],\n      datasourceTemplate: \"docker\",\n    }\n  ],\n

And then I can pick and choose what level (if any) I want for container software. The below gives me brackets I can put containers in to enable auto-merge, depending on how much I trust the container maintainer.

      \"packageRules\": [\n    {\n      // auto update up to major\n      \"matchDatasources\": ['docker'],\n      \"automerge\": \"true\",\n      \"automergeType\": \"branch\",\n      \"matchUpdateTypes\": [ 'minor', 'patch', 'digest'],\n      \"matchPackageNames\": [\n        'ghcr.io/onedr0p/sonarr',\n        'ghcr.io/onedr0p/readarr',\n        'ghcr.io/onedr0p/radarr',\n        'ghcr.io/onedr0p/lidarr',\n        'ghcr.io/onedr0p/prowlarr'\n        'ghcr.io/twin/gatus',\n      ]\n    },\n    // auto update up to minor\n    {\n      \"matchDatasources\": ['docker'],\n      \"automerge\": \"true\",\n      \"automergeType\": \"branch\",\n      \"matchUpdateTypes\": [ 'patch', 'digest'],\n      \"matchPackageNames\": [\n        \"ghcr.io/gethomepage/homepage\",\n      ]\n\n    }\n  ]\n

Which results in automated PRs being raised - and possibly automatically merged into main if CI passes.

Thank you, RenovateBot!


    "},{"location":"monitoring/systemd/","title":"SystemD pushover notifications","text":"

Keeping with the goal of simplicity, I put together a curl script that sends me a Pushover alert. I originally tied this to individual backups, until I realised how powerful it would be to just tie it to every SystemD service globally.

This way, I never need to worry about which services are being created or destroyed, or repeat myself ad nauseam.

    Why not Prometheus?

I ran Prometheus/AlertManager for many years, and it can be easy to get TOO many notifications depending on your alerts, to have issues with the big, complex beast that it is, or to have alerts that trigger/reset/trigger (e.g. HDD temps). This gives me native, simple notifications I can rely on using basic tools - one of my design principles.

    Immediately I picked up with little effort:

    • Pod crashloop failed after too many quick restarts
    • Native service failure
    • Backup failures
    • AutoUpdate failure
    • etc
    NixOS SystemD built-in notifications for all occasions"},{"location":"monitoring/systemd/#adding-to-all-services","title":"Adding to all services","text":"

    This is accomplished in /nixos/modules/nixos/system/pushover, with a systemd service notify-pushover@.

This can then be called by other services, which I set up by adding into my options:

  options.systemd.services = mkOption {\n    type = with types; attrsOf (\n      submodule {\n        config.onFailure = [ \"notify-pushover@%n.service\" ];\n      }\n    );\n  };\n

This adds \"notify-pushover@%n.service\" as an onFailure unit to every systemd service NixOS generates; the systemd specifiers are injected with scriptArgs, and the simple bash script can refer to them as $1 etc.

    systemd.services.\"notify-pushover@\" = {\n      enable = true;\n      onFailure = lib.mkForce [ ]; # cant refer to itself on failure (1)\n      description = \"Notify on failed unit %i\";\n      serviceConfig = {\n        Type = \"oneshot\";\n        # User = config.users.users.truxnell.name;\n        EnvironmentFile = config.sops.secrets.\"services/pushover/env\".path; # (2)\n      };\n\n      # Script calls pushover with some deets.\n      # Here im using the systemd specifier %i passed into the script,\n      # which I can reference with bash $1.\n      scriptArgs = \"%i %H\"; # (3)\n      # (4)\n      script = ''\n        ${pkgs.curl}/bin/curl --fail -s -o /dev/null \\\n          --form-string \"token=$PUSHOVER_API_KEY\" \\\n          --form-string \"user=$PUSHOVER_USER_KEY\" \\\n          --form-string \"priority=1\" \\\n          --form-string \"html=1\" \\\n          --form-string \"timestamp=$(date +%s)\" \\\n          --form-string \"url=https://$2:9090/system/services#/$1\" \\\n          --form-string \"url_title=View in Cockpit\" \\\n          --form-string \"title=Unit failure: '$1' on $2\" \\\n          --form-string \"message=<b>$1</b> has failed on <b>$2</b><br><u>Journal tail:</u><br><br><i>$(journalctl -u $1 -n 10 -o cat)</i>\" \\\n          https://api.pushover.net/1/messages.json 2&>1\n      '';\n
    1. Force exclude this service from having the default 'onFailure' added
    2. Bring in pushover API/User ENV vars for script
    3. Pass SystemD specifiers into script
    4. Er.. script. Nix pops it into a shell script and refers to it in the unit.

    Bug

I put in a nice link direct to Cockpit for the specific machine/service in question that doesn't quite work yet... ( #96)

    "},{"location":"monitoring/systemd/#excluding-from-a-services","title":"Excluding from a services","text":"

Now, we may not want this on ALL services - especially the pushover-notify service itself. We can exclude it from a service using nixpkgs.lib.mkForce

# Overwrite the default onFailure for a specific service\nsystemd.services.\"service\".onFailure = lib.mkForce [ ];\n


    "},{"location":"monitoring/warnings/","title":"Nix Warnings","text":"

I've added warnings and assertions to my nix code to help me avoid misconfigurations. For example, if a module needs a database enabled, it can abort a deployment when the database is not enabled. Similarly, I have added warnings for when I have disabled backups on production machines.

But why, when it's not being shared with others?

Because I guarantee I'll somehow stuff it up down the track and accidentally disable things I didn't mean to. Roll your eyes; I'll thank myself later.

    Learnt from: Nix Manual

    "},{"location":"monitoring/warnings/#warnings","title":"Warnings","text":"

Warnings will print a warning message during a nix build or deployment, but will NOT stop the action. Great for things like reminders about disabled features.

    To add a warning inside a module:

    # Warn if backups are disabled and machine isn't a dev box\n    config.warnings = [\n      (mkIf (!cfg.local.enable && config.mySystem.purpose != \"Development\")\n        \"WARNING: Local backups are disabled!\")\n      (mkIf (!cfg.remote.enable && config.mySystem.purpose != \"Development\")\n        \"WARNING: Remote backups are disabled!\")\n    ];\n
    Oh THATS what I forgot to re-enable..."},{"location":"monitoring/warnings/#abortassert","title":"Abort/assert","text":"

Warnings' bigger and meaner brother. An assert stops a nix build/deploy dead in its tracks. Only useful for when a deployment is incompatible with running - i.e. a dependency not met in options.
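A minimal sketch of the shape (the option names here are illustrative, not my actual module):

config.assertions = [
  {
    # Refuse to build/deploy if the service is enabled without its database.
    assertion = cfg.enable -> config.services.postgresql.enable;
    message = "myservice requires services.postgresql.enable = true";
  }
];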


    "},{"location":"monitoring/zed/","title":"Zed","text":"

    Zed monitoring can also send to pushover!

Come on, these drives are hardly 12 months old.


    "},{"location":"network/dns/","title":"Dns","text":"

2 x AdGuard Home -> PowerDNS (authoritative) -> (Quad9 || Mullvad). Note: reverse DNS (in-addr.arpa) and split-brain setup. DNSSEC.
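A rough sketch of the first hop using the stock NixOS AdGuard Home module - the domain and addresses below are placeholders, and the real config also carries the reverse-DNS (in-addr.arpa) and DNSSEC settings:

services.adguardhome = {
  enable = true;
  settings.dns = {
    upstream_dns = [
      "[/home.example.com/]10.0.0.10"    # internal zone -> authoritative PowerDNS
      "https://dns.quad9.net/dns-query"  # everything else -> upstream resolver
    ];
  };
};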


    "},{"location":"overview/design/","title":"Design principles","text":"

    Taking some lead from the Zen of Python:

    • Minimise dependencies, where required, explicitly define dependencies
    • Use plain Nix & bash to solve problems over additional tooling
    • Stable channel for stable machines. Unstable only where features are important.
    • Modules for a specific service - Profiles for broad configuration of state.
    • Write readable code - descriptive variable names and modules
    • Keep functions/dependencies within the relevant module where possible
    • Errors should never pass silently - use assert etc for misconfigurations
    • Flat is better than nested - use built-in functions like map, filter, and fold to operate on lists or sets


    "},{"location":"overview/features/","title":"Features","text":"

    Some things I'm proud of. Or just happy they exist so I can forget about something until I need to worry.

    • Nightly Backups: A ZFS snapshot is taken at night, with restic then backing up both locally and to the cloud. NixOS wrappers make restoring a single command-line entry. Taking the ZFS snapshot before backing up is important to ensure restic isn't backing up files that are in use, which would cause corruption.
    • Software Updates: Renovate Bot regularly runs on this GitHub repo, updating the flake lockfile, containers and other dependencies automatically. Automerge is enabled for updates I expect to be routine, but manual PR approval is required for updates I suspect may need a read of the changelog for breaking changes.
    • Impermanence: Inspired by the Erase Your Darlings post, servers run ZFS and roll back to a blank snapshot at night. This ensures repeatable NixOS deployments and no cruft, and also hardens the servers a little.
    • SystemD Notifications: A systemd hook that adds a Pushover notification to any unit failure, for every unit NixOS is aware of. No worrying about forgetting to add a notification to each new service, or about missing one.


    "},{"location":"overview/goals/","title":"Goals","text":"

When I set about making this lab I had a number of goals - I wonder how well I will do?

    A master list of ideas/goals/etc can be found at Issue #1

    • Stability: NixOS stable channel for core services, unstable for desktop apps/non-mission-critical where desired. Containers with SHA256 pinning for server apps.
    • KISS: Keep It Simple - use boring, reliable, trusted tools, not today's flashy new software repo.
    • Easy Updates: Weekly update schedule, utilizing Renovate for updating the lockfile and container images. Auto-updates enabled off the main branch for mission-critical hosts. Aim for 'magic rollback' on upgrade failure.
    • Backups: Nightly restic backups to both cloud and NAS. All databases to have nightly backups. Test backups regularly.
    • Reproducibility: Flakes & Git for version pinning, SHA256 tags for containers.
    • Monitoring: Automated monitoring of failures & critical summaries, using basic tools. Use Gatus for both internal and external monitoring.
    • Continuous Integration: CI against the main branch to ensure all code compiles OK. Use PRs to add to main and don't skip CI due to impatience.
    • Security: Don't use containers with the S6 overlay/root (i.e. LSIO). Expose minimal ports at the router, reduce the attack surface by keeping it simple, and review hardening of containers/podman/NixOS.
    • Ease of administration: Lean into the devil that is SystemD - and have one standard interface to see logs, manipulate services, etc. Run containers as podman services, with web UIs for watching/debugging.
    • Secrets, ssshh...: sops-nix for secrets, living in my git repo. Avoid cloud services like I used in k8s (i.e. Doppler.io).


    "},{"location":"overview/k8s/","title":"K8s","text":"

    Removed complexity

    • external secrets -> bog standard sops
    • HA file storage -> standard file system
    • HA database cluster -> nixos standard cluster
    • Database user operator -> nixos standard ensure_users
    • Database permissions operator -> why even??
    • secrets reloader -> sops restartUnits (see the sketch after this list)
    • easier management - all services run through systemd for consistency, and Cockpit makes viewing logs/pod consoles etc. easy.
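The restartUnits hook mentioned above is a one-liner per secret with sops-nix - a sketch with a hypothetical secret and unit name:

sops.secrets."myapp/env" = {
  sopsFile = ./secrets.sops.yaml;             # hypothetical path
  restartUnits = [ "podman-myapp.service" ];  # bounce the consumer when the secret changes
};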


    "},{"location":"overview/options/","title":"Options","text":"

    Explain mySystem and myHome
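As an illustrative sketch of the general pattern these namespaces follow - not my actual option tree - a custom option set lets each host toggle features declaratively:

{ lib, config, ... }:
{
  options.mySystem.services.sonarr.enable = lib.mkEnableOption "sonarr";

  config = lib.mkIf config.mySystem.services.sonarr.enable {
    # the actual container/service wiring lives here in the real module
  };
}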


    "},{"location":"overview/structure/","title":"Repository Structure","text":"

    Note

    Oh god writing this now is a horrid idea, I always refactor like 50 times...

Here is a bit of a walkthrough of the repository structure, so you can have a vague idea of what is going on. Organizing a monorepo is hard at the best of times.

    \u251c\u2500\u2500 .github\n\u2502   \u251c\u2500\u2500 renovate            Renovate modules\n\u2502   \u251c\u2500\u2500 workflows             Github Action workflows (i.e. CI/Site building)\n\u2502   \u2514\u2500\u2500 renovate.json5        Renovate core settings\n\u251c\u2500\u2500 .taskfiles              go-task file modules\n\u251c\u2500\u2500 docs                    This mkdocs-material site\n\u2502   nixos                   Nixos Modules\n\u2502   \u2514\u2500\u2500 home                  home-manager nix files\n\u2502       \u251c\u2500\u2500 modules             home-manager modules\n\u2502       \u2514\u2500\u2500 truxnell            home-manager user\n\u2502   \u251c\u2500\u2500 hosts                 hosts for nix - starting point of configs.\n\u2502   \u251c\u2500\u2500 modules               nix modules\n\u2502   \u251c\u2500\u2500 overlays              nixpkgs overlays\n\u2502   \u251c\u2500\u2500 pkgs                  custom nix packages\n\u2502   \u2514\u2500\u2500 profiles              host profiles\n\u251c\u2500\u2500 README.md               Github Repo landing page\n\u251c\u2500\u2500 flake.nix               Core flake\n\u251c\u2500\u2500 flake.lock              Lockfile\n\u251c\u2500\u2500 LICENSE                 Project License\n\u251c\u2500\u2500 mkdocs.yml              mkdocs settings\n\u2514\u2500\u2500 Taskfile.yaml           go-task core file\n

Whew, that wasn't so hard, right... right?


    "},{"location":"security/containers/","title":"Containers","text":""},{"location":"security/containers/#container-images","title":"Container images","text":"

Don't use LSIO!


    "},{"location":"vm/faq/","title":"Faq","text":""},{"location":"vm/faq/#why-not-recurse-the-module-folder","title":"Why not recurse the module folder","text":"

Imports are special in Nix, and it's important that they are statically defined so they can be resolved up front for lazy evaluation - if you make imports optional/computed, not everything is available when evaluating.
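A short illustration of the difference (paths are illustrative):

{
  # Fine: a static import list that Nix can resolve up front.
  imports = [
    ./services/sonarr.nix
    ./services/radarr.nix
  ];

  # Not fine: making the import list depend on config means evaluation
  # has to finish before the module set is known - infinite recursion.
  # imports = lib.optional config.mySystem.sonarr.enable ./services/sonarr.nix;
}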


    "},{"location":"vm/impermance/","title":"Impermance","text":"
    • need to save ssh host keys across reboots (see the sketch below)
    • else you end up with sops issues & the ssh host key changing every reboot, breaking clients' known_hosts
    • need to sort out passwords
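A minimal sketch of one way to keep the host key stable, assuming the nix-community impermanence module and the /persist dataset used elsewhere in this setup:

environment.persistence."/persist" = {
  files = [
    # A stable host key keeps sops-nix able to decrypt secrets and stops
    # clients' known_hosts entries churning after every rollback.
    "/etc/ssh/ssh_host_ed25519_key"
    "/etc/ssh/ssh_host_ed25519_key.pub"
  ];
};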


    "},{"location":"vm/installing-x86_64/","title":"Installing x86 64","text":""},{"location":"vm/installing-x86_64/#installing-a-playground-vm","title":"Installing a playground VM","text":"

    I've used gnome-boxes from my current Fedora laptop for running playground vm's.

    Settings: ISO: nixos-minimal Hard drive: 32GB RAM: 2GB EFI: Enable

    Expose port 22 to allow ssh into vm (host port 3022, guest 22)

    # set temp root passwd\nsudo su\npasswd\n

    sshd is already running, so you can now ssh into the vm remotely for the rest of the setup. ssh root@127.0.0.1 -p 3022

    # Partitioning\nparted /dev/sda -- mklabel gpt\nparted /dev/sda -- mkpart root ext4 512MB -8GB\nparted /dev/sda -- mkpart swap linux-swap -8GB 100%\nparted /dev/sda -- mkpart ESP fat32 1MB 512MB\nparted /dev/sda -- set 3 esp on\n\n# Formatting\nmkfs.ext4 -L nixos /dev/sda1\nmkswap -L swap /dev/sda2\nmkfs.fat -F 32 -n boot /dev/sda3\n\n# Mounting disks for installation\nmount /dev/disk/by-label/nixos /mnt\nmkdir -p /mnt/boot\nmount /dev/disk/by-label/boot /mnt/boot\nswapon /dev/sda2\n\n# Generating default configuration\nnixos-generate-config --root /mnt\n

    From this config copy the bootstrap configuration and fetch the hardware configuration.

    scp -P 3022 nixos/hosts/bootstrap/configuration.nix root@127.0.0.1:/mnt/etc/nixos/configuration.nix\nscp -P 3022 root@127.0.0.1:/mnt/etc/nixos/hardware-configuration.nix nixos/hosts/nixosvm/hardware-configuration.nix\n

    Then back to the VM

    nixos-install\nreboot\nnixos-rebuild switch\n

    Set the password for the user that was created. Might need to use su?

    passwd truxnell\n

    Also grab the ssh keys and re-encrypt sops

    cat /etc/ssh/ssh_host_ed25519_key.pub | ssh-to-age\n

    then run task

Log in as the user and clone the nix git repo, OR for remote machines/servers just run nixos-install --impure --flake github:truxnell/nix-config#<MACHINE_ID>

    mkdir .local\ncd .local\ngit clone https://github.com/truxnell/nix-config.git\ncd nix-config\n

Apply the config to the bootstrapped device. The first time around, you MUST apply with the name of a host in ./hosts/. This is because --flake . looks for a nixosConfigurations key matching the machine's hostname; the bootstrap machine will be called 'nixos-bootstrap', so the flake would by default resolve nixosConfigurations.nixos-bootstrap. Subsequent rebuilds can be called with the default command, as after the first build the machine's hostname will be changed to the desired machine.

    nixos-rebuild switch --flake .#<machinename>\n


    "},{"location":"vm/installing-zfs-impermance/","title":"Installing zfs impermance","text":"

    https://grahamc.com/blog/erase-your-darlings/

    "},{"location":"vm/installing-zfs-impermance/#get-hostid","title":"Get hostid","text":"

Run head -c 8 /etc/machine-id and copy the output into networking.hostId to ensure ZFS doesn't get borked on reboot.

    "},{"location":"vm/installing-zfs-impermance/#partitioning","title":"Partitioning","text":"

parted /dev/nvme0n1 -- mklabel gpt
parted /dev/nvme0n1 -- mkpart root ext4 512MB -8GB
parted /dev/nvme0n1 -- mkpart swap linux-swap -8GB 100%
parted /dev/nvme0n1 -- mkpart ESP fat32 1MB 512MB
parted /dev/nvme0n1 -- set 3 esp on

    "},{"location":"vm/installing-zfs-impermance/#formatting","title":"Formatting","text":"

mkswap -L swap /dev/nvme0n1p2
swapon /dev/nvme0n1p2
mkfs.fat -F 32 -n boot /dev/nvme0n1p3

    "},{"location":"vm/installing-zfs-impermance/#zfs-on-root-partition","title":"ZFS on root partition","text":"

    zpool create -O mountpoint=none rpool /dev/nvme0n1p1

    zfs create -p -o mountpoint=legacy rpool/local/root

    "},{"location":"vm/installing-zfs-impermance/#immediate-blank-snapshot","title":"immediate blank snapshot","text":"

zfs snapshot rpool/local/root@blank
mount -t zfs rpool/local/root /mnt

    "},{"location":"vm/installing-zfs-impermance/#boot-partition","title":"Boot partition","text":"

mkdir /mnt/boot
mount /dev/nvme0n1p3 /mnt/boot

    "},{"location":"vm/installing-zfs-impermance/#mk-nix","title":"mk nix","text":"

zfs create -p -o mountpoint=legacy rpool/local/nix
mkdir /mnt/nix
mount -t zfs rpool/local/nix /mnt/nix

    "},{"location":"vm/installing-zfs-impermance/#and-a-dataset-for-home-if-needed","title":"And a dataset for /home: if needed","text":"

zfs create -p -o mountpoint=legacy rpool/safe/home
mkdir /mnt/home
mount -t zfs rpool/safe/home /mnt/home

zfs create -p -o mountpoint=legacy rpool/safe/persist
mkdir /mnt/persist
mount -t zfs rpool/safe/persist /mnt/persist

Set networking.hostId in the NixOS config to the output of head -c 8 /etc/machine-id
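In Nix that is a single line - the value below is a placeholder, use your own machine-id output:

networking.hostId = "8425e349";  # first 8 hex chars of /etc/machine-id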

    nixos-install --impure --flake github:truxnell/nix-config#<MACHINE_ID>\n

Consider using nixos-enter to import a zpool if required (for the NAS), instead of rebooting post-install.


    "},{"location":"vm/secrets/","title":"Generate age key per machine","text":"

On a new machine, run the below to convert its shiny new ed25519 host key to an age key.

    nix-shell -p ssh-to-age --run 'cat /etc/ssh/ssh_host_ed25519_key.pub | ssh-to-age'\n

Copy this into ./.sops.yaml in the base repo, then re-run the taskfile task sops:re-encrypt to loop through all sops-encrypted files, decrypting then re-encrypting them.


    "}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\u200b\\-_,:!=\\[\\]()\"`/]+|\\.(?!\\d)|&[lg]t;|(?!\\b)(?=[A-Z][a-z])","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"readme.md","text":"

\ud83d\udc4b Welcome to my NixOS home and homelab configuration. This monorepo is my personal nix/nixos setup for all my devices, specifically my homelab.

    This is the end result of a recovering k8s addict - who no longer enjoyed the time and effort I personally found it took to run k8s at home.

    "},{"location":"#why","title":"Why?","text":"

Having needed a break from hobbies for some health-related reasons, I found coming back to an unpatched, unattended cluster a chore. Then a cheap SSD in my custom VyOS router blew, leading me to just put my Unifi Dream Machine router back in, which broke the custom DNS I was running for my cluster, which in turn caused the cluster issues.

While fixing the DNS issue, a basic software upgrade of the custom OS I was running k8s on broke my cluster for the 6th time running, coupled with my using an older version of the script tool I used to manage its machine config yaml - which ended up leading to my 6th k8s disaster recovery.

Looking at my boring Ubuntu ZFS NAS, which just ran and ran and ran without needing TLC, and remembering the old days of Ubuntu + Docker Compose being hands-off, I dove into nix, with the idea of getting back to basics with boring, proven tools, plus the power of nix's declarative system.

    "},{"location":"#goals","title":"Goals","text":"

One of my goals is to bring what I learnt running k8s at home with some of the best homelabbers into the nix world, and see just how many of those practices I can apply to a nix setup, while focusing on having a solid, reliable setup that I can leave largely unattended for months without issues cropping up.

The goal of this doc is for me to slow down a bit and jot down how and why I am doing what I'm doing in a module, and cover how I have approached the facets of homelabbing, so YOU can understand, steal with pride from my code, and hopefully(?) learn a thing or two.

To teach me a thing or two, contact me or raise an Issue. PRs may or may not be taken as a personal attack - this is my home setup after all.


    "},{"location":"motd/","title":"Message of the day","text":"

    Why not include a nice message of the day for each server I log into?

The below gives some insight into what the server's running, the status of zpools, usage, etc. While not shown below - thankfully - if a zpool error is found, the status gives a full zpool status -x debrief, which is particularly eye-catching upon login.

I've also squeezed in a 'reboot required' flag for when the server has detected that its running kernel/init/systemd is a different version to what it booted with - useful for knowing when long-running servers require a reboot to pick up new kernel/etc versions.

    Message of the day

    Code TLDR

    /nixos/modules/nixos/system/motd

    Write a shell script using nix with a bash motd of your choosing.

    let\n  motd = pkgs.writeShellScriptBin \"motd\"\n    ''\n      #! /usr/bin/env bash\n      source /etc/os-release\n      service_status=$(systemctl list-units | grep podman-)\n\n      <- SNIP ->\n      printf \"$BOLDService status$ENDCOLOR\\n\"\n    '';\nin\n

This gets us a shell script we can then drop directly into systemPackages - and after that it's just a short hop to make it part of the shell init.

    Note

    Replace with your preferred shell!

    environment.systemPackages = [\n    motd\n];\nprograms.fish.interactiveShellInit =  ''\n    motd\n'';\n


    "},{"location":"tips/","title":"Tips","text":"
    • Don't make conditional imports (nix needs to resolve imports upfront)
    • you can pass config between NixOS and home-manager with config.home-manager.users.<user> and osConfig
    • when adding home-manager to an existing setup, the home-manager service may fail due to trying to overwrite existing files in ~. Deleting these should allow the service to start
    • yaml = json, so using nix + builtins.toJSON a lot (and repl to vscode for testing)
    • checking values:

      "},{"location":"tips/#httpsgithubcomnixosnixpkgsblob90055d5e616bd943795d38808c94dbf0dd35abe8nixosmodulesconfigusers-groupsnixl116","title":"https://github.com/NixOS/nixpkgs/blob/90055d5e616bd943795d38808c94dbf0dd35abe8/nixos/modules/config/users-groups.nix#L116","text":"


      "},{"location":"includes/abbreviations/","title":"Abbreviations","text":"

      [CI]: Continuous Integration [PR]: Pull Request [HASS]: Home-assistant [k8s]: Kubernetes [YAML]: Yet Another Markup Language [JSON]: JavaScript Object Notation [ZFS]: Originally 'Zettabyte File System', a COW filesystem. [COW]: Copy on Write


      "},{"location":"maintenance/backups/","title":"Backups","text":"

Nightly backups are facilitated by NixOS's restic module and a helper module I've written.

This does a nightly ZFS snapshot, from which apps and other mutable data are backed up with restic to both a local folder on my NAS and to Cloudflare R2. Backing up from a ZFS snapshot ensures that the restic backup is consistent, as backing up files that are in use (especially a sqlite database) will cause corruption. Here, all restic jobs back up from that same 02.00 snapshot, regardless of when they run that night.

Another benefit of this approach is that it is service agnostic - containers, NixOS services, QEMU, whatever - they all have their files in the same place on the filesystem (in the persistent folder), so they can all be backed up in the same fashion.

The alternative is to shut down services during backup (which could be facilitated with the restic backup pre/post scripts), but ZFS snapshots are a godsend in this area, and I'm already running them for impermanence.

      Backing up without snapshots/shutdowns?

This is a pattern I see a fair bit - if you are backing up files raw without stopping your service beforehand, you might want to check that your backups aren't corrupted.

      The timeline then is:

time activity 02.00 ZFS deletes the prior snapshot and creates a new one at rpool/safe/persist@restic_nightly_snap 02.05 - 04.05 Restic backs up from the new snapshot's hidden read-only .zfs mount, with random per-service delays, to local and remote locations"},{"location":"maintenance/backups/#automatic-backups","title":"Automatic Backups","text":"

      I have added a sops secret for both my local and remote servers in my restic module /nixos/modules/nixos/services/restic/. This provides the restic password and 'AWS' credentials for the S3-compatible R2 bucket.

      Backups are created per-service in each service's module. This is largely done with a lib helper I've written, which creates both the local and remote restic backup entries in my nixosConfiguration: nixos/modules/nixos/lib.nix
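
      Roughly speaking, each generated entry boils down to the standard NixOS restic options, something like the sketch below (the repository path and secret name are placeholders, not my real values):

      services.restic.backups.prowlarr-local = {\n  paths = [ \"/persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\" ];\n  repository = \"/mnt/nas/backups/prowlarr\"; # placeholder local repo path\n  passwordFile = config.sops.secrets.\"services/restic/password\".path; # placeholder secret name\n  timerConfig = {\n    OnCalendar = \"02:05\";\n    RandomizedDelaySec = \"2h\"; # gives the 02.05 - 04.05 window\n  };\n};\n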

      Why not backup the entire persist in one hit?

      Possibly a holdover from my k8s days, but it's incredibly useful to be able to restore per-service, especially if you just want to move an app around or restore one app. You can always restore multiple repos with a script/taskfile.

      NixOS will create a service + timer for each job - below shows the output for a prowlarr local/remote backup.

      # Confirming snapshot taken overnight - we can see 2AM\ntruxnell@daedalus ~> systemctl status restic_nightly_snapshot.service\n\u25cb restic_nightly_snapshot.service - Nightly ZFS snapshot for Restic\n     Loaded: loaded (/etc/systemd/system/restic_nightly_snapshot.service; linked; preset: enabled)\n     Active: inactive (dead) since Wed 2024-04-17 02:00:02 AEST; 5h 34min ago\n   Duration: 61ms\nTriggeredBy: \u25cf restic_nightly_snapshot.timer\n    Process: 606080 ExecStart=/nix/store/vd0pr3la91pi0qhmcn7c80rwrn7jkpx9-unit-script-restic_nightly_snapshot-start/bin/restic_nightly_snapshot-start (code=exited, status=0/SUCCESS)\n   Main PID: 606080 (code=exited, status=0/SUCCESS)\n         IP: 0B in, 0B out\n        CPU: 21ms\n# confirming local snapshot occured - we can see 05:05AM\ntruxnell@daedalus ~ [1]> sudo restic-prowlarr-local snapshots\nrepository 9d9bf357 opened (version 2, compression level auto)\nID        Time                 Host        Tags        Paths\n---------------------------------------------------------------------------------------------------------------------\n293dad23  2024-04-15 19:24:37  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n24938fe8  2024-04-16 12:42:50  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n442d4de3  2024-04-17 05:05:04  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n---------------------------------------------------------------------------------------------------------------------\n3 snapshots\n\n# confirming remote snapshot occured - we can see 4:52AM\ntruxnell@daedalus ~> sudo restic-prowlarr-remote snapshots\nrepository 30b7eef0 opened (version 2, compression level auto)\nID        Time                 Host        Tags        Paths\n---------------------------------------------------------------------------------------------------------------------\ne7d933c4  2024-04-15 22:07:09  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\naa605c6b  2024-04-16 02:39:47  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n68f91a20  2024-04-17 04:52:59  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n---------------------------------------------------------------------------------------------------------------------\n3 snapshots\n

      NixOS (as of 23.05, IIRC) now provides shims for easy access to the restic commands, with the same environment variables as the backup service.

      truxnell@daedalus ~ [1]> sudo restic-prowlarr-local snapshots\nrepository 9d9bf357 opened (version 2, compression level auto)\nID        Time                 Host        Tags        Paths\n---------------------------------------------------------------------------------------------------------------------\n293dad23  2024-04-15 19:24:37  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n24938fe8  2024-04-16 12:42:50  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n---------------------------------------------------------------------------------------------------------------------\n2 snapshots\n
      "},{"location":"maintenance/backups/#manually-backing-up","title":"Manually backing up","text":"

      They are a systemd timer/service, so you can query or trigger a manual run with systemctl start restic-backups-<service>-<destination>. Local and remote work exactly the same; querying remote is just a fraction slower to return information.

      truxnell@daedalus ~ > sudo systemctl start restic-backups-prowlarr-local.service\n< no output >\ntruxnell@daedalus ~ [1]> sudo restic-prowlarr-local snapshots\nrepository 9d9bf357 opened (version 2, compression level auto)\nID        Time                 Host        Tags        Paths\n---------------------------------------------------------------------------------------------------------------------\n293dad23  2024-04-15 19:24:37  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n24938fe8  2024-04-16 12:42:50  daedalus                /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr\n---------------------------------------------------------------------------------------------------------------------\n2 snapshots\ntruxnell@daedalus ~> date\nTue Apr 16 12:43:20 AEST 2024\ntruxnell@daedalus ~>\n
      "},{"location":"maintenance/backups/#restoring-a-backup","title":"Restoring a backup","text":"

      Testing a restore (you would use --target / for a real restore). You would just pause the service, run the restore, then restart the service.

      truxnell@daedalus ~ [1]> sudo restic-lidarr-local restore --target /tmp/lidarr/ latest\nrepository a2847581 opened (version 2, compression level auto)\n[0:00] 100.00%  2 / 2 index files loaded\nrestoring <Snapshot b96f4b94 of [/persist/nixos/lidarr] at 2024-04-14 04:19:41.533770692 +1000 AEST by root@daedalus> to /tmp/lidarr/\nSummary: Restored 52581 files/dirs (11.025 GiB) in 1:37\n
      "},{"location":"maintenance/backups/#failed-backup-notifications","title":"Failed backup notifications","text":"

      Failed backup notifications are baked in thanks to the global Pushover notification on SystemD unit failure. No config necessary.

      Here I tested it by giving the systemd unit file an incorrect path.

      A deliberately failed backup to test notifications, hopefully I don't see a real one."},{"location":"maintenance/backups/#disabled-backup-warnings","title":"Disabled backup warnings","text":"

      Using module warnings, I have also put warnings into my NixOS modules for when backups are disabled on a host that isn't a development machine, just in case I do this or mix up flags on hosts. Roll your eyes, I will probably do it. This will pop up when I do a dry run/deployment - but not abort the build.

      It is eye-catching, thankfully.


      "},{"location":"maintenance/software_updates/","title":"Software updates","text":"

      It's crucial to update software regularly - but a homelab isn't a Google Play Store you forget about and let do its thing. How do you update your software stack regularly without breaking things?

      "},{"location":"maintenance/software_updates/#continuous-integration","title":"Continuous integration","text":"

      Continuous integration (CI) runs using GitHub Actions and Garnix. I have enabled branch protection rules to ensure all my devices build successfully before a PR is allowed to be merged into main. This gives me a level of testing/confidence that updating a device from the main branch will not break anything.

      Lovely sea of green passed checks"},{"location":"maintenance/software_updates/#binary-caching","title":"Binary Caching","text":"

      Binary caching is done for me by Garnix, which is an amazing tool. I can then add it as a substituter. Builds run on each push to any branch and the results are cached for me. Even better, I can hook into them as above for CI purposes. No code to show here - you add it as an app to your GitHub repo and it 'Just Works'.

      # Substitutions\nsubstituters = [ \"https://cache.garnix.io\" ];\n\ntrusted-public-keys = [\n  \"nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs=\"\n];\n
      Lovely sea of green passed checks"},{"location":"maintenance/software_updates/#flake-updates","title":"Flake updates","text":"

      Flake lockfile updates are provided by Renovate by Mend. These are auto-merged on a weekly schedule after passing CI. The settings can be found at /main/.github/renovate.json5

      The primary CI is a Garnix build, which is already building and caching all my systems. Knowing all of the systems have built and cached goes a long way toward ensuring main is a stable branch.

      "},{"location":"maintenance/software_updates/#docker-container-updates","title":"Docker container updates","text":"

      Container updates are provided by Renovate by Mend. These are either manually merged after I have checked the upstream project's notes for breaking changes, or auto-merged based on settings I have in /.github/renovate/autoMerge.json5.

      Semantic Versioning summary

      Semantic Versioning is a MAJOR.MINOR.PATCH format: the MAJOR version changes when you make incompatible API changes (e.g. 1.7.8 -> 2.0.0), MINOR when you add functionality in a backward-compatible manner (e.g. 1.7.8 -> 1.8.0), and PATCH when you make backward-compatible bug fixes (e.g. 1.7.8 -> 1.7.9).

      The auto-merge file allows me to define which packages I want to auto-merge based on the upgrade type Renovate is suggesting. As many packages adhere to Semantic Versioning, I can decide how much I trust a project and auto-merge specific update types. For example, Sonarr has been reliable for me, so I am OK merging all digest, patch and minor updates. I will always review a major update, as it is likely to contain a breaking change.

      Respect pre-1.0.0 software!

      Semantic Versioning also specifies that software before 1.0.0 may introduce a breaking change AT ANY TIME. Auto-update pre-1.0 software at your own risk!

      The rationale here is twofold. One is obvious - the entire point of doing Nix is reproducibility - what is the point of having flakes and SHA tags to pin everything if updates are applied blindly underneath you?

      Also, I don't want a trillion PRs sitting in my GitHub repo, but I also will not blindly update everything. There is a balance between updating for security/patching purposes and avoiding breaking changes. I know it's popular to use the :latest tag and an auto-update service like Watchtower - trust me, this is a bad idea.

      I only glanced away from my old homelab for a few months...

      Automatically updating all versions of containers will break things eventually!

      This is simply because projects will, from time to time, release breaking changes - totally different database schemas, overhauled config, entire parts of their software stack replaced, etc. If you let your services update fully automatically without checking for these, you will wake up to a completely broken service, like I did many, many years ago when Seafile did a major upgrade.

      Container updates are provided by a custom regex that matches my format for defining images in my nix modules.

          \"regexManagers\": [\n    {\n      fileMatch: [\"^.*\\\\.nix$\"],\n      matchStrings: [\n        'image *= *\"(?<depName>.*?):(?<currentValue>.*?)(@(?<currentDigest>sha256:[a-f0-9]+))?\";',\n      ],\n      datasourceTemplate: \"docker\",\n    }\n  ],\n

      I can then pick and choose what level of auto-merge (if any) I want for container software. The below gives me brackets I can put containers in, depending on how much I trust the container maintainer.

        \"packageRules\": [\n    {\n      // auto update up to major\n      \"matchDatasources\": ['docker'],\n      \"automerge\": \"true\",\n      \"automergeType\": \"branch\",\n      \"matchUpdateTypes\": [ 'minor', 'patch', 'digest'],\n      \"matchPackageNames\": [\n        'ghcr.io/onedr0p/sonarr',\n        'ghcr.io/onedr0p/readarr',\n        'ghcr.io/onedr0p/radarr',\n        'ghcr.io/onedr0p/lidarr',\n        'ghcr.io/onedr0p/prowlarr'\n        'ghcr.io/twin/gatus',\n      ]\n    },\n    // auto update up to minor\n    {\n      \"matchDatasources\": ['docker'],\n      \"automerge\": \"true\",\n      \"automergeType\": \"branch\",\n      \"matchUpdateTypes\": [ 'patch', 'digest'],\n      \"matchPackageNames\": [\n        \"ghcr.io/gethomepage/homepage\",\n      ]\n\n    }\n  ]\n

      This results in automated PRs being raised - and possibly automatically merged into main if CI passes.

      Thank you, RenovateBot!


      "},{"location":"monitoring/systemd/","title":"SystemD pushover notifications","text":"

      Keeping with the goal of simplicity, I put together a curl script that can send me a Pushover alert. I originally tied this to individual backups, until I realised how powerful it would be to tie it to every SystemD service globally.

      This way, I never need to think about which services are being created or destroyed, or repeat myself ad nauseam.

      Why not Prometheus?

      I ran Prometheus/AlertManager for many years and, well, it is easy to get TOO many notifications depending on your alerts, to have issues with the big complex beast itself, or to have alerts that trigger/reset/trigger (e.g. HDD temps). This setup gives me native, simple notifications I can rely on using basic tools - one of my design principles.

      Immediately I picked up with little effort:

      • Pod crashloop failed after too many quick restarts
      • Native service failure
      • Backup failures
      • AutoUpdate failure
      • etc
      NixOS SystemD built-in notifications for all occasions"},{"location":"monitoring/systemd/#adding-to-all-services","title":"Adding to all services","text":"

      This is accomplished in /nixos/modules/nixos/system/pushover, with a systemd service notify-pushover@.

      This can then be called by other services, which I set up by adding to my options:

        options.systemd.services = mkOption {\n    type = with types; attrsOf (\n      submodule {\n        config.onFailure = [ \"notify-pushover@%n.service\" ];\n      }\n    );\n  };\n

      This adds \"notify-pushover@%n.service\" to the onFailure of every systemd unit NixOS generates. The systemd specifiers are injected via scriptArgs, and the simple bash script can refer to them as $1 etc.

      systemd.services.\"notify-pushover@\" = {\n      enable = true;\n      onFailure = lib.mkForce [ ]; # cant refer to itself on failure (1)\n      description = \"Notify on failed unit %i\";\n      serviceConfig = {\n        Type = \"oneshot\";\n        # User = config.users.users.truxnell.name;\n        EnvironmentFile = config.sops.secrets.\"services/pushover/env\".path; # (2)\n      };\n\n      # Script calls pushover with some deets.\n      # Here im using the systemd specifier %i passed into the script,\n      # which I can reference with bash $1.\n      scriptArgs = \"%i %H\"; # (3)\n      # (4)\n      script = ''\n        ${pkgs.curl}/bin/curl --fail -s -o /dev/null \\\n          --form-string \"token=$PUSHOVER_API_KEY\" \\\n          --form-string \"user=$PUSHOVER_USER_KEY\" \\\n          --form-string \"priority=1\" \\\n          --form-string \"html=1\" \\\n          --form-string \"timestamp=$(date +%s)\" \\\n          --form-string \"url=https://$2:9090/system/services#/$1\" \\\n          --form-string \"url_title=View in Cockpit\" \\\n          --form-string \"title=Unit failure: '$1' on $2\" \\\n          --form-string \"message=<b>$1</b> has failed on <b>$2</b><br><u>Journal tail:</u><br><br><i>$(journalctl -u $1 -n 10 -o cat)</i>\" \\\n          https://api.pushover.net/1/messages.json 2&>1\n      '';\n
      1. Force exclude this service from having the default 'onFailure' added
      2. Bring in pushover API/User ENV vars for script
      3. Pass SystemD specifiers into script
      4. Er.. script. Nix pops it into a shell script and refers to it in the unit.

      Bug

      I put in a nice direct link to Cockpit for the specific machine/service in question, but it doesn't quite work yet... ( #96)

      "},{"location":"monitoring/systemd/#excluding-from-a-services","title":"Excluding from a services","text":"

      Now, we may not want this on ALL services - especially the pushover-notify service itself. We can exclude it from a service using nixpkgs.lib.mkForce:

      # Overwrite the default pushover onFailure for this service\nsystemd.services.\"service\".onFailure = lib.mkForce [ ];\n


      "},{"location":"monitoring/warnings/","title":"Nix Warnings","text":"

      I've added warnings and assertions to my Nix code to help me avoid misconfigurations. For example, if a module needs a database enabled, it can abort a deployment if it is not. Similarly, I have added warnings for when I have disabled backups on production machines.

      But why, when it's not being shared with others?

      Because I guarantee I'll somehow stuff it up down the track and accidentally disable things I didn't mean to. Roll your eyes; I'll thank myself later.

      Learnt from: Nix Manual

      "},{"location":"monitoring/warnings/#warnings","title":"Warnings","text":"

      Warnings will print a warning message during a nix build or deployment, but will NOT stop the action. Great for things like reminders about disabled features.

      To add a warning inside a module:

          # Warn if backups are disabled and machine isn't a dev box\n    config.warnings = [\n      (mkIf (!cfg.local.enable && config.mySystem.purpose != \"Development\")\n        \"WARNING: Local backups are disabled!\")\n      (mkIf (!cfg.remote.enable && config.mySystem.purpose != \"Development\")\n        \"WARNING: Remote backups are disabled!\")\n    ];\n
      Oh THAT'S what I forgot to re-enable..."},{"location":"monitoring/warnings/#abortassert","title":"Abort/assert","text":"

      Warning's bigger and meaner brother. Stops a nix build/deploy dead in its tracks. Only useful when the deployment is incompatible with running - i.e. a dependency not met in options.
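
      For completeness, a minimal sketch of what an assertion looks like in a module (illustrative only - the option names are placeholders, not one of my real modules):

      config.assertions = [\n  {\n    # abort the build if this service is enabled without its database\n    assertion = !cfg.enable || config.services.postgresql.enable;\n    message = \"serviceX requires services.postgresql.enable = true\";\n  }\n];\n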


      "},{"location":"monitoring/zed/","title":"Zed","text":"

      ZED (the ZFS event daemon) monitoring can also send to Pushover!

      Come on, these drives are hardly 12 months old.


      "},{"location":"network/dns/","title":"Dns","text":"

      2 x AdGuard Home -> PowerDNS (authoritative) -> (Quad9 || Mullvad). Note: reverse DNS (in-addr.arpa) and split-brain setup, DNSSEC.


      "},{"location":"network/dns_dhcp/","title":"DNS & DHCP","text":"

      TLDR

      External DNS: Client -> Adguard Home (r->

      My DNS has evolved and changed over time, especially with a personal desire to keep my entire internet backbone boring and standard, from a trusted vendor. 'Why can't I connect to my Minecraft server' and 'Are you playing with the internet again' are questions I don't want to have to answer in this house.

      Sadly, while I do love my UniFi Dream Machine Pro, its DNS offering is lackluster and I really prefer split DNS so I don't have to access everything via ip:port.

      "},{"location":"network/dns_dhcp/#general","title":"General","text":"

      My devices all use the UniFi DHCP server to get addresses, which I much prefer so I can maintain all my clients in the single pane of glass the UDMP provides. In the DHCP options, I add the


      "},{"location":"overview/design/","title":"Design principles","text":"

      Taking some lead from the Zen of Python:

      • Minimise dependencies, where required, explicitly define dependencies
      • Use plain Nix & bash to solve problems over additional tooling
      • Stable channel for stable machines. Unstable only where features are important.
      • Modules for a specific service - Profiles for broad configuration of state.
      • Write readable code - descriptive variable names and modules
      • Keep functions/dependencies within the relevant module where possible
      • Errors should never pass silently - use assert etc for misconfigurations
      • Flat is better than nested - use built-in functions like map, filter, and fold to operate on lists or sets


      "},{"location":"overview/features/","title":"Features","text":"

      Some things I'm proud of. Or just happy they exist so I can forget about something until I need to worry.

      • Nightly Backups: A ZFS snapshot is taken at night, with restic then backing up to both local and cloud destinations. NixOS wrappers make restoring a single command-line entry. Snapshotting before backup is important to ensure restic isn't backing up files that are in use, which would cause corruption.
      • Software Updates: Renovate Bot regularly runs on this GitHub repo, updating the flake lockfile, containers and other dependencies automatically. Auto-merge is enabled for updates I expect to be routine, but manual PR approval is required for updates I suspect may need a read of the changelog for breaking changes.
      • Impermanence: Inspired by the Erase your Darlings post, servers run ZFS and roll back to a blank snapshot at night. This ensures repeatable NixOS deployments and no cruft, and also hardens servers a little.
      • SystemD Notifications: A systemd hook that adds a Pushover notification on unit failure for any unit NixOS is aware of. No worrying about forgetting to add a notification to every new service, or about missing one.


      "},{"location":"overview/goals/","title":"Goals","text":"

      When I set about making this lab I had a number of goals - I wonder how well I will do?

      A master list of ideas/goals/etc can be found at Issue #1

      • Stability: NixOS stable channel for core services, unstable for desktop apps/non-mission-critical where desired. Containers with SHA256 pinning for server apps.
      • KISS: Keep It Simple - use boring, reliable, trusted tools, not today's flashy new software repo.
      • Easy Updates: Weekly update schedule, utilizing Renovate for updating the lockfile and container images. Auto-updates enabled off the main branch for mission-critical hosts. Aim for 'magic rollback' on upgrade failure.
      • Backups: Nightly restic backups to both cloud and NAS. All databases to have nightly backups. Test backups regularly.
      • Reproducibility: Flakes & Git for version pinning, SHA256 tags for containers.
      • Monitoring: Automated monitoring on failure & critical summaries, using basic tools. Use Gatus for both internal and external monitoring.
      • Continuous Integration: CI against the main branch to ensure all code compiles OK. Use PRs to add to main and don't skip CI due to impatience.
      • Security: Don't use containers with S6 overlay/root (i.e. LSIO). Expose minimal ports at the router, reduce attack surface by keeping it simple, review hardening for containers/podman/NixOS.
      • Ease of administration: Lean into the devil that is SystemD - one standard interface to see logs, manipulate services, etc. Run containers as podman services, with web UIs for watching/debugging.
      • Secrets, ssshh...: sops-nix for secrets, living in my git repo. Avoid cloud services like I used in k8s (i.e. Doppler.io).


      "},{"location":"overview/k8s/","title":"K8s","text":"

      Removed complexity

      • external secrets -> bog standard sops
      • HA file storage -> standard file system
      • HA database cluster -> nixos standard cluster
      • Database user operator -> nixos standard ensure_users
      • Database permissions operator -> why even??
      • secrets reloader -> sops restart_unit
      • easier management - all services run through systemd for consistency, and Cockpit makes viewing logs/pod consoles etc. easy.


      "},{"location":"overview/options/","title":"Options","text":"

      Explain mySystem and myHome


      "},{"location":"overview/structure/","title":"Repository Structure","text":"

      Note

      Oh god writing this now is a horrid idea, I always refactor like 50 times...

      Here is a bit of a walkthrough of the repository structure so you can have a vague idea of what is going on. Organizing a monorepo is hard at the best of times.

      \u251c\u2500\u2500 .github\n\u2502   \u251c\u2500\u2500 renovate            Renovate modules\n\u2502   \u251c\u2500\u2500 workflows             Github Action workflows (i.e. CI/Site building)\n\u2502   \u2514\u2500\u2500 renovate.json5        Renovate core settings\n\u251c\u2500\u2500 .taskfiles              go-task file modules\n\u251c\u2500\u2500 docs                    This mkdocs-material site\n\u2502   nixos                   Nixos Modules\n\u2502   \u2514\u2500\u2500 home                  home-manager nix files\n\u2502       \u251c\u2500\u2500 modules             home-manager modules\n\u2502       \u2514\u2500\u2500 truxnell            home-manager user\n\u2502   \u251c\u2500\u2500 hosts                 hosts for nix - starting point of configs.\n\u2502   \u251c\u2500\u2500 modules               nix modules\n\u2502   \u251c\u2500\u2500 overlays              nixpkgs overlays\n\u2502   \u251c\u2500\u2500 pkgs                  custom nix packages\n\u2502   \u2514\u2500\u2500 profiles              host profiles\n\u251c\u2500\u2500 README.md               Github Repo landing page\n\u251c\u2500\u2500 flake.nix               Core flake\n\u251c\u2500\u2500 flake.lock              Lockfile\n\u251c\u2500\u2500 LICENSE                 Project License\n\u251c\u2500\u2500 mkdocs.yml              mkdocs settings\n\u2514\u2500\u2500 Taskfile.yaml           go-task core file\n

      Whew, that wasn't so hard, right... right?


      "},{"location":"security/containers/","title":"Containers","text":""},{"location":"security/containers/#container-images","title":"Container images","text":"

      Don't use LSIO!


      "},{"location":"vm/faq/","title":"Faq","text":""},{"location":"vm/faq/#why-not-recurse-the-module-folder","title":"Why not recurse the module folder","text":"

      Imports are special in Nix and it's important that they are statically defined so they can be resolved up front - if you do optional/computed imports, not everything is available during evaluation.
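
      So module imports get listed explicitly, along these lines (the paths are illustrative):

      {\n  imports = [\n    ./lib.nix\n    ./services/restic\n    ./system/pushover\n  ];\n}\n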


      "},{"location":"vm/impermance/","title":"Impermance","text":"
      • Need to save the SSH host keys across reboots
      • Otherwise you end up with sops issues and the SSH host key changing on every reboot, breaking clients' known_hosts (see the sketch below for one way to persist them)
      • Need to sort out the password
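
      One way to persist the host keys, assuming the impermanence module is in use (a sketch, not necessarily how this repo does it):

      environment.persistence.\"/persist\" = {\n  files = [\n    # keep the host key stable so sops-nix and clients' known_hosts stay happy\n    \"/etc/ssh/ssh_host_ed25519_key\"\n    \"/etc/ssh/ssh_host_ed25519_key.pub\"\n  ];\n};\n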


      "},{"location":"vm/installing-x86_64/","title":"Installing x86 64","text":""},{"location":"vm/installing-x86_64/#installing-a-playground-vm","title":"Installing a playground VM","text":"

      I've used gnome-boxes on my current Fedora laptop for running playground VMs.

      Settings: ISO: nixos-minimal, hard drive: 32 GB, RAM: 2 GB, EFI: enabled

      Expose port 22 to allow ssh into vm (host port 3022, guest 22)

      # set temp root passwd\nsudo su\npasswd\n

      sshd is already running, so you can now ssh into the vm remotely for the rest of the setup. ssh root@127.0.0.1 -p 3022

      # Partitioning\nparted /dev/sda -- mklabel gpt\nparted /dev/sda -- mkpart root ext4 512MB -8GB\nparted /dev/sda -- mkpart swap linux-swap -8GB 100%\nparted /dev/sda -- mkpart ESP fat32 1MB 512MB\nparted /dev/sda -- set 3 esp on\n\n# Formatting\nmkfs.ext4 -L nixos /dev/sda1\nmkswap -L swap /dev/sda2\nmkfs.fat -F 32 -n boot /dev/sda3\n\n# Mounting disks for installation\nmount /dev/disk/by-label/nixos /mnt\nmkdir -p /mnt/boot\nmount /dev/disk/by-label/boot /mnt/boot\nswapon /dev/sda2\n\n# Generating default configuration\nnixos-generate-config --root /mnt\n

      From the config repo, copy the bootstrap configuration over and fetch the generated hardware configuration.

      scp -P 3022 nixos/hosts/bootstrap/configuration.nix root@127.0.0.1:/mnt/etc/nixos/configuration.nix\nscp -P 3022 root@127.0.0.1:/mnt/etc/nixos/hardware-configuration.nix nixos/hosts/nixosvm/hardware-configuration.nix\n

      Then, back in the VM:

      nixos-install\nreboot\nnixos-rebuild switch\n

      Set the password for the user that was created. Might need to use su?

      passwd truxnell\n

      Also grab the SSH host key and re-encrypt the sops secrets:

      cat /etc/ssh/ssh_host_ed25519_key.pub | ssh-to-age\n

      Then run the task sops:re-encrypt to re-encrypt the secrets with the new key.

      Log in as the user and clone the nix-config git repo, OR for remote machines/servers just run nixos-install --impure --flake github:truxnell/nix-config#<MACHINE_ID>

      mkdir .local\ncd .local\ngit clone https://github.com/truxnell/nix-config.git\ncd nix-config\n

      Apply the config to the bootstrapped device. The first time around, you MUST apply with the name of a host in ./hosts/. This is because --flake . looks for a nixosConfigurations key matching the machine's hostname; the bootstrap machine is called 'nixos-bootstrap', so the flake would by default resolve nixosConfigurations.nixos-bootstrap. Subsequent rebuilds can use the default command, as after the first build the machine's hostname will have been changed to the desired machine.

      nixos-rebuild switch --flake .#<machinename>\n


      "},{"location":"vm/installing-zfs-impermance/","title":"Installing zfs impermance","text":"

      https://grahamc.com/blog/erase-your-darlings/

      "},{"location":"vm/installing-zfs-impermance/#get-hostid","title":"Get hostid","text":"

      Run head -c 8 /etc/machine-id and copy the output into networking.hostId to ensure ZFS doesn't get borked on reboot.
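
      i.e. something like this in the host's config (the value below is a placeholder - paste your own machine-id output):

      networking.hostId = \"8c6d1f2e\"; # placeholder - output of: head -c 8 /etc/machine-id\n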

      "},{"location":"vm/installing-zfs-impermance/#partitioning","title":"Partitioning","text":"

      parted /dev/nvme0n1 -- mklabel gpt parted /dev/nvme0n1 -- mkpart root ext4 512MB -8GB parted /dev/nvme0n1 -- mkpart swap linux-swap -8GB 100% parted /dev/nvme0n1 -- mkpart ESP fat32 1MB 512MB parted /dev/nvme0n1 -- set 3 esp on

      "},{"location":"vm/installing-zfs-impermance/#formatting","title":"Formatting","text":"

      mkswap -L swap /dev/nvme0n1p2 swapon /dev/nvme0n1p2 mkfs.fat -F 32 -n boot /dev/nvme0n1p3

      "},{"location":"vm/installing-zfs-impermance/#zfs-on-root-partition","title":"ZFS on root partition","text":"

      zpool create -O mountpoint=none rpool /dev/nvme0n1p1

      zfs create -p -o mountpoint=legacy rpool/local/root

      "},{"location":"vm/installing-zfs-impermance/#immediate-blank-snapshot","title":"immediate blank snapshot","text":"

      zfs snapshot rpool/local/root@blank mount -t zfs rpool/local/root /mnt

      "},{"location":"vm/installing-zfs-impermance/#boot-partition","title":"Boot partition","text":"

      mkdir /mnt/boot mount /dev/nvme0n1p3 /mnt/boot

      "},{"location":"vm/installing-zfs-impermance/#mk-nix","title":"mk nix","text":"

      zfs create -p -o mountpoint=legacy rpool/local/nix mkdir /mnt/nix mount -t zfs rpool/local/nix /mnt/nix

      "},{"location":"vm/installing-zfs-impermance/#and-a-dataset-for-home-if-needed","title":"And a dataset for /home: if needed","text":"

      zfs create -p -o mountpoint=legacy rpool/safe/home mkdir /mnt/home mount -t zfs rpool/safe/home /mnt/home

      zfs create -p -o mountpoint=legacy rpool/safe/persist mkdir /mnt/persist mount -t zfs rpool/safe/persist /mnt/persist

      Set networking.hostId in the nixos config to the output of head -c 8 /etc/machine-id

      nixos-install --impure --flake github:truxnell/nix-config#<MACHINE_ID>\n

      Consider using nixos-enter to import a zpool if required (for the NAS) instead of rebooting post-install.


      "},{"location":"vm/secrets/","title":"Generate age key per machine","text":"

      On the new machine, run the below to convert its shiny new ed25519 host key to an age key:

      nix-shell -p ssh-to-age --run 'cat /etc/ssh/ssh_host_ed25519_key.pub | ssh-to-age'\n

      Copy this into ./.sops.yaml in the base repo, then re-run the taskfile task sops:re-encrypt to loop through all sops files, decrypting and re-encrypting each one.


      "}]} \ No newline at end of file diff --git a/docs/sitemap.xml b/docs/sitemap.xml index 268289f0..1ec71298 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -2,127 +2,132 @@ https://truxnell.github.io/nix-config/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/motd/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/tips/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/administration/cockpit/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/administration/deployment/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/administration/taskfile/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/includes/abbreviations/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/maintenance/backups/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/maintenance/software_updates/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/monitoring/systemd/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/monitoring/warnings/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/monitoring/zed/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/network/dns/ - 2024-04-18 + 2024-04-20 + daily + + + https://truxnell.github.io/nix-config/network/dns_dhcp/ + 2024-04-20 daily https://truxnell.github.io/nix-config/overview/design/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/overview/features/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/overview/goals/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/overview/k8s/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/overview/options/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/overview/structure/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/security/containers/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/vm/faq/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/vm/impermance/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/vm/installing-x86_64/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/vm/installing-zfs-impermance/ - 2024-04-18 + 2024-04-20 daily https://truxnell.github.io/nix-config/vm/secrets/ - 2024-04-18 + 2024-04-20 daily \ No newline at end of file diff --git a/docs/sitemap.xml.gz b/docs/sitemap.xml.gz index 0ac0182b6b2ff5915a335a3eb1ad5433fa87a3df..7f1de4cfbfe958ef6146132888f7efb3544c9e71 100644 GIT binary patch literal 455 zcmV;&0XY62iwFq)r6Xnn|8r?{Wo=<_E_iKh0M*ycj+-zL0O0#RMZ_JbM^$a1|r>~P;MLG9S*}fR8!TL1v&(OVJtl69)5_)&2UYoYsfKy(h8xPf= z-@od8_1?d9OEd($Nx_Z|)x;cr3bC$hL&#udpvK2hkebp9iSI$tHb1KVWs}-%g0H80 z>x@_ZZ5B$?a9l19jA=&XX*o6Mg~Lm&yc=*7ow&t5e=3Qs*ZsEL?&@||Z`%~|J@yo) zCh0~PB7F2(qPceBY2?c!9;kokS|6bR6L#$)^zz1j;4r|8r?{Wo=<_E_iKh0M(a4PxCMghVT4}srNPmO=utL%8g$D zXOtywXEkv`?5-QXo@^kr^LIc^E@>^-di2Wa>$J}&?^Z`hjDdF5eY37sKqgqEVOPDq zzSP_5xw~tp?X|| zG!4)v_~@)gcW%{9VL^jM{o6WJy|bCn^)n7*PtNwh&!7d!W+5 zMf2FBgQa6588=yArT6;;N4(N6zPG(b5+F?m)Lxr$ibWxF=w;Rk-lu4_!1CDxp@ftj zO!1ks6WM}@v(S|Hl0^Q5#S2Uzb4a7I%l}z~Jd7O?=4>X60~LeYgO({lIrfu{giV$sfJ n_pMLc$M?rgS+7^M3GM5|mknGpa0O<3DG