Feat: docs (#98)
* hacking at dns

* hack

* hax

* start docs!

* hacking

* feat: docs!

---------

Co-authored-by: Truxnell <9149206+truxnell@users.noreply.github.com>
truxnell and yunmanzr authored Apr 16, 2024
1 parent 80e008a commit ccd8e80
Showing 83 changed files with 1,616 additions and 707 deletions.
1 change: 1 addition & 0 deletions .github/renovate.json5
@@ -17,6 +17,7 @@
"lockFileMaintenance": {
"enabled": "true",
"automerge": "true",
"schedule": [ "before 4am on Sunday" ],
},

"regexManagers": [
12 changes: 8 additions & 4 deletions .github/renovate/autoMerge.json5
@@ -1,29 +1,33 @@
{

// auto update up to major
"packageRules": [
{
// auto update up to major
"matchDatasources": ['docker'],
"automerge": "true",
"automergeType": "branch",
"schedule": [ "before 4am on Sunday" ],
"matchUpdateTypes": [ 'minor', 'patch', 'digest'],
"matchPackageNames": [
'ghcr.io/onedr0p/sonarr',
'ghcr.io/onedr0p/readarr',
'ghcr.io/onedr0p/radarr',
'ghcr.io/onedr0p/lidarr',
'ghcr.io/onedr0p/prowlarr',
'ghcr.io/twin/gatus',
'ghcr.io/onedr0p/prowlarr'
],

},
// auto update up to minor
{
"matchDatasources": ['docker'],
"automerge": "true",
"automergeType": "branch",
"schedule": [ "before 4am on Sunday" ],
"matchUpdateTypes": [ 'patch', 'digest'],
"matchPackageNames": [
'ghcr.io/twin/gatus',
"ghcr.io/gethomepage/homepage",
],
]

},
{
55 changes: 55 additions & 0 deletions .github/workflows/docs-release.yaml
@@ -0,0 +1,55 @@
---
name: "Docs: Release to GitHub pages"

on:
workflow_dispatch:
push:
branches:
- main
paths:
- ".github/workflows/docs-release.yaml"
- ".mkdocs.yml"
- "docs/**"

permissions:
contents: write

jobs:
release-docs:
name: Release documentation
runs-on: ubuntu-22.04
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
steps:
- name: "Generate Short Lived OAuth App Token (ghs_*)"
uses: actions/create-github-app-token@v1.9.3
id: app-token
with:
app-id: "${{ secrets.TRUXNELL_APP_ID }}"
private-key: "${{ secrets.TRUXNELL_APP_PRIVATE_KEY }}"

- name: Checkout main branch
uses: actions/checkout@v4
with:
token: ${{ steps.app-token.outputs.token }}
fetch-depth: 0

- uses: actions/setup-python@v5
with:
python-version: 3.x

- name: Install requirements
run: pip install -r docs/requirements.txt

- name: Build and publish docs
run: mkdocs build -f mkdocs.yml

- name: Deploy
uses: peaceiris/actions-gh-pages@v4.0.0
if: ${{ github.ref == 'refs/heads/main' }}
with:
github_token: ${{ steps.app-token.outputs.token }}
publish_dir: ./site
destination_dir: docs
user_name: "Trux-Bot[bot]"
user_email: "Trux-Bot[bot] <19149206+trux-bot[bot]@users.noreply.github.com>"
2 changes: 1 addition & 1 deletion .sops.yaml
@@ -14,7 +14,7 @@ keys:
- &citadel age1u4tht685sqg6dkmjyer96r93pl425u6353md6fphpd84jh3jwcusvm7mgk
- &rickenbacker age1cp6vegrmqfkuj8nmt2u3z0sur7n0f7e9x9zmdv4zygp8j2pnucpsdkgagc
- &shodan age1j2r8mypw44uvqhfs53424h6fu2rkr5m7asl7rl3zn3xzva9m3dcqpa97gw
- &daedalus age1w3jtd4lecn2ng8qxantw33qxl2uasfqfjfpx45u6uweexwtxyq4spwssmh
- &daedalus age1jpeh4s553taxkyxhzlshzqjfrtvmmp5lw0hmpgn3mdnmgzku332qe082dl

creation_rules:
- path_regex: .*\.sops\.yaml$
6 changes: 6 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,6 @@
{
"cSpell.words": [
"homelab",
"Seafile"
]
}
33 changes: 6 additions & 27 deletions README.md
@@ -38,37 +38,16 @@ To Install
- [ ] Bring over hosts
- [x] DNS01 Raspi4
- [x] DNS02 Raspi4
- [ ] NAS
- [x] NAS
- [x] Laptop
- [x] Gaming desktop
- [ ] WSL
- [ ] JJY emulator Raspi4
- [ ] Documentation!
- [ ] ssh_config build from computers?
- [ ] Modularise host to allow vm builds and hw builds
- [ ] Add license
- [ ] Add taskfiles

## Network map

TBC

## Hardware

TBC

## Manifesto

Taking lead from the zen of python:

- Minimise dependencies, where required, explicitly define dependencies
- Use plain nix to solve problems over additional tooling
- Stable channel for stable machines. Unstable only where features are important.
- Modules for a specific service - Profiles for broad configuration of state.
- Write readable code - descriptive variable names and modules
- Keep functions/dependencies within the relevant module where possible
- Errors should never pass silently - use assert etc for misconfigurations
- Flat is better than nested - use built-in functions like map, filter, and fold to operate on lists or sets
- [x] Documentation!
- [x] ssh_config build from computers?
- [x] Modularise host to allow vm builds and hw builds
- [x] Add license
- [x] Add taskfiles

## Checklist

Empty file added docs/administration/cockpit.md
Empty file.
Empty file added docs/administration/taskfile.md
Empty file.
8 changes: 8 additions & 0 deletions docs/includes/abbreviations.md
@@ -0,0 +1,8 @@
*[CI]: Continuous Integration
*[PR]: Pull Request
*[HASS]: Home Assistant
*[k8s]: Kubernetes
*[YAML]: Yet Another Markup Language
*[JSON]: JavaScript Object Notation
*[ZFS]: Originally 'Zettabyte File System', a COW filesystem.
*[COW]: Copy on Write
Binary file added docs/includes/assets/ci-checks-garnix.png
Binary file added docs/includes/assets/ci-checks.png
Binary file added docs/includes/assets/home-cluster-pr.png
Binary file added docs/includes/assets/no-backup-warning.png
Binary file added docs/includes/assets/pushover-failed-backup.png
Binary file added docs/includes/assets/renovate-pr.png
19 changes: 19 additions & 0 deletions docs/index.md
@@ -0,0 +1,19 @@
👋 Welcome to my NixOS home and homelab configuration. This monorepo is my personal :simple-nixos: Nix/NixOS setup for all my devices, specifically my homelab.

This is the end result of a recovering :simple-kubernetes: k8s addict who no longer enjoyed the time and effort I **personally** found it took to run k8s at home.

## Why?

Having needed a break from hobbies for some health-related reasons, I found coming back to an unpatched, unattended cluster a chore. Then a cheap SSD in my custom VyOS router blew, leading me to put my UniFi Dream Machine router back in, which broke the custom DNS I was running for my cluster and caused it further issues.

While fixing the DNS issue, a basic software upgrade of the custom OS I was running k8s on broke my cluster for the 6th time running, coupled with an older version of the script tool I used to manage its machine-config YAML, which led to my 6th k8s disaster recovery :octicons-info-16:{ title="No I don't want to talk about it" }.

Looking at my boring :simple-ubuntu: Ubuntu ZFS NAS, which just ran and ran and ran without needing TLC, and remembering the old days when Ubuntu + Docker Compose was hands-off :octicons-info-16:{ title="Too hands-off really, as I auto-updated everything, but I digress" }, I dove into Nix, with the idea of getting back to the basics of boring, proven tools, backed by the power of Nix's declarative system.

## Goals

One of my goals is to bring what I learnt running k8s at home with some of the best homelabbers into the Nix world, and see just how many of those practices I can apply to a Nix setup, while focusing on a solid, reliable setup that I can leave largely unattended for months without issues cropping up.

The goal of this doc is for me to slow down a bit and jot down how and why I am doing what I'm doing in a module, and cover how I have approached the facets of homelabbing, so **YOU** can understand, steal with pride from my code, and hopefully(?) learn a thing or two.

To _teach me_ a thing or two, contact me or raise an Issue. PRs may or may not be taken as a personal attack - this is my home setup, after all.
109 changes: 109 additions & 0 deletions docs/maintenance/backups.md
@@ -0,0 +1,109 @@
# Backups

Nightly backups are facilitated by NixOS's [restic](https://search.nixos.org/options?channel=23.11&from=0&size=50&sort=relevance&type=packages&query=services.restic.) module and a helper module I've written.

This does a nightly ZFS snapshot, from which apps and other mutable data are backed up with restic to both a local folder on my NAS and to Cloudflare R2 :octicons-info-16:{ title="R2 mainly due to the cheap cost and low egress fees" }. Backing up from a ZFS snapshot ensures that the restic backup is consistent, as backing up files in use (especially a SQLite database) will cause corruption. Here, all restic jobs back up from the same 02.00 snapshot, regardless of when they run that night.

Another benefit of this approach is that it is service-agnostic - containers, NixOS services, QEMU, whatever: they all have files in the same place on the filesystem (in the persist folder), so they can all be backed up in the same fashion.

The alternative is to shut down services during backup (which could be facilitated with the restic backup pre/post scripts), but ZFS snapshots are a godsend in this area, and I'm already running them for impermanence.

!!! info "Backing up without snapshots/shutdowns?"

    This is a pattern I see a bit, too - if you are backing up files raw without stopping your service beforehand, you might want to check that your backups aren't corrupted.

The timeline then is:

| time | activity |
| ------------- | -------------------------------------------------------------------------------------------------------------------------------- |
| 02.00         | ZFS deletes the prior snapshot and creates a new one at `rpool/safe/persist@restic_nightly_snap`                                     |
| 02.05 - 04.05 | Restic backs up from the new snapshot's hidden read-only `.zfs` mount, with random per-service delays, to local and remote locations |
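
A minimal sketch of how this hangs together in NixOS. The dataset, times, and snapshot name come from the table above, but the unit names, repository path, and secret names are assumptions for illustration, not my exact module:

```nix
{ config, pkgs, ... }:
{
  # 02.00: rotate the snapshot that every restic job will read from.
  systemd.services.restic-nightly-snapshot = {
    description = "Rotate the nightly ZFS snapshot used by restic";
    serviceConfig.Type = "oneshot";
    path = [ pkgs.zfs ];
    script = ''
      zfs destroy rpool/safe/persist@restic_nightly_snap || true
      zfs snapshot rpool/safe/persist@restic_nightly_snap
    '';
  };
  systemd.timers.restic-nightly-snapshot = {
    wantedBy = [ "timers.target" ];
    timerConfig.OnCalendar = "02:00";
  };

  # 02.05-04.05: jobs back up from the snapshot's read-only .zfs mount.
  services.restic.backups.prowlarr-local = {
    paths = [ "/persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr" ];
    repository = "/mnt/nas/backup/prowlarr"; # assumed local repo path
    passwordFile = config.sops.secrets."restic/password".path;
    timerConfig = {
      OnCalendar = "02:05";
      RandomizedDelaySec = "2h"; # the random per-service delay
    };
  };
}
```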

## Automatic Backups

I have added sops secrets for both my local and remote destinations in my restic module :simple-github: [/nixos/modules/nixos/services/restic/](https://github.com/truxnell/nix-config/blob/main/nixos/modules/nixos/services/restic/default.nix). These provide the restic password and 'AWS' credentials for the S3-compatible R2 bucket.

Backups are created per-service in each service's module. This is largely done with a `lib` helper I've written, which creates both the relevant local and remote restic backup entries in my nixosConfiguration (sketched after the note below):
:simple-github: [nixos/modules/nixos/lib.nix](https://github.com/truxnell/nix-config/blob/main/nixos/modules/nixos/lib.nix)
!!! question "Why not backup the entire persist in one hit?"

Possibly a hold over from my k8s days, but its incredibly useful to be able to restore per-service, especially if you just want to move an app around or restore one app. You can always restore multiple repos with a script/taskfile.
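
For illustration, the helper's shape is roughly the following. `mkResticBackup`, the repository layout, and the secret names are hypothetical stand-ins for this sketch, not the real signatures (see the linked `lib.nix` for the actual code):

```nix
{ config, ... }:
let
  # Hypothetical helper: given a service name and a path under the ZFS
  # snapshot, emit matching local and remote restic jobs.
  mkResticBackup = name: path: {
    "${name}-local" = {
      paths = [ path ];
      repository = "/mnt/nas/backup/${name}"; # assumed local repo layout
      passwordFile = config.sops.secrets."restic/password".path;
      timerConfig.OnCalendar = "02:05";
    };
    "${name}-remote" = {
      paths = [ path ];
      # Assumed R2 endpoint; the 'AWS' credentials come in via the sops env file.
      repository = "s3:https://<account-id>.r2.cloudflarestorage.com/backups/${name}";
      environmentFile = config.sops.secrets."restic/env".path;
      passwordFile = config.sops.secrets."restic/password".path;
      timerConfig.OnCalendar = "02:05";
    };
  };
in
{
  services.restic.backups =
    mkResticBackup "prowlarr" "/persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr";
}
```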

NixOS will create a service + timer for each job - the listing below shows the output for a prowlarr local/remote backup.

```bash
truxnell@daedalus ~> systemctl list-unit-files | grep restic-backups-prowlarr
restic-backups-prowlarr-local.service linked enabled
restic-backups-prowlarr-remote.service linked enabled
restic-backups-prowlarr-local.timer enabled enabled
restic-backups-prowlarr-remote.timer enabled enabled
```

NixOS (as of 23.05, IIRC) provides shims enabling easy access to the restic commands, with the same environment variables set as the service uses.

```bash
truxnell@daedalus ~ [1]> sudo restic-prowlarr-local snapshots
repository 9d9bf357 opened (version 2, compression level auto)
ID Time Host Tags Paths
---------------------------------------------------------------------------------------------------------------------
293dad23 2024-04-15 19:24:37 daedalus /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr
24938fe8 2024-04-16 12:42:50 daedalus /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr
---------------------------------------------------------------------------------------------------------------------
2 snapshots
```

## Manually backing up

Each backup is a systemd timer/service pair, so you can query it or trigger a manual run with `systemctl start restic-backups-<service>-<destination>`. Local and remote work exactly the same; querying remote is just a fraction slower to return information.

```bash
truxnell@daedalus ~ > sudo systemctl start restic-backups-prowlarr-local.service
< no output >
truxnell@daedalus ~ [1]> sudo restic-prowlarr-local snapshots
repository 9d9bf357 opened (version 2, compression level auto)
ID Time Host Tags Paths
---------------------------------------------------------------------------------------------------------------------
293dad23 2024-04-15 19:24:37 daedalus /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr
24938fe8 2024-04-16 12:42:50 daedalus /persist/.zfs/snapshot/restic_nightly_snap/containers/prowlarr
---------------------------------------------------------------------------------------------------------------------
2 snapshots
truxnell@daedalus ~> date
Tue Apr 16 12:43:20 AEST 2024
truxnell@daedalus ~>
```

## Restoring a backup

Below is a test restore (for a real restore you would use `--target /`).
You would just have to pause the service, run the restore, then restart the service.

```bash
truxnell@daedalus ~ [1]> sudo restic-lidarr-local restore --target /tmp/lidarr/ latest
repository a2847581 opened (version 2, compression level auto)
[0:00] 100.00% 2 / 2 index files loaded
restoring <Snapshot b96f4b94 of [/persist/nixos/lidarr] at 2024-04-14 04:19:41.533770692 +1000 AEST by root@daedalus> to /tmp/lidarr/
Summary: Restored 52581 files/dirs (11.025 GiB) in 1:37
```

## Failed backup notifications

Failed backup notifications are baked in thanks to the global Pushover notification on systemd unit failure. No config necessary.
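
A sketch of how such a global hook can be wired up. The template unit name and secret paths are assumptions for illustration, and in my config the `onFailure` hook is applied globally rather than per-unit as shown here:

```nix
{ config, pkgs, ... }:
{
  # Template unit; systemd passes the failed unit's name in via %i.
  systemd.services."notify-pushover@" = {
    serviceConfig.Type = "oneshot";
    scriptArgs = "%i";
    script = ''
      ${pkgs.curl}/bin/curl -fsS \
        --form-string "token=$(cat ${config.sops.secrets."pushover/api-token".path})" \
        --form-string "user=$(cat ${config.sops.secrets."pushover/user-key".path})" \
        --form-string "message=Unit $1 failed on ${config.networking.hostName}" \
        https://api.pushover.net/1/messages.json
    '';
  };

  # Example of attaching the hook to a single unit.
  systemd.services."restic-backups-prowlarr-local".onFailure =
    [ "notify-pushover@%n.service" ];
}
```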

Here I tested it by giving the systemd unit file an incorrect path.

<figure markdown="span">
![Screenshot of a pushover notification of a failed backup](../includes/assets/pushover-failed-backup.png)
<figcaption>A deliberately failed backup to test notifications, hopefully I don't see a real one.</figcaption>
</figure>

## Disabled backup warnings

Using [module warnings](https://nlewo.github.io/nixos-manual-sphinx/development/assertions.xml.html), I have also put warnings into my NixOS modules for when I have disabled backups on a host _that isn't_ a development machine, just in case I do this or mix up flags on hosts. Roll your eyes; I will probably do it.
This will pop up when I do a dry run/deployment - but not abort the build.
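
A minimal sketch of the pattern; the option paths (`cfg.backup`, `mySystem.isDevelopment`) are illustrative:

```nix
{ config, lib, ... }:
let
  cfg = config.mySystem.services.prowlarr; # illustrative option path
in
{
  config = lib.mkIf cfg.enable {
    # Warn, but don't abort the build, if backups are disabled on a host
    # that isn't flagged as a development machine.
    warnings = lib.optional (!cfg.backup && !config.mySystem.isDevelopment)
      "WARNING: Backups for prowlarr are disabled on ${config.networking.hostName}!";
  };
}
```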

<figure markdown="span">

![Screenshot of a NixOS warning about disabled backups](../includes/assets/no-backup-warning.png)

<figcaption>It is eye catching thankfully</figcaption>
</figure>