From 53ffcdac98f3f170760436ce14dbd473e4285bb4 Mon Sep 17 00:00:00 2001 From: Amolith Date: Fri, 20 Mar 2026 08:51:01 -0600 Subject: [PATCH] feat: add keld, init, and munin skills --- README.md | 73 +++++- skills/backing-up-with-keld/SKILL.md | 243 ++++++++++++++++++ .../references/config-examples.md | 193 ++++++++++++++ .../references/installation.md | 1 + .../SKILL.md | 71 +++++ skills/monitoring-with-munin/SKILL.md | 203 +++++++++++++++ .../references/writing-plugins.md | 159 ++++++++++++ 7 files changed, 937 insertions(+), 6 deletions(-) create mode 100644 skills/backing-up-with-keld/SKILL.md create mode 100644 skills/backing-up-with-keld/references/config-examples.md create mode 100644 skills/backing-up-with-keld/references/installation.md create mode 100644 skills/initialising-and-updating-agents-md/SKILL.md create mode 100644 skills/monitoring-with-munin/SKILL.md create mode 100644 skills/monitoring-with-munin/references/writing-plugins.md diff --git a/README.md b/README.md index bbfc63d734a5618e900c7254db642de15fb9713a..2067f9d349c22d0558709883cfe85bee6085ad18 100644 --- a/README.md +++ b/README.md @@ -33,6 +33,10 @@ token count, plus overall metadata usage. I've used and tested them most with - [authoring-skills](skills/authoring-skills/SKILL.md): Creates and reviews Agent Skills following best practices. Covers skill structure, frontmatter, and progressive disclosure patterns. +- [backing-up-with-keld](skills/backing-up-with-keld/SKILL.md): Writes and + manages keld configuration for restic backups. Covers TOML preset structure, + split preset composition (`home@cloud`), config file discovery, environment + variables, and systemd timer setup. - [collaborating-through-pr-pico-sh](skills/collaborating-through-pr-pico-sh/SKILL.md): Collaborates on git patches via [pr.pico.sh], a minimal patchbin service. Covers both contributing and reviewing patch requests using `git format-patch` @@ -53,6 +57,10 @@ token count, plus overall metadata usage. 
I've used and tested them most with - [humanizer](skills/humanizer/SKILL.md): Removes AI-generated patterns from text like promotional fluff, weasel words, and mechanical sentence structures. Based on Wikipedia's AI Cleanup research. Originally from [blader/humanizer]. +- [initialising-and-updating-agents-md](skills/initialising-and-updating-agents-md/SKILL.md): + Analyses a codebase and creates or updates `AGENTS.md` to help future agents + work effectively. Discovers commands, conventions, patterns, and gotchas from + the project's source and config files. - [invoking-subagents](skills/invoking-subagents/SKILL.md): Spawns subagents with restricted tool access for parallel tasks across repositories. Requires [synu] and the `claude` CLI. Useful for summarizing git history or processing @@ -60,6 +68,9 @@ token count, plus overall metadata usage. I've used and tested them most with - [managing-and-navigating-worktrees](skills/managing-and-navigating-worktrees/SKILL.md): Manages git worktrees using [wt] with a bare repository structure. Each branch lives in its own sibling directory. Requires [wt], git, and [gum]. +- [monitoring-with-munin](skills/monitoring-with-munin/SKILL.md): Deploys and + manages Munin monitoring across servers. Sets up munin-node on hosts, writes + plugins, configures masters, and handles alerts. - [notifying-through-ntfy](skills/notifying-through-ntfy/SKILL.md): Sends push notifications via [ntfy.sh] when requested, such as at the end of its turn. 
- [rebasing-with-git](skills/rebasing-with-git/SKILL.md): Manages git rebase @@ -257,6 +268,18 @@ Token breakdown: ─────────────────────────────────────────────── Total: 3706 tokens +=== backing-up-with-keld === + +Token breakdown: + Name: 9 tokens + Description: 39 tokens + Body: 1742 tokens (235 lines) + References: + config-examples.md 1415 tokens + installation.md 19 tokens + ─────────────────────────────────────────────── + Total: 3224 tokens + === collaborating-through-pr-pico-sh === Token breakdown: @@ -324,6 +347,15 @@ Token breakdown: ─────────────────────────────────────────────── Total: 3311 tokens +=== initialising-and-updating-agents-md === + +Token breakdown: + Name: 12 tokens + Description: 59 tokens + Body: 850 tokens (63 lines) + ─────────────────────────────────────────────── + Total: 921 tokens + === invoking-subagents === Token breakdown: @@ -342,6 +374,17 @@ Token breakdown: ─────────────────────────────────────────────── Total: 774 tokens +=== monitoring-with-munin === + +Token breakdown: + Name: 10 tokens + Description: 62 tokens + Body: 1553 tokens (195 lines) + References: + writing-plugins.md 1151 tokens + ─────────────────────────────────────────────── + Total: 2776 tokens + === notifying-through-ntfy === Token breakdown: @@ -418,14 +461,32 @@ Token breakdown: ─────────────────────────────────────────────── Total: 4721 tokens +=== updating-llm-client-model-lists === + +Token breakdown: + Name: 13 tokens + Description: 51 tokens + Body: 997 tokens (126 lines) + ─────────────────────────────────────────────── + Total: 1061 tokens + +=== using-exe-dev === + +Token breakdown: + Name: 8 tokens + Description: 32 tokens + Body: 380 tokens (45 lines) + ─────────────────────────────────────────────── + Total: 420 tokens + === working-with-tmux === Token breakdown: Name: 8 tokens Description: 32 tokens - Body: 544 tokens (87 lines) + Body: 548 tokens (88 lines) ─────────────────────────────────────────────── - Total: 584 tokens + Total: 588 tokens 
=== writing-git-tags === @@ -466,10 +527,10 @@ Token breakdown: SUMMARY ============================================================ -Skills: 23 -Metadata: 1304 tokens -Combined bodies: 23025 tokens -Overall: 74023 tokens +Skills: 28 +Metadata: 1599 tokens +Combined bodies: 28551 tokens +Overall: 82429 tokens Validation errors: 0 Largest skills (by total tokens): diff --git a/skills/backing-up-with-keld/SKILL.md b/skills/backing-up-with-keld/SKILL.md new file mode 100644 index 0000000000000000000000000000000000000000..e25f89ce55aa6e9bc22b77cacbdf794b6b1a68bd --- /dev/null +++ b/skills/backing-up-with-keld/SKILL.md @@ -0,0 +1,243 @@ +--- +name: backing-up-with-keld +description: Writes and manages keld configuration for restic backups. Use when the user mentions keld, backup presets, restic config, or needs help writing TOML config for backups. +license: GPL-3.0-or-later +metadata: + author: Amolith +--- + +Keld is a TOML-configured wrapper around [restic](https://restic.net/). It resolves layered config presets and exec's restic with the merged result. + +## Quick reference + +```bash +# Run with a preset +keld --preset home@nas backup + +# Dry-run: show what restic command would execute +keld --show-command --preset home@nas backup + +# Interactive mode (TUI menu) +keld + +# Override restic flags after the command +keld --preset home backup --tag daily --exclude '*.tmp' + +# Override configured backup paths +keld --preset home backup /other/path +``` + +## Config file discovery + +Files are loaded in ascending priority order (later wins): + +1. `/usr/share/keld/config.toml`, then sorted `/usr/share/keld/conf.d/*.toml` +2. `/etc/keld/config.toml`, then sorted `/etc/keld/conf.d/*.toml` +3. `~/.config/keld/config.toml`, then sorted `~/.config/keld/conf.d/*.toml` +4. Paths/globs from `KELD_CONFIG_PATHS` (colon-separated) +5. 
`KELD_CONFIG_FILE` replaces **all** of the above if set + +## Config structure + +### Section merge order + +For `keld --preset home@cloud backup`, sections merge in this order: + +``` +[global] → [global.backup] → [@cloud] → [@cloud.backup] → +[home@] → [home@.backup] → [home@cloud] → [home@cloud.backup] → +CLI overrides (replace, not append) +``` + +Plain presets (no `@`) are simpler: `[global] → [global.backup] → [mypreset] → [mypreset.backup]`. + +CLI flags and positional args **replace** their config counterparts — they do not merge with arrays or `_arguments`. + +### Split presets + +The `@` in a preset name composes two independent halves: + +- **Prefix** (`home@`): the _what_ — defines sources, excludes, tags +- **Suffix** (`@cloud`): the _where_ — defines repository, credentials + +This lets you mix and match: `home@cloud`, `home@nas`, `media@cloud` all reuse their respective halves. + +### Special keys + +| Key | Purpose | +| ------------ | ------------------------------------------------------------------ | +| `_arguments` | Positional args for restic (prefer arrays; string form is whitespace-split) | +| `_workdir` | Directory to chdir before exec | +| `_command` | Restic subcommand (allows aliasing) | +| `*.environ` | Section suffix for environment variables | + +### Interpolation + +Regular config values (not `.environ`) can reference other sections with `${section.key}`: + +```toml +[vars] +cache-root = "~/.cache/keld" + +[global] +cache-dir = "${vars.cache-root}/restic" +``` + +## Writing config + +### Minimal example + +```toml +["@nas"] +repository = "sftp:nas:/backups/restic" + +["home@".environ] +RESTIC_PASSWORD_COMMAND = "op read 'op://Vault/Backup/password'" + +["home@".backup] +_arguments = ["/home/user/Documents", "/home/user/Projects"] +exclude-if-present = ".nobackup" +``` + +Invoke with: `keld --preset home@nas backup` + +### Repository paths within a backend + +Restic supports paths inside rclone remotes and S3 buckets, so one +remote or 
bucket can hold multiple independent repositories. Use the +full preset section to set the path: + +```toml +# Shared S3/B2 credentials — one suffix for the whole account +["@b2".environ] +AWS_ACCESS_KEY_ID = "your-key-id" +AWS_SECRET_ACCESS_KEY = "your-secret" + +# Each dataset gets its own path within the bucket +["media@b2"] +repository = "s3:https://s3.example.com/my-bucket/media" + +["docs@b2"] +repository = "s3:https://s3.example.com/my-bucket/docs" +``` + +The same works for rclone — one remote, different paths: + +```toml +["media@hetzner"] +repository = "rclone:hetzner_restic:/media" + +["docs@hetzner"] +repository = "rclone:hetzner_restic:/docs" +``` + +Services like BorgBase provide per-repository URLs with embedded +credentials, so each repo is specific to one dataset. Use a split preset +where the suffix is unique to that dataset (there's no shared suffix to +reuse): + +```toml +["media@borgbase_media"] +repository = "rest:https://user:pass@user.repo.borgbase.com" +``` + +### Multi-value flags + +Use TOML arrays for flags that accept multiple values: + +```toml +["home@".backup] +_arguments = ["/home/user/Documents", "/home/user/Projects"] +exclude = ["*.tmp", "node_modules", ".git"] +tag = ["daily", "home"] +``` + +Multi-line strings also work — each line becomes a separate flag value: + +```toml +exclude = """ +*.tmp +node_modules +.git""" +``` + +### Environment variables + +Use the `.environ` suffix on any section to set env vars for restic: + +```toml +["media@".environ] +RESTIC_PASSWORD_COMMAND = "op read 'op://Vault/Media Backup/password'" + +["@b2".environ] +AWS_ACCESS_KEY_ID = "your-key-id" +AWS_SECRET_ACCESS_KEY = "your-secret" +``` + +#### Fetching secrets with `_COMMAND` + +For env vars that restic doesn't natively support fetching via command +(anything other than `RESTIC_PASSWORD_COMMAND`), keld recognises a +`_COMMAND` suffix. 
The value is executed as a shell command, stdout is
+captured (trailing newlines stripped), and the result is set as the env
+var with the suffix removed:
+
+```toml
+["@b2".environ]
+AWS_ACCESS_KEY_ID_COMMAND = "op read 'op://Vault/B2/access-key-id'"
+AWS_SECRET_ACCESS_KEY_COMMAND = "op read 'op://Vault/B2/secret-access-key'"
+```
+
+This keeps secrets out of config files entirely. Any command that prints
+the secret to stdout works (1Password CLI, `pass`, `vault`, etc.).
+
+`RESTIC_PASSWORD_COMMAND` and `RESTIC_FROM_PASSWORD_COMMAND` are passed
+through to restic as-is — restic handles those natively.
+
+#### rclone credentials via env vars
+
+rclone config values can be supplied as env vars named
+`RCLONE_CONFIG_<REMOTE>_<KEY>`. Combined with `_COMMAND`, this lets
+keld fetch rclone credentials from a secret manager too. Remote names
+**must use underscores** (not hyphens) for this to work. Passwords must
+be piped through `rclone obscure -`:
+
+```toml
+["@hetzner".environ]
+RCLONE_CONFIG_HETZNER_RESTIC_USER_COMMAND = "op read 'op://Vault/Hetzner/username'"
+RCLONE_CONFIG_HETZNER_RESTIC_PASS_COMMAND = "op read 'op://Vault/Hetzner/password' | rclone obscure -"
+```
+
+### Key alias
+
+`repository` in config maps to restic's `--repo` flag automatically.
+
+### Wrapped commands
+
+Keld's interactive menu exposes: `backup`, `restore`, `snapshots`, `forget`, `check`, `init`. All other restic commands pass through as subcommands.
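The `_COMMAND` resolution described above can be sketched in POSIX sh. This is illustrative only — keld is not implemented in shell, and the key and command here are made-up stand-ins for a real secret fetch:

```shell
#!/bin/sh
# Sketch of keld's _COMMAND handling (illustrative, not keld's actual code).
# Hypothetical key/command pair; the command stands in for an `op read` call.
key="AWS_ACCESS_KEY_ID_COMMAND"
cmd="printf 'example-key-id\n'"

# Run the command and capture stdout; $(...) strips trailing newlines.
value=$(eval "$cmd")

# Export under the name with the _COMMAND suffix removed.
export "${key%_COMMAND}=$value"

echo "$AWS_ACCESS_KEY_ID"   # → example-key-id
```

The same shape applies to any `*_COMMAND` key: only the suffix handling and the stdout capture matter; the fetch command itself is arbitrary.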
+
+## Environment variables
+
+| Variable            | Purpose                                 |
+| ------------------- | --------------------------------------- |
+| `KELD_CONFIG_FILE`  | Single config file (replaces discovery) |
+| `KELD_CONFIG_PATHS` | Colon-separated additional config paths |
+| `KELD_DRYRUN`       | Enable dry-run mode                     |
+| `KELD_EXECUTABLE`   | Override restic binary path             |
+
+## Gotchas
+
+- Across multiple config files, later files win at the top-level table layer
+- Nested tables like `[preset.backup]` and `[*.environ]` are **replaced** wholesale, not deep-merged across files
+- `restic.Run()` uses `syscall.Exec` — it replaces the process. No Go code runs after, no defers execute
+- String-form `_arguments` splits on whitespace — paths with spaces require array form
+- Always verify new config before running live:
+  ```bash
+  keld --show-command --preset <preset> <command>
+  # Or test a specific file before placing it in a discovery path:
+  keld --config ./keld.toml --show-command --preset <preset> <command>
+  ```
+
+## Automation with systemd
+
+For scheduled backups (with daily structural checks) and monthly full integrity verification with systemd user timers, see [config-examples.md](references/config-examples.md).
diff --git a/skills/backing-up-with-keld/references/config-examples.md b/skills/backing-up-with-keld/references/config-examples.md
new file mode 100644
index 0000000000000000000000000000000000000000..b9a3f2d33e0d20befb90b43845427fbda250a559
--- /dev/null
+++ b/skills/backing-up-with-keld/references/config-examples.md
@@ -0,0 +1,193 @@
+# Config examples
+
+```toml
+# keld config merge order (lowest to highest precedence):
+# [global] -> [global.<command>] -> split preset parts -> full preset -> CLI overrides
+#
+# For a split preset like `home@cloud` and command `backup`, keld checks:
+# [global] -> [global.backup] -> [@cloud] -> [@cloud.backup] ->
+# [home@] -> [home@.backup] -> [home@cloud] -> [home@cloud.backup]
+
+[global]
+# Shared flags for every command and preset.
+password-file = "~/.config/restic/password.txt" + +[global.backup] +# Command-specific defaults. +exclude-file = "~/.config/restic/excludes.txt" +exclude-if-present = ".nobackup" + +[home.backup] +# Simple preset (no @): `keld --preset home backup` +_arguments = ["/home/user"] +tag = ["home"] + +# ── Suffixes: the "where" ─────────────────────────────────── +# +# Suffix sections define backend-wide settings. Restic supports paths +# inside rclone remotes and S3 buckets, so one remote or bucket can hold +# multiple independent repositories. Credentials live here once. + +["@nas"] +# Suffix: applies to any `*@nas` preset. +repository = "sftp:nas@example.org:/mnt/backups/restic" +tag = ["nas"] + +["@cloud"] +# Suffix for S3-compatible cloud backups (B2, Wasabi, etc.). +# Note: no repository here — each full preset sets its own path +# within the shared bucket. +tag = ["cloud"] + +# Keys ending in _COMMAND are executed by keld; stdout becomes the env +# var with the suffix removed. RESTIC_PASSWORD_COMMAND and +# RESTIC_FROM_PASSWORD_COMMAND are passed through to restic as-is. +["@cloud".environ] +AWS_ACCESS_KEY_ID_COMMAND = "op read 'op://Vault/Cloud/access-key-id'" +AWS_SECRET_ACCESS_KEY_COMMAND = "op read 'op://Vault/Cloud/secret-access-key'" + +# ── Prefixes: the "what" ──────────────────────────────────── +# +# Prefix sections define sources, excludes, and the restic encryption +# password. Each dataset gets its own password so repositories are +# independently encrypted, even when stored on the same backend. + +["home@"] +# Prefix: applies to any `home@*` preset. +host = "my-laptop" +tag = ["home"] + +["home@".environ] +RESTIC_PASSWORD_COMMAND = "pass show backups/restic-home" + +["home@".backup] +# Prefix + command section. +_arguments = ["/home/user", "/etc"] +exclude-if-present = ".keld-skip" + +["home@".forget] +# Prefix + different command. 
+keep-daily = 7
+keep-weekly = 5
+keep-monthly = 12
+
+# ── Full presets: prefix + suffix with repo path ────────────
+#
+# Full preset sections set the repository path within each backend.
+# This is where the prefix ("what") meets the suffix ("where").
+
+["home@nas"]
+# Uses the @nas suffix's sftp base; path set here.
+repository = "sftp:nas@example.org:/mnt/backups/restic/home"
+
+["home@cloud"]
+# Path within the shared S3 bucket for this dataset.
+repository = "s3:s3.us-east-1.amazonaws.com/my-restic-bucket/home"
+
+# ── BorgBase: per-repo URLs ─────────────────────────────────
+#
+# BorgBase provides per-repository URLs with embedded credentials.
+# Each repo is specific to one dataset — there's no shared @borgbase
+# suffix to reuse. Keep the split form so the prefix's _arguments
+# and password settings still apply through the merge chain.
+
+["home@borgbase_home"]
+repository = "rest:https://ab12cd34:s3cr3tp4ssw0rd@ab12cd34.repo.borgbase.com"
+```
+
+## Systemd timer setup
+
+Example systemd user units for daily backups (with lightweight structural
+check) and monthly full integrity verification, with optional healthchecks.io
+integration via `runitor`. The units assume `keld` runs via `mise` and that
+the user has set up the `mise` config snippet for `keld`. This might not be
+the case; ask, or work out how they've installed `keld`.
+
+### Unit files
+
+Installed to `~/.config/systemd/user/` or `/etc/systemd/user/`.
+
+**keld-backup@.service** — runs backup then a quick structural check.
Both +are wrapped by a single runitor invocation so a failure in either reports +to the same healthcheck: + +```ini +[Unit] +Description=keld %I backup + +[Service] +Nice=19 +IOSchedulingClass=idle +KillSignal=SIGINT +EnvironmentFile=-%h/.config/keld/timers/%I_backup.env +ExecStart=/bin/sh -c '%h/.local/bin/mise x github:bdd/runitor -- runitor -- /bin/sh -c "mise x http:keld -- keld --preset %I backup && mise x http:keld -- keld --preset %I check"' +``` + +**keld-backup@.timer**: + +```ini +[Unit] +Description=Daily keld %I backup + +[Timer] +OnCalendar=daily +AccuracySec=1m +RandomizedDelaySec=1h +Persistent=true + +[Install] +WantedBy=timers.target +``` + +**keld-integrity@.service** — monthly full data integrity verification. +Downloads and verifies every pack file (`--read-data`): + +```ini +[Unit] +Description=keld %I integrity check (read-data) + +[Service] +Nice=19 +IOSchedulingClass=idle +KillSignal=SIGINT +EnvironmentFile=-%h/.config/keld/timers/%I_integrity.env +ExecStart=%h/.local/bin/mise x github:bdd/runitor -- runitor -- mise x http:keld -- keld --preset %I check --read-data +``` + +**keld-integrity@.timer**: + +```ini +[Unit] +Description=Monthly keld %I integrity check + +[Timer] +OnCalendar=monthly +AccuracySec=1m +RandomizedDelaySec=1h +Persistent=true + +[Install] +WantedBy=timers.target +``` + +### Enabling timers + +```bash +# For each preset (e.g. 
media@borgbase_media): +systemctl --user enable --now keld-backup@media@borgbase_media.timer +systemctl --user enable --now keld-integrity@media@borgbase_media.timer + +# Verify +systemctl --user list-timers +``` + +### Healthchecks env files + +Create env files in `~/.config/keld/timers/` with a `CHECK_UUID` variable for each preset: + +``` +~/.config/keld/timers/media@borgbase_media_backup.env → backup + check UUID +~/.config/keld/timers/media@borgbase_media_integrity.env → monthly read-data UUID +``` + +The `EnvironmentFile=-` prefix (`-`) makes the env file optional — the timer still works without healthchecks. diff --git a/skills/backing-up-with-keld/references/installation.md b/skills/backing-up-with-keld/references/installation.md new file mode 100644 index 0000000000000000000000000000000000000000..2c495effa6558e1a9cbee11953a2a0e8d8c82f32 --- /dev/null +++ b/skills/backing-up-with-keld/references/installation.md @@ -0,0 +1 @@ +TODO: none of the code lives anywhere yet, and neither does the binary diff --git a/skills/initialising-and-updating-agents-md/SKILL.md b/skills/initialising-and-updating-agents-md/SKILL.md new file mode 100644 index 0000000000000000000000000000000000000000..ea6532612becde0ea3ede8c3d66ccceed74f27da --- /dev/null +++ b/skills/initialising-and-updating-agents-md/SKILL.md @@ -0,0 +1,71 @@ +--- +name: initialising-and-updating-agents-md +description: Analyses a codebase and creates or updates AGENTS.md to help future agents work effectively. Use when asked to initialise, generate, or update an AGENTS.md file, or when the user says "init agents", "generate agents.md", or wants to document a project for agent use. +license: GPL-3.0-or-later +metadata: + author: Amolith +--- + +Analyse the codebase and produce an `AGENTS.md` that documents what an agent _cannot discover on its own_ by reading the code. 
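Concretely, an entry that meets this bar might look like the following (all project details hypothetical):

```markdown
## Gotchas

- Run tests with `make test-fast`, not `pytest` directly; bare pytest skips the fixture setup and gives false passes.
- `scripts/gen_api.py` regenerates `src/api/`; never edit generated files by hand.
```

Both lines record things an agent could not learn just by reading the source.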
+ +Research from ETH Zurich shows that auto-generated AGENTS.md files _reduce_ task success by ~3% and inflate costs by 20%+ because they duplicate what agents already find by reading the repo. Human-written files help only when they contain non-discoverable information. The key filter for every line: **can an agent figure this out by reading the code? If yes, don't write it.** + +## Precondition + +Check whether the directory is empty or contains only config files (e.g. `.gitignore`, `.editorconfig`, lock files). If so, stop and tell the user: + +> Directory appears empty or only contains config. Add source code first, then run this command to generate AGENTS.md. + +## Discovery + +1. **Directory contents**: Run `ls` to get the lay of the land +2. **Existing rule files**: Check for and read any that exist: + - `.cursor/rules/*.md`, `.cursorrules` + - `.github/copilot-instructions.md` + - `claude.md`, `CLAUDE.md` + - `agents.md`, `AGENTS.md` +3. **Project type**: Identify from config files and directory structure (e.g. `package.json`, `Cargo.toml`, `go.mod`, `pyproject.toml`, `Makefile`) +4. **Commands**: Find build/test/lint/run commands from config files, scripts, Makefiles, CI configs — especially single-file and single-test variants +5. **Source patterns**: Read representative source files to identify non-obvious conventions +6. **Existing AGENTS.md**: If one exists, read it — you're improving, not starting from scratch + +## What belongs in AGENTS.md + +Agents can grep the entire codebase before you finish typing your prompt. They don't need a map — they need to know where the landmines are. 
+
+**Include (non-discoverable):**
+
+- Commands that differ from convention — `uv` instead of `pip`, `bun` instead of `npm`, custom test runners, single-file lint/typecheck variants
+- Gotchas that break silently — "run tests with `--no-cache` or fixtures give false positives"
+- Non-obvious constraints — "the `legacy/` directory is deprecated but three production modules import from it; don't delete anything in it"
+- Tooling the agent should prefer — custom scripts, linters, formatters that aren't obvious from config
+- Project-specific guidance from existing rule files discovered during step 2
+
+**Omit (discoverable):**
+
+- Codebase overviews and directory listings — the agent reads these itself
+- Tech stack and framework descriptions — visible in config files
+- Code style details — agents are in-context learners; they'll match existing patterns. Use linters and formatters instead.
+- Architecture explanations — agents infer this from the code
+- Anything the agent would find by reading the README, config files, or source
+
+## Progressive disclosure
+
+For monorepos or large projects, don't stuff everything into one root file. Place focused `AGENTS.md` files at the relevant directory level:
+
+```
+AGENTS.md                # repo-wide: tooling, gotchas, commands
+services/api/AGENTS.md   # API-specific: custom middleware pattern, DB migration gotchas
+packages/ui/AGENTS.md    # UI-specific: design token locations, component patterns
+```
+
+Agents read the nearest file in the directory tree, so the closest one takes precedence.
+
+## Output principles
+
+- **Short.** Every line goes into every agent session. Aim for under 60 lines; under 30 is better. Each line must justify its token cost.
+- **Landmines, not maps.** Document what will trip the agent up, not what it can see for itself.
+- **Commands first.** Put executable commands early — agents reference them often. Prefer single-file variants (`npx tsc --noEmit path/to/file.tsx`) over project-wide builds.
+- **Living document.** Treat it like a bug tracker, not a wiki. When the agent trips on something non-obvious, add a line. When you fix the root cause, delete the line. +- **Only document what you observe.** Never invent commands, patterns, or conventions. +- **Preserve existing content** when updating. Merge new findings; don't discard prior work without reason. diff --git a/skills/monitoring-with-munin/SKILL.md b/skills/monitoring-with-munin/SKILL.md new file mode 100644 index 0000000000000000000000000000000000000000..c89e45afc6a4744661fbc1478d09b4f1c006a89e --- /dev/null +++ b/skills/monitoring-with-munin/SKILL.md @@ -0,0 +1,203 @@ +--- +name: monitoring-with-munin +description: Deploys and manages Munin monitoring across servers. Use when setting up munin-node on a host, writing munin plugins, adding nodes to a master, configuring alerts, or diagnosing system issues using munin data. Also use when the user mentions munin, monitoring, or graphing server metrics. +license: GPL-3.0-or-later +metadata: + author: Amolith +--- + +If the user has an existing Munin setup they want you to work with, ask them for specifics: where the master is, how nodes are connected (Tailscale, direct IP, SSH tunnel), and what OS the target hosts run. + +## Installing munin-node + +### Debian/Ubuntu + +```bash +apt-get install -y munin-node +munin-node-configure --shell | sh -x # auto-detect and symlink plugins +systemctl enable --now munin-node +``` + +### Arch Linux + +```bash +pacman -S --noconfirm munin-node +# Net::CIDR is often unavailable on Arch; use regex allow instead of cidr_allow +munin-node-configure --shell | sh -x +systemctl enable --now munin-node +``` + +## Configuring munin-node + +Config lives at `/etc/munin/munin-node.conf`. 
Key directives:
+
+```ini
+host *                       # bind to all interfaces
+port 4949
+allow ^127\.0\.0\.1$         # regex against connecting IP
+allow ^::1$
+allow 100\.107\.78\.23       # master's IP (unanchored works too)
+cidr_allow 100.107.78.23/32  # alternative (needs perl Net::CIDR)
+```
+
+The `allow` directive uses Perl regexes matched against the client IP. When the connection arrives as IPv6-mapped IPv4 (`::ffff:A.B.C.D`), the anchored regex `^A\.B\.C\.D$` won't match. Use an **unanchored** regex like `A\.B\.C\.D` to handle both forms, or add an explicit `allow ^::ffff:A\.B\.C\.D$`.
+
+On Arch Linux, `Net::CIDR` is typically unavailable (only `Net::CIDR::Lite` exists in pacman). If `cidr_allow` causes `Can't locate Net/CIDR.pm` errors, remove all `cidr_allow` lines and use `allow` regexes instead.
+
+After changing config: `systemctl restart munin-node`
+
+### Firewall
+
+If UFW is present, restrict port 4949 to the master only:
+
+```bash
+ufw allow from <master-ip> to any port 4949 comment 'munin master'
+ufw deny in 4949 comment 'deny munin from everyone else'
+```
+
+Order matters — allow must come before deny.
+
+## Adding a node to the master
+
+Append to `/etc/munin/munin.conf` on the master:
+
+```ini
+[groupname;hostname]
+    address <node-address>
+    use_node_name yes
+```
+
+Group names organize the web UI — use logical names like `nixnet`, `exe.xyz`, and `personal`.
+
+Seed data immediately: `su - munin --shell=/bin/bash -c '/usr/bin/munin-cron'`
+
+### Verifying connectivity
+
+From the master, test the node protocol:
+
+```bash
+# Basic test (non-multigraph plugins only)
+echo 'quit' | nc -w3 <node-ip> 4949
+
+# Full test including multigraph plugins
+{ sleep 1; echo 'cap multigraph'; sleep 1; echo 'list'; sleep 1; echo 'quit'; } | nc -w5 <node-ip> 4949
+```
+
+A working node responds with `# munin node at <hostname>` followed by the plugin list.
+
+## Writing plugins
+
+A plugin is any executable in `/etc/munin/plugins/` (usually a symlink to `/usr/share/munin/plugins/` or `/usr/lib/munin/plugins/`).
It must handle two invocations:
+
+```bash
+./plugin config   # print graph metadata
+./plugin          # print values
+```
+
+### Minimal shell plugin
+
+```sh
+#!/bin/sh
+if [ "${1:-}" = "config" ]; then
+    echo "graph_title My metric"
+    echo "graph_vlabel units"
+    echo "graph_category system"
+    echo "myfield.label Some value"
+    exit 0
+fi
+echo "myfield.value $(cat /some/source)"
+```
+
+### Field names
+
+Must match `^[A-Za-z_][A-Za-z0-9_]*$`. Sanitize dynamic names:
+
+```sh
+field=$(echo "$name" | sed 's/[^A-Za-z0-9_]/_/g; s/^[0-9]/_/')
+```
+
+### Data types
+
+- `GAUGE` (default): absolute value, plotted as-is
+- `COUNTER`/`DERIVE`: ever-increasing counter; munin computes rate per second. Use `DERIVE` with `.min 0` to avoid spikes on counter reset.
+
+### Multigraph plugins
+
+Output multiple graphs from one plugin by emitting `multigraph <name>` lines before each graph's config/values. Multigraph plugins are hidden from `list` output unless the client sends `cap multigraph` first.
+
+### Plugin configuration
+
+Per-plugin settings go in `/etc/munin/plugin-conf.d/`:
+
+```ini
+[plugin_name]
+    user root
+    env.configfile /path/to/config
+    env.statuses available away chat xa
+```
+
+### Testing
+
+```bash
+munin-run <plugin> config   # test config output
+munin-run <plugin>          # test value output
+```
+
+Note: on systems where munin-node runs with `ProtectHome=yes` (systemd), plugins running as non-root users cannot access `/home/`. Either run as `user root` or place data outside `/home/`.
+
+After installing or removing plugins: `systemctl restart munin-node`
+
+## Alerting
+
+Alerts are configured in `/etc/munin/munin.conf` on the master. A contact is a command that receives alert text on stdin.
+
+```ini
+contact.ntfy.command /usr/local/bin/munin-ntfy-alert
+contact.ntfy.always_send warning critical
+contact.ntfy.text ${var:host} :: ${var:graph_title} :: ${loop<, >:wfields WARNING ${var:label}=${var:value}} ${loop<, >:cfields CRITICAL ${var:label}=${var:value}}
+```
+
+### Thresholds
+
+Override per host or globally. The `memory` plugin uses percentages:
+
+```ini
+[groupname;hostname]
+    memory.warning 80
+    memory.critical 90
+```
+
+Plugin-specific fields use `pluginname.fieldname.warning` syntax.
+
+### Alert variables
+
+| Variable                   | Description                                     |
+| -------------------------- | ----------------------------------------------- |
+| `${var:host}`              | Node hostname                                   |
+| `${var:graph_title}`       | Plugin's graph title                            |
+| `${var:worst}`             | Worst status: OK, WARNING, CRITICAL, UNKNOWN    |
+| `${var:worstid}`           | Numeric: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN |
+| `${loop:wfields ...}`      | Iterate warning fields                          |
+| `${loop:cfields ...}`      | Iterate critical fields                         |
+| `${var:label}`             | Field label (inside loop)                       |
+| `${var:value}`             | Field value (inside loop)                       |
+
+## Querying data programmatically
+
+RRD files on the master are queryable:
+
+```bash
+rrdtool fetch /var/lib/munin/<group>/<host>-<plugin>-<field>-g.rrd AVERAGE --start -1h
+```
+
+The munin-node protocol is also directly queryable over TCP:
+
+```bash
+{ echo 'fetch memory'; sleep 1; echo 'quit'; } | nc <node-ip> 4949
+```
+
+## Reference
+
+- **Plugin gallery**: https://gallery.munin-monitoring.org/
+- **Full docs**: https://guide.munin-monitoring.org/en/latest/
+- **Writing plugins**: See [writing-plugins.md](references/writing-plugins.md)
diff --git a/skills/monitoring-with-munin/references/writing-plugins.md b/skills/monitoring-with-munin/references/writing-plugins.md
new file mode 100644
index 0000000000000000000000000000000000000000..26e9f29cb3219226bf7c2ed2cceabed8dd25102f
--- /dev/null
+++ b/skills/monitoring-with-munin/references/writing-plugins.md
@@ -0,0 +1,159 @@
+# Writing Munin Plugins
+
+## Protocol
+
+A
plugin is called with no arguments to fetch values, and with `config` to describe the graph. Optional: `autoconf` (print `yes`/`no`), `suggest` (print modes for wildcard plugins). + +### Config output + +Global attributes describe the graph: + +| Attribute | Purpose | Example | +|---|---|---| +| `graph_title` | Title above graph | `graph_title CPU usage` | +| `graph_vlabel` | Y-axis label | `graph_vlabel percent` | +| `graph_category` | Grouping in web UI | `graph_category system` | +| `graph_args` | Passed to rrdgraph | `graph_args --base 1000 -l 0` | +| `graph_scale` | Enable SI scaling | `graph_scale no` | +| `graph_info` | Description below graph | `graph_info CPU usage by type` | + +Per-field attributes: + +| Attribute | Purpose | Example | +|---|---|---| +| `field.label` | Legend label (required) | `cpu.label CPU` | +| `field.type` | GAUGE, COUNTER, DERIVE | `cpu.type DERIVE` | +| `field.draw` | LINE1, LINE2, AREA, AREASTACK | `mem.draw AREASTACK` | +| `field.min` | Minimum valid value | `cpu.min 0` | +| `field.max` | Maximum valid value | `cpu.max 100` | +| `field.warning` | Warning threshold | `cpu.warning 80` | +| `field.critical` | Critical threshold | `cpu.critical 95` | +| `field.info` | Field description | `cpu.info Percent CPU used` | +| `field.cdef` | RPN expression transform | `bytes.cdef bytes,8,*` | +| `field.negative` | Mirror another field below axis | `up.negative down` | + +### Value output + +``` +fieldname.value 42 +otherfield.value 3.14 +``` + +Use `U` for unknown: `fieldname.value U` + +## Wildcard plugins + +A single plugin script handles multiple instances via its symlink name. The plugin parses `basename $0` to determine what to monitor. + +Example: `if_` plugin symlinked as `if_eth0`, `if_wlan0`. The script strips its prefix to get the interface name. 
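+
+A minimal sketch of that stripping, using POSIX parameter expansion; the
+`name=` assignment here stands in for `basename "$0"`, which is what the
+installed symlink would actually provide:
+
+```sh
+#!/bin/sh
+# In the real plugin: name=$(basename "$0")
+name="if_eth0"     # what basename returns for the if_eth0 symlink
+iface=${name#if_}  # strip the wildcard prefix, leaving the instance
+echo "monitoring $iface"
+```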
+
+Wildcard plugins should implement `suggest` to list valid instances:
+
+```sh
+if [ "${1:-}" = "suggest" ]; then
+  ls /sys/class/net/ | grep -v '^lo$'
+  exit 0
+fi
+```
+
+## Multigraph plugins
+
+Emit `multigraph <name>` before each graph's output:
+
+```sh
+#!/bin/sh
+if [ "${1:-}" = "config" ]; then
+  echo "multigraph service_users"
+  echo "graph_title Users"
+  echo "graph_category myservice"
+  echo "graph_vlabel count"
+  echo "total.label Total users"
+
+  echo "multigraph service_uptime"
+  echo "graph_title Uptime"
+  echo "graph_category myservice"
+  echo "graph_vlabel days"
+  echo "uptime.label Uptime"
+  exit 0
+fi
+
+echo "multigraph service_users"
+echo "total.value $(get_user_count)"
+
+echo "multigraph service_uptime"
+echo "uptime.value $(get_uptime_days)"
+```
+
+The master must negotiate `cap multigraph` before `list` will show these plugins. This happens automatically during normal polling.
+
+## Graph args reference
+
+Common `graph_args` values:
+
+- `--base 1000`: decimal SI units (1k = 1000)
+- `--base 1024`: binary units (1Ki = 1024), use for bytes
+- `-l 0`: lower limit 0 (graph won't go below)
+- `--upper-limit 100`: upper limit (for percentages)
+
+## Magic markers
+
+Add these comments for `munin-node-configure` auto-detection:
+
+```sh
+#%# family=auto
+#%# capabilities=autoconf suggest
+```
+
+`family` is one of `auto`, `contrib`, or `manual`.
+
+## Testing during development
+
+```bash
+# Set MUNIN_LIBDIR if plugin uses plugin.sh helpers
+export MUNIN_LIBDIR=/usr/share/munin # or /usr/lib/munin on Arch
+
+# Test directly
+chmod +x ./myplugin
+./myplugin config
+./myplugin
+
+# Test through munin-run (sets up full environment)
+cp myplugin /etc/munin/plugins/
+munin-run myplugin config
+munin-run myplugin
+```
+
+## Common patterns
+
+### Monitoring a CLI tool's output
+
+```sh
+#!/bin/sh
+case ${1:-} in
+  config)
+    echo "graph_title My Service Stats"
+    echo "graph_category myservice"
+    echo "graph_vlabel count"
+    echo "connections.label Active connections"
+    exit 0 ;;
+esac
+echo "connections.value $(myservice-ctl status | awk '/connections:/ {print $2}')" +``` + +### Monitoring an API endpoint + +```sh +#!/bin/sh +case ${1:-} in + config) + echo "graph_title API Response Time" + echo "graph_category network" + echo "graph_vlabel ms" + echo "response.label Response time" + exit 0 ;; +esac +ms=$(curl -s -o /dev/null -w '%{time_total}' http://localhost:8080/health | awk '{printf "%.0f", $1 * 1000}') +echo "response.value $ms" +``` + +### Monitoring container resource usage (Incus/LXD) + +Query the Incus API via `incus query` for per-container stats. Run as `user root` in plugin-conf.d. Use DERIVE for CPU (cumulative nanoseconds → rate), AREASTACK for memory.
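+
+A minimal sketch of the memory half, assuming the `incus` CLI and `jq` are
+installed; the container name `web1` and the exact JSON path are
+assumptions to verify against `incus query /1.0/instances/<name>/state` on
+your host:
+
+```sh
+#!/bin/sh
+# Hypothetical per-container memory plugin (sketch)
+container=web1   # hypothetical; a wildcard plugin would take this from basename "$0"
+
+case ${1:-} in
+  config)
+    echo "graph_title Memory for $container"
+    echo "graph_category incus"
+    echo "graph_vlabel bytes"
+    echo "graph_args --base 1024 -l 0"
+    echo "mem.label Memory usage"
+    exit 0 ;;
+esac
+
+# The instance state endpoint reports memory usage in bytes; print U
+# (unknown) when incus isn't available rather than a bogus number.
+if command -v incus >/dev/null 2>&1; then
+  echo "mem.value $(incus query "/1.0/instances/$container/state" | jq '.memory.usage')"
+else
+  echo "mem.value U"
+fi
+```
+
+The CPU counterpart would read the state's cumulative CPU time into a
+`DERIVE` field with `.min 0`, so munin turns the counter into a rate.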