Potential schema.md

<!--
SPDX-FileCopyrightText: Amolith <amolith@secluded.site>

SPDX-License-Identifier: CC0-1.0
-->

# BadgerDB Schema Design for nasin pali

## TL;DR

- Use a single BadgerDB with a small, hierarchical keyspace: dir/…, idx/…, and s/{sid}/… (per-session).
- Store session documents (goal, tasks, meta) under s/{sid}/…, append-only events under s/{sid}/evt/{seq}, and small secondary indexes for fast lookups (active session by dir, tasks by status, active-session set).
- Rely on prefix scans and Badger subscriptions over these prefixes to drive the TUI in real time. Keep values JSON for readability; keep index values empty when possible; small payloads OK for convenience.

## Recommended approach (simple path)

### Keyspace layout (prefixes and values)

#### Schema/version
- meta/schema_version -> "1"

#### Working directory → active session lookup (with parent-walk)
- dir/{dir_hash}/active -> {sid}
  - Path canonicalization:
    - Start with current working directory
    - Convert to absolute path
    - Resolve all symlinks (EvalSymlinks)
    - Normalize path separators to forward slashes
    - On Windows, always case-fold to lowercase; on other OSes, do not case-fold
    - Result: canonical_path
  - {dir_hash} = full blake3-256(canonical_path), hex-encoded lowercase (64 hex chars)
  - This prevents issues with symlinks, bind mounts, and cross-platform path differences
  - Value is the current active {sid} (string)
  - Parent-walk resolution: from canonical_path, compute {dir_hash} and check dir/{dir_hash}/active; if absent, ascend to the parent directory and repeat until root; first match wins
  - Symlinks are resolved during canonicalization via EvalSymlinks; cycles are prevented by EvalSymlinks itself and an additional visited set for safety during parent-walk

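The canonicalization and parent-walk steps above can be sketched roughly as follows. This is a minimal sketch, not the project's implementation: `resolve`, `canonicalForm`, `findActive`, and the `lookup` callback (a stand-in for a Get on dir/{dir_hash}/active) are hypothetical names, and `standInHash` uses SHA-256 purely so the example needs no third-party module; the schema itself calls for blake3-256.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
	"runtime"
	"strings"
)

// resolve makes dir absolute and resolves symlinks, keeping native separators.
func resolve(dir string) (string, error) {
	abs, err := filepath.Abs(dir)
	if err != nil {
		return "", err
	}
	return filepath.EvalSymlinks(abs)
}

// canonicalForm applies the separator and case-folding rules to a resolved path.
func canonicalForm(resolved string) string {
	p := filepath.ToSlash(resolved)
	if runtime.GOOS == "windows" {
		p = strings.ToLower(p) // case-fold on Windows only
	}
	return p
}

// standInHash is NOT blake3: SHA-256 is an assumption made so this sketch
// builds with the standard library alone. Swap in a blake3-256 digest.
func standInHash(canonical string) string {
	sum := sha256.Sum256([]byte(canonical))
	return hex.EncodeToString(sum[:]) // 64 lowercase hex chars
}

// findActive walks from a resolved path up to the root and returns the first
// sid found. lookup is a hypothetical hook for reading dir/{dir_hash}/active.
func findActive(resolved string, hash func(string) string, lookup func(key string) (string, bool)) (string, bool) {
	visited := map[string]bool{} // extra guard against revisiting a path
	for p := resolved; !visited[p]; p = filepath.Dir(p) {
		visited[p] = true
		if sid, ok := lookup("dir/" + hash(canonicalForm(p)) + "/active"); ok {
			return sid, true
		}
	}
	return "", false
}

func main() {
	cwd, err := resolve(".")
	if err != nil {
		fmt.Println("resolve:", err)
		return
	}
	store := map[string]string{} // empty stand-in store, so nothing is found
	_, ok := findActive(cwd, standInHash, func(k string) (string, bool) {
		sid, found := store[k]
		return sid, found
	})
	fmt.Println(canonicalForm(cwd), ok)
}
```

The walk runs on the native path (so `filepath.Dir` stops at the platform's root) and only converts each level to canonical form when building the key; symlinks are resolved once, up front.
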
#### Active sessions set (for fast listing + subscribe)
- idx/active/{sid} -> {dir_hash} (or tiny JSON summary later if desired)

#### Archived sessions (queryable by time and by dir)
- idx/archived/{ts_be}/{sid} -> {dir_hash}          // for global archive lists in chronological order
- dir/{dir_hash}/archived/{ts_be}/{sid} -> ""       // for "archive history" per working directory
  - {ts_be} = 8-byte big-endian Unix nanos, hex-encoded lowercase, zero-padded to 16 hex chars; sorts chronologically

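To illustrate why the {ts_be} encoding sorts chronologically, here is a small sketch of the hex big-endian encoding. The helper names `hexBE` and `archivedKey` are made up for this example.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"time"
)

// hexBE encodes v as 8 big-endian bytes, then as 16 lowercase hex chars.
// Fixed width plus big-endian order makes string order match numeric order.
func hexBE(v uint64) string {
	var b [8]byte
	binary.BigEndian.PutUint64(b[:], v)
	return hex.EncodeToString(b[:])
}

// archivedKey builds idx/archived/{ts_be}/{sid} from Unix nanoseconds.
func archivedKey(unixNanos int64, sid string) string {
	return "idx/archived/" + hexBE(uint64(unixNanos)) + "/" + sid
}

func main() {
	fmt.Println(hexBE(1))                // 0000000000000001
	fmt.Println(hexBE(255))              // 00000000000000ff
	fmt.Println(hexBE(255) < hexBE(256)) // true
	fmt.Println(archivedKey(time.Now().UnixNano(), "01JP…"))
}
```

The same encoding applies to {seq_be} in event keys, so one helper can serve both.
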
#### Per-session namespace
- s/{sid}/meta -> JSON
  - { sid, dir_path, dir_hash, state: "active"|"archived", created_at, archived_at: null|ts, last_updated_at }
- s/{sid}/goal -> JSON
  - { title, description, updated_at }
- s/{sid}/task/{task_id} -> JSON
  - { id, title, description, status: "pending"|"in_progress"|"completed"|"failed"|"cancelled", created_at, updated_at, created_seq }
  - {task_id} = first 6 hex chars of blake3(normalize(title)+'|'+normalize(description)+'|'+sid); scoped per session, so cross-session collisions are irrelevant
- s/{sid}/idx/status/{status}/{task_id} -> "" (empty value; presence = membership)
  - Maintained atomically on task create/status update/delete
- s/{sid}/meta/evt_seq -> 8-byte big-endian counter stored as raw bytes (monotonic per session)
- s/{sid}/evt/{seq_be} -> JSON event record
  - {seq_be} = 8-byte big-endian u64 counter, hex-encoded lowercase, zero-padded to 16 hex chars; ensures that key order matches sequence order during iteration
  - { seq, at, type, reason: null|string, cmd: "np ...", payload: {...} }
  - Event payloads capture immutable snapshots so subscribers can render changes without extra lookups:
    - type=goal_set → `{"goal":{"title":"…","description":"…","updated_at":"RFC3339"}}`
    - type=goal_updated → `{"goal_before":{"title":"…","description":"…","updated_at":"RFC3339"},"goal_after":{"title":"…","description":"…","updated_at":"RFC3339"}}`
    - type=task_added → `{"task":{"id":"a1b2c3","title":"…","description":"…","status":"pending","created_at":"RFC3339","updated_at":"RFC3339","created_seq":1}}`
    - type=task_updated → `{"task_id":"a1b2c3","before":{"title":"…","description":"…","updated_at":"RFC3339"},"after":{"title":"…","description":"…","updated_at":"RFC3339"}}`
    - type=task_status_changed → `{"task_id":"a1b2c3","title":"…","status_before":"pending","status_after":"in_progress","updated_at":"RFC3339"}`
  - Supported event types: goal_set, goal_updated, task_added, task_updated, task_status_changed

### Core operations (transactional)

#### Start session (np s)
1) Canonicalize working directory using the process described in the "Working directory → active session lookup" section.
2) Compute dir_hash from canonical_path.
3) Check if dir/{dir_hash}/active exists:
   - If it exists, read the active {sid} and print: "Session {sid} is already active for this directory; ask your operator whether they want to resume or archive it."
   - Return 0 (idempotent operation).
4) If no active session exists, begin txn:
   - Generate new sid = ULID (canonical 26-character string for uniqueness and natural sort order).
   - Put s/{sid}/meta (state=active), s/{sid}/meta/evt_seq=0 (do NOT create goal yet; goal is created when first set).
   - Put dir/{dir_hash}/active -> {sid}.
   - Put idx/active/{sid} -> {dir_hash}.
5) Commit.

#### Set goal (np g s …)
1) Lookup sid via dir/{dir_hash}/active.
2) Check if s/{sid}/goal exists:
   - If it exists: error with message "Goal already set. Use 'np g u' to update it (requires -r/--reason flag)."
   - If it does not exist: proceed to step 3.
3) txn:
   - Create s/{sid}/goal JSON.
   - Update s/{sid}/meta.last_updated_at.
   - Increment s/{sid}/meta/evt_seq (read current value, add 1, persist) before writing the corresponding event, then write s/{sid}/evt/{seq} with type=goal_set (no reason required).

#### Update goal (np g u …)
1) Lookup sid via dir/{dir_hash}/active.
2) Check if s/{sid}/goal exists:
   - If it does not exist: error with message "No goal set yet. Use 'np g s' to set it first."
   - If it exists: proceed to step 3.
3) Require reason flag (-r or --reason) with brief explanation for the update.
4) txn:
   - Update s/{sid}/goal JSON.
   - Update s/{sid}/meta.last_updated_at.
   - Increment s/{sid}/meta/evt_seq (read, add 1, persist) before writing the event, then write s/{sid}/evt/{seq} with type=goal_updated and the reason stored in the event's reason field.

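The "increment evt_seq, then write the event" step shared by the goal operations could look something like the sketch below. It uses an in-memory map (`memTxn`, a made-up type) as a stand-in for the writes of one Badger transaction, and the event record is trimmed to a few fields; `nextSeq` and `appendEvent` are hypothetical helper names.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// memTxn is an in-memory stand-in for the reads and writes of one transaction.
type memTxn map[string][]byte

// nextSeq reads s/{sid}/meta/evt_seq (8 raw big-endian bytes), adds 1,
// persists the new value, and returns it. A missing counter counts as 0.
func nextSeq(txn memTxn, sid string) uint64 {
	key := "s/" + sid + "/meta/evt_seq"
	var cur uint64
	if raw, ok := txn[key]; ok && len(raw) == 8 {
		cur = binary.BigEndian.Uint64(raw)
	}
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, cur+1)
	txn[key] = buf
	return cur + 1
}

// appendEvent bumps the counter and writes a simplified event record under
// s/{sid}/evt/{seq_be}. With a real transaction, returning the error would
// discard the counter bump along with everything else.
func appendEvent(txn memTxn, sid, typ string, reason *string, payload any) (uint64, error) {
	seq := nextSeq(txn, sid)
	rec, err := json.Marshal(map[string]any{
		"seq": seq, "type": typ, "reason": reason, "payload": payload,
	})
	if err != nil {
		return 0, err
	}
	// %016x yields the same 16-char lowercase hex as the big-endian encoding.
	txn[fmt.Sprintf("s/%s/evt/%016x", sid, seq)] = rec
	return seq, nil
}

func main() {
	txn := memTxn{}
	first, _ := appendEvent(txn, "SID", "goal_set", nil, nil)
	why := "clarified scope"
	second, _ := appendEvent(txn, "SID", "goal_updated", &why, nil)
	fmt.Println(first, second) // 1 2
}
```

Because the counter starts at 0 when the session is created, the first event gets seq 1.
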
#### Add tasks (np t a …)
1) For each task:
   - Compute deterministic id = first 6 hex chars of blake3(normalize(title)+"|"+normalize(description)+"|"+sid)
     - normalize(x) = strings.TrimSpace(x), then case-fold to lowercase, then apply Unicode NFC normalization
     - Deterministic across runs and processes; stable within session; won't change on edits because id derives from initial content + sid.
     - Treat adds as idempotent: if the same task (same title+description) is re-added, it will resolve to the same id and be a no-op.
     - If a user wants to retry a cancelled task with the exact same title and description, they should update the existing task's status rather than adding a new one, or modify the title/description slightly to differentiate it.
   - txn:
     - If s/{sid}/task/{id} already exists, do nothing (idempotent add). Otherwise:
       - Read and increment evt_seq (persisting the new value) and create the task with status=pending, created_at=now, created_seq=evt_seq.
       - Put s/{sid}/idx/status/pending/{id}.
       - Update meta.last_updated_at.
       - Append event task_added with task payload.
   - Note: When adding multiple tasks in a single np t a invocation, use one transaction for the whole batch and increment evt_seq once per task, so the order given on the command line is preserved. Tasks should be displayed sorted by created_seq (not created_at) to maintain stable, predictable ordering.

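The deterministic id derivation above might be sketched like this. Two assumptions are flagged loudly: `standInDigest` is SHA-256, not blake3 (so the sketch builds with the standard library alone), and `normalize` omits the NFC step, which in Go would typically come from golang.org/x/text/unicode/norm. The digest is injected as a parameter so a blake3 function of the same shape can be passed instead.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// standInDigest is NOT blake3; it only stands in for a 256-bit digest here.
var standInDigest = sha256.Sum256

// normalize trims and lowercases. The NFC normalization step from the schema
// is deliberately left out of this sketch (it needs a non-stdlib package).
func normalize(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// taskID returns the first 6 hex chars of
// digest(normalize(title)+"|"+normalize(description)+"|"+sid).
func taskID(digest func([]byte) [32]byte, title, description, sid string) string {
	sum := digest([]byte(normalize(title) + "|" + normalize(description) + "|" + sid))
	return hex.EncodeToString(sum[:])[:6]
}

func main() {
	a := taskID(standInDigest, "  Write Tests ", "Cover key builders", "01JP")
	b := taskID(standInDigest, "write tests", "cover key builders", "01JP")
	fmt.Println(a == b, len(a)) // true 6
}
```

Inputs that differ only in surrounding whitespace or letter case map to the same id, which is what makes a repeated add resolve to the existing task.
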
#### Update task status/title/description (np t u …)
1) Determine if reason is required:
   - Required (-r or --reason flag) if updating title or description
   - Required (-r or --reason flag) if changing status to "cancelled" or "failed"
   - Not required for status changes to "pending", "in_progress", or "completed"
2) txn:
   - Read s/{sid}/task/{id}.
   - If status changes: delete old s/{sid}/idx/status/{old}/{id}, put new s/{sid}/idx/status/{new}/{id}.
   - Update task JSON.
   - Update meta.last_updated_at.
   - Increment evt_seq (read, add 1, persist) and append event task_updated or task_status_changed with reason (if provided).

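The reason rules in step 1 are easy to get subtly wrong, so here is a small sketch of them as a validation function. The `taskUpdate` struct and the function names are invented for illustration; nil pointer fields mean "not changed by this update".

```go
package main

import (
	"errors"
	"fmt"
)

// taskUpdate describes one np t u request; nil fields are left unchanged.
type taskUpdate struct {
	Title, Description, Status *string
	Reason                     string
}

// reasonRequired reports whether the update needs -r/--reason: any content
// edit, or a status change to "cancelled" or "failed".
func reasonRequired(u taskUpdate) bool {
	if u.Title != nil || u.Description != nil {
		return true
	}
	if u.Status != nil {
		switch *u.Status {
		case "cancelled", "failed":
			return true
		}
	}
	return false
}

// validate rejects updates that need a reason but do not carry one.
func validate(u taskUpdate) error {
	if reasonRequired(u) && u.Reason == "" {
		return errors.New("reason required: pass -r/--reason")
	}
	return nil
}

func main() {
	done, failed := "completed", "failed"
	fmt.Println(validate(taskUpdate{Status: &done}))   // <nil>
	fmt.Println(validate(taskUpdate{Status: &failed})) // reason required: pass -r/--reason
}
```

Running this check before opening the transaction keeps the txn itself short.
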
#### Archive session (np a)
1) Read s/{sid}/meta:
   - If already archived (state="archived"), return 0 (idempotent operation).
2) If active, txn:
   - Delete dir/{dir_hash}/active.
   - Delete idx/active/{sid}.
   - Update s/{sid}/meta.state="archived", archived_at=now, last_updated_at.
   - Put idx/archived/{ts_be}/{sid} -> {dir_hash}.
   - Put dir/{dir_hash}/archived/{ts_be}/{sid} -> "".

### Query patterns

- **Active session for current dir:**
  - Get dir/{dir_hash}/active -> sid; if absent, parent-walk by recomputing dir_hash for parents until found or root
- **List sessions (for TUI picker, not real-time):**
  - One-time fetch: iterate idx/active/ for active sessions and idx/archived/ (descending) for archived sessions, limited to {terminal_height} total entries
  - For each sid, read s/{sid}/meta for dir_path/timestamps and s/{sid}/goal for title
  - Support paging through additional results if needed
- **List tasks by status quickly:**
  - Iterate s/{sid}/idx/status/{status}/, collect ids
  - For each id, Get s/{sid}/task/{id}
- **List all tasks:**
  - Iterate s/{sid}/task/ prefix; sort by created_seq ascending to preserve insertion/provided order
- **Events chronologically for a session:**
  - Iterate s/{sid}/evt/ prefix ascending; {seq_be} sorts in append (sequence) order
- **Resume after interruption (np r):**
  - Use dir/{dir_hash}/active -> sid
  - Read s/{sid}/goal and s/{sid}/task/* to render plan
  - Optionally stream recent events from s/{sid}/evt/

### Real-time monitoring via Badger subscribe

#### Session view TUI (viewing a single session) subscribes to these prefixes:
- s/{sid}/goal
- s/{sid}/task/
- s/{sid}/evt/
- s/{sid}/meta (for last_updated_at/state)

On receiving a change event, re-fetch the affected document(s) and redraw. Keep all write ops in a single txn so subscribers see atomic changes.

#### Session list (NOT real-time)
- Listing sessions is a one-time fetch operation, not real-time.
- Fetch the most recent {terminal_height} sessions (active and archived combined) and allow paging.
- Query pattern: iterate idx/active/ for active sessions, iterate idx/archived/ (descending by timestamp) for archived sessions.
- For each session, read s/{sid}/meta and s/{sid}/goal to display directory, goal title, and timestamps.
- No subscriptions needed for the list view; refresh only when user explicitly opens/refreshes the picker.

### Value shapes (JSON)

#### s/{sid}/meta:
```json
{
  "sid": "01JP…",
  "dir_path": "/abs/path",
  "dir_hash": "…",
  "state": "active",
  "created_at": "RFC3339",
  "archived_at": null,
  "last_updated_at": "RFC3339"
}
```

#### s/{sid}/goal:
```json
{
  "title": "…",
  "description": "…",
  "updated_at": "RFC3339"
}
```

#### s/{sid}/task/{id}:
```json
{
  "id": "a1b2c3",
  "title": "…",
  "description": "…",
  "status": "pending",
  "created_at": "RFC3339",
  "updated_at": "RFC3339",
  "created_seq": 1
}
```

#### s/{sid}/evt/{seq_be}:
```json
{
  "seq": 1,
  "at": "RFC3339Nano",
  "type": "task_added",
  "reason": null,
  "cmd": "np t a …",
  "payload": {}
}
```

Note on `reason` field:
- The `reason` field is optional (null or string) in all events
- Reason is REQUIRED (via -r/--reason flag) only when:
  - Updating existing goal or task content (title/description) - explain the change (e.g., "clarified scope", "fixed typo")
  - Changing task status to "cancelled" or "failed" - explain why (e.g., "no longer needed", "blocked by missing dependency")
- Reason is NOT required (and should be omitted/null) for expected operations:
  - Setting goal initially
  - Adding tasks
  - Changing task status to "pending", "in_progress", or "completed"

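One possible Go mapping for the task and event shapes above is sketched below. The struct and field names are assumptions; only the JSON keys come from the shapes shown. A `*string` is used for `reason` so that an absent reason marshals to `null`, and `json.RawMessage` keeps the payload opaque until the event type is known.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Task mirrors the s/{sid}/task/{id} value.
type Task struct {
	ID          string `json:"id"`
	Title       string `json:"title"`
	Description string `json:"description"`
	Status      string `json:"status"`
	CreatedAt   string `json:"created_at"`
	UpdatedAt   string `json:"updated_at"`
	CreatedSeq  uint64 `json:"created_seq"`
}

// Event mirrors the s/{sid}/evt/{seq_be} value.
type Event struct {
	Seq     uint64          `json:"seq"`
	At      string          `json:"at"`
	Type    string          `json:"type"`
	Reason  *string         `json:"reason"` // nil marshals to null
	Cmd     string          `json:"cmd"`
	Payload json.RawMessage `json:"payload"` // decoded later, based on Type
}

// encodeEvent returns the JSON text that would be stored as the value.
func encodeEvent(e Event) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

func main() {
	out, err := encodeEvent(Event{
		Seq:     1,
		At:      "2024-01-01T00:00:00Z",
		Type:    "task_added",
		Cmd:     "np t a",
		Payload: json.RawMessage(`{}`),
	})
	fmt.Println(out, err)
}
```

Timestamps are kept as strings here only to keep the sketch short; `time.Time` fields would marshal to RFC 3339 as well.
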
### Concurrency and safety

- Always perform multi-key updates in a single Update txn:
  - Example: task status change must atomically update task doc, move index keys, and append an event.
  - When adding multiple tasks in a single CLI invocation, use one transaction that appends all events in order by incrementing evt_seq within the transaction.
- Event sequencing:
  - Keep s/{sid}/meta/evt_seq as a u64 counter (stored as 8-byte big-endian bytes, not decimal string, to avoid parse overhead). In txn:
    - Read current value, increment for each event in the transaction, write back the final value, and write each event at s/{sid}/evt/{seq_be}.
    - If txn conflict occurs (e.g., concurrent processes updating the same session), retry with short exponential backoff.
    - This guarantees no gaps or overwrites in the event sequence; sequence numbers strictly increase.
- Concurrent writers across multiple processes:
  - Use Badger's optimistic concurrency control with retry logic on transaction conflicts.
  - Keep transactions small and focused to minimize conflict probability.
  - For batch operations (multiple tasks added by one command), use a single transaction to ensure atomicity and preserve ordering.
- Enforce one active session per dir:
  - np s checks if dir/{dir_hash}/active exists before creating a new session (idempotent behavior documented above).
- Deterministic task IDs:
  - Use blake3(normalized(title)+"|"+normalized(description)+"|"+sid), take first 6 hex.
  - Never recompute on updates; the id is "creation id," not a content hash of current state.
  - Re-adding the same task (same title+description) resolves to the same id and is treated as a no-op.

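The "retry with short exponential backoff" guidance above could be implemented along these lines. This is a sketch under assumptions: `errConflict` is a local stand-in for the conflict error a Badger transaction returns (`badger.ErrConflict`), and `withRetry`, its attempt count, and its base delay are illustrative choices rather than anything the schema prescribes.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errConflict stands in for the store's transaction-conflict error.
var errConflict = errors.New("transaction conflict")

// withRetry runs op up to attempts times, doubling the delay after each
// conflict. Any other outcome (success or a different error) returns at once.
func withRetry(attempts int, base time.Duration, op func() error) error {
	delay := base
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); !errors.Is(err, errConflict) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return err // still conflicting after the last attempt
}

func main() {
	calls := 0
	err := withRetry(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errConflict // simulate two conflicting commits
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```

The whole read-increment-write of evt_seq belongs inside `op`, so each retry re-reads the counter instead of reusing a stale value.
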
### Denormalization (minimal, targeted)

- Status index per session (s/{sid}/idx/status/{status}/{task_id}) to filter efficiently by status.
- Active sessions index (idx/active/{sid}) for quick listing and subscribe.
- Archive indexes for chronological browsing and per-dir history.
- Everything else is normalized to keep writes simple and atomic.
- Task ordering: tasks are displayed sorted by created_seq (not created_at timestamp) to preserve the exact order they were provided via command-line flags, avoiding issues with clock precision and race conditions.

### Key naming conventions

- Lowercase ASCII, "/" as namespace delimiter, fixed segment order.
- Use big-endian (zero-padded hex) counters/timestamps in keys to preserve sort order.
- Keep values small; JSON for human-inspectable records; empty values for set-like indexes.

### How subscriptions map to UI updates

- **Session pane (single session view):** Subscribe to s/{sid}/task/ and s/{sid}/goal; when a mutation arrives, read the document(s) and refresh.
- **Event log pane (single session view):** Subscribe to s/{sid}/evt/; stream latest events as they're appended by iterating from the last seen {seq_be}.
- **Session list picker:** No subscriptions. One-time fetch on open/refresh: scan idx/active/ and idx/archived/ (limited by terminal height), read s/{sid}/meta and s/{sid}/goal for each session to display directory, title, and timestamps. Support paging through results.

### Effort/scope

- Schema + helpers (hashing, key builders, JSON structs): S (1–3h).
- Implement start, goal set/update, task add/update, archive with atomic txns + events: M (1–2d to get right and tested).
- TUI subscriptions wired to these prefixes: M (1–2d) once data model is in place.

## Rationale and trade-offs

- Prefix-first design matches Badger's strengths: fast iterators, prefix scans, and subscriptions.
- Using per-session monotonic seq keys for events is simpler and more reliable than time-sorted keys, and you can still store timestamps in the value.
- Minimal denormalization: we only add what's necessary for speed (status index, active session index). Everything else is derivable.
- Hashing dir paths keeps keys safe/short and avoids platform-specific path issues while keeping full path in values for display.
- JSON values keep early development simple and debuggable; you can binary-encode later if needed, without changing keys.
- GOOS/GOARCH were removed from dir_hash computation to support cross-platform shared home directories (e.g., NFS, Dropbox, synced directories). The same working directory should map to the same session regardless of which OS accesses it.

## Risks and guardrails

- **Concurrent writers:** Use retries on txn conflicts (especially when incrementing evt_seq). Keep transactions small and focused.
- **Clock weirdness:** Event order is based on seq, not time, so your chronology is stable even if system clock changes; human-readable timestamps live in the value.
- **Task ID determinism vs edits:** Because id is based on initial content + sid, it won't change during edits. If you attempt to re-add the same task, you'll get the same id; treat it as idempotent add.
- **Orphaned indexes:** Always mutate doc + index keys in the same txn; on startup or rarely, you can offer a "repair" command to reconcile status indexes, but careful txns should make this unnecessary.
- **Path hashing collisions:** With the full blake3-256 digest (64 hex chars) used for {dir_hash}, collision risk is negligible; still store full dir_path in meta for verification.
- **Subscription storms:** Subscribe narrowly (per-session) and debounce UI redraws.

## When to consider the advanced path

- **Thousands of tasks/events per session and UI becomes slow scanning JSON:**
  - Switch task/event values to a compact binary encoding (e.g., flatbuffers/msgpack) to reduce I/O.
- **Cross-session global event feed (optional):**
  - Maintain a global index per event: idx/events/{ts_be}/{sid}/{seq_be} -> "" pointing to s/{sid}/evt/{seq_be}.
  - Pros: simple "recent activity" TUI across sessions, single subscription to idx/events/, efficient time-range scans.
  - Cons: extra write per event (write amplification), hot global prefix under high throughput, time-ordering relies on event timestamps (minor clock skew acceptable), large histories require pagination/retention.
- **Need user-defined ordering of tasks (beyond created_seq):**
  - Add s/{sid}/idx/task_order/{seq_be} -> {task_id} and maintain it on inserts/moves.
- **Multi-tenant or remote DB:**
  - Add tenant prefixes or shard DBs; preserve the same key layout.

## Optional advanced path (brief)

- Exploit the natural creation-time ordering of ULID sids (helps chronological archive listings without extra metadata).
- Maintain s/{sid}/stats -> {pending, in_progress, …, updated_at} and update counts transactionally on task status changes; lets the session list show counts without reading all tasks.
- Add CAS-like guards using a version field inside task JSON (optimistic concurrency within your app domain), rejecting stale updates gracefully.

## Summary

This schema gives you:
- O(1) active-session lookup per directory (one Get per level when the parent-walk is needed).
- O(#tasks with status) retrieval for status-filtered views via a single prefix iteration.
- Stable, strictly-ordered per-session event logs.
- Simple, efficient prefixes to subscribe for real-time TUI updates.
- Clean archival with preserved queryability and chronological browsing.