diff --git a/docs/Potential schema.md b/docs/Potential schema.md
new file mode 100644
index 0000000000000000000000000000000000000000..b33a44082f1dfbcaed647c93fd099040e6e48023
--- /dev/null
+++ b/docs/Potential schema.md
@@ -0,0 +1,322 @@
+
+
+# BadgerDB Schema Design for nasin pali
+
+## TL;DR
+
+- Use a single BadgerDB with a small, hierarchical keyspace: dir/…, idx/…, and s/{sid}/… (per-session).
+- Store session documents (goal, tasks, meta) under s/{sid}/…, append-only events under s/{sid}/evt/{seq}, and small secondary indexes for fast lookups (active session by dir, tasks by status, active-session set).
+- Rely on prefix scans and Badger subscriptions over these prefixes to drive the TUI in real time. Keep values as JSON for readability; keep index values empty when possible; small payloads are fine where convenient.
+
+## Recommended approach (simple path)
+
+### Keyspace layout (prefixes and values)
+
+#### Schema/version
+- meta/schema_version -> "1"
+
+#### Working directory → active session lookup (with parent-walk)
+- dir/{dir_hash}/active -> {sid}
+  - Path canonicalization:
+    - Start with current working directory
+    - Convert to absolute path
+    - Resolve all symlinks (EvalSymlinks)
+    - Normalize path separators to forward slashes
+    - On Windows, always case-fold to lowercase; on other OSes, do not case-fold
+    - Result: canonical_path
+  - {dir_hash} = full blake3-256(canonical_path), hex-encoded lowercase (64 hex chars)
+  - This prevents issues with symlinks, bind mounts, and cross-platform path differences
+  - Value is the current active {sid} (string)
+  - Parent-walk resolution: from canonical_path, compute {dir_hash} and check dir/{dir_hash}/active; if absent, ascend to the parent directory and repeat until root; first match wins
+  - Symlinks are resolved during canonicalization via EvalSymlinks; cycles are prevented by EvalSymlinks itself and an additional visited set for safety during parent-walk
+
+#### Active sessions set (for fast listing + subscribe)
+- idx/active/{sid} -> {dir_hash} (or tiny JSON summary later if desired)
+
+#### Archived sessions (queryable by time and by dir)
+- idx/archived/{ts_be}/{sid} -> {dir_hash} // for global archive lists in chronological order
+- dir/{dir_hash}/archived/{ts_be} -> {sid} // for "archive history" per working directory
+  - {ts_be} = 8-byte big-endian Unix nanos, hex-encoded lowercase, zero-padded to 16 hex chars; sorts chronologically
+
+#### Per-session namespace
+- s/{sid}/meta -> JSON
+  - { sid, dir_path, dir_hash, state: "active"|"archived", created_at, archived_at: null|ts, last_updated_at }
+- s/{sid}/goal -> JSON
+  - { title, description, updated_at }
+- s/{sid}/task/{task_id} -> JSON
+  - { id, title, description, status: "pending"|"in_progress"|"completed"|"failed"|"cancelled", created_at, updated_at, created_seq }
+  - {task_id} = first 6 hex chars of blake3(normalize(title)+'|'+normalize(description)+'|'+sid); scoped per session, so cross-session collisions are irrelevant
+- s/{sid}/idx/status/{status}/{task_id} -> "" (empty value; presence = membership)
+  - Maintained atomically on task create/status update/delete
+- s/{sid}/meta/evt_seq -> 8-byte big-endian counter stored as raw bytes (monotonic per session)
+- s/{sid}/evt/{seq_be} -> JSON event record
+  - {seq_be} = 8-byte big-endian u64 counter, hex-encoded lowercase, zero-padded to 16 hex chars; ensures that key order matches event order during iteration
+  - { seq, at, type, reason: null|string, cmd: "np ...", payload: {...} }
+  - Types you'll likely use: session_started, goal_set, goal_updated, task_added, task_updated, task_status_changed, session_archived, note
+
+### Core operations (transactional)
+
+#### Start session (np s)
+1) Canonicalize working directory using the process described in the "Working directory → active session lookup" section.
+2) Compute dir_hash from canonical_path.
+3) Check if dir/{dir_hash}/active exists:
+   - If it exists, read the active {sid} and print: "Session {sid} is already active for this directory. Ask your operator whether they want to resume or archive it."
+   - Return 0 (idempotent operation).
+4) If no active session exists, begin txn:
+   - Generate new sid = ULID (time-ordered) or blake3(rand) hex; keep as short as practical (e.g., 26 char ULID).
+   - Put s/{sid}/meta (state=active) and s/{sid}/meta/evt_seq=1, the seq of the session_started event written below, so the next increment yields 2 and cannot overwrite it (do NOT create goal yet; goal is created when first set).
+   - Put dir/{dir_hash}/active -> {sid}.
+   - Put idx/active/{sid} -> {dir_hash}.
+   - Append event s/{sid}/evt/0000000000000001 type=session_started.
+5) Commit.
+
+#### Set goal (np g s …)
+1) Lookup sid via dir/{dir_hash}/active.
+2) Check if s/{sid}/goal exists:
+   - If it exists: error with message "Goal already set. Use 'np g u' to update it (requires -r/--reason flag)."
+   - If it does not exist: proceed to step 3.
+3) txn:
+   - Create s/{sid}/goal JSON.
+   - Update s/{sid}/meta.last_updated_at.
+   - Increment s/{sid}/meta/evt_seq and write s/{sid}/evt/{seq} with type=goal_set (no reason required).
+
+#### Update goal (np g u …)
+1) Lookup sid via dir/{dir_hash}/active.
+2) Check if s/{sid}/goal exists:
+   - If it does not exist: error with message "No goal set yet. Use 'np g s' to set it first."
+   - If it exists: proceed to step 3.
+3) Require reason flag (-r or --reason) with brief explanation for the update.
+4) txn:
+   - Update s/{sid}/goal JSON.
+   - Update s/{sid}/meta.last_updated_at.
+   - Increment s/{sid}/meta/evt_seq and write s/{sid}/evt/{seq} with type=goal_updated, reason in payload.
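The key encodings that the operations above rely on can be sketched in Go roughly as follows. This is a minimal illustration, not the project's implementation: the helper names (`seqHex`, `evtKey`, `encodeCounter`, `decodeCounter`, and the other key builders) and the placeholder sid are assumptions made for this example; only the key shapes and the 8-byte big-endian counter come from the schema.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// seqHex renders a u64 as 16 lowercase hex chars so that lexicographic
// key order matches numeric order.
func seqHex(seq uint64) string {
	return fmt.Sprintf("%016x", seq)
}

// Key builders for the layout described above (helper names are hypothetical).
func activeKey(dirHash string) string { return "dir/" + dirHash + "/active" }
func goalKey(sid string) string       { return "s/" + sid + "/goal" }
func evtSeqKey(sid string) string     { return "s/" + sid + "/meta/evt_seq" }
func evtKey(sid string, seq uint64) string {
	return "s/" + sid + "/evt/" + seqHex(seq)
}

// encodeCounter converts the evt_seq counter to its 8-byte big-endian value.
func encodeCounter(n uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, n)
	return b
}

// decodeCounter is the inverse of encodeCounter; it rejects values that are
// not exactly 8 bytes long.
func decodeCounter(b []byte) (uint64, error) {
	if len(b) != 8 {
		return 0, fmt.Errorf("evt_seq: want 8 bytes, got %d", len(b))
	}
	return binary.BigEndian.Uint64(b), nil
}

func main() {
	sid := "01EXAMPLESID" // placeholder, not a real ULID

	fmt.Println(activeKey("<dir_hash>"))
	fmt.Println(goalKey(sid))
	fmt.Println(evtSeqKey(sid))

	// Simulate "read counter, increment, write event at the new seq".
	stored := encodeCounter(1) // value after session_started
	cur, err := decodeCounter(stored)
	if err != nil {
		panic(err)
	}
	next := cur + 1
	fmt.Println(evtKey(sid, next))
}
```

Zero-padded lowercase hex keeps lexicographic key order equal to numeric order, which is what lets a prefix scan over s/{sid}/evt/ return events in sequence.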
+
+#### Add tasks (np t a …)
+1) For each task:
+   - Compute deterministic id = first 6 hex chars of blake3(normalize(title)+"|"+normalize(description)+"|"+sid)
+     - normalize(x) = strings.TrimSpace(x), then case-fold to lowercase, then apply Unicode NFC normalization
+   - Deterministic across runs and processes; stable within session; won't change on edits because id derives from initial content + sid.
+   - Treat adds as idempotent: if the same task (same title+description) is re-added, it will resolve to the same id and be a no-op.
+   - If a user wants to retry a cancelled task with the exact same title and description, they should update the existing task's status rather than adding a new one, or modify the title/description slightly to differentiate it.
+   - txn:
+     - If s/{sid}/task/{id} absent, increment evt_seq and create task with status=pending, created_at=now, created_seq=evt_seq.
+     - Put s/{sid}/idx/status/pending/{id}.
+     - Update meta.last_updated_at.
+     - Append event task_added with task payload.
+   - Note: When adding multiple tasks in a single np t a invocation, use a single transaction and increment evt_seq for each task to preserve the order they were provided on the command line. Tasks should be displayed sorted by created_seq (not created_at) to maintain stable, predictable ordering.
+
+#### Update task status/title/description (np t u …)
+1) Determine if reason is required:
+   - Required (-r or --reason flag) if updating title or description
+   - Required (-r or --reason flag) if changing status to "cancelled" or "failed"
+   - Not required for status changes to "pending", "in_progress", or "completed"
+2) txn:
+   - Read s/{sid}/task/{id}.
+   - If status changes: delete old s/{sid}/idx/status/{old}/{id}, put new s/{sid}/idx/status/{new}/{id}.
+   - Update task JSON.
+   - Update meta.last_updated_at.
+   - Increment evt_seq, append event task_updated or task_status_changed with reason (if provided).
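The reason rule and the status-index move described above can be expressed as two small pure functions. The following Go sketch is illustrative only: `TaskChange`, `reasonRequired`, `statusKey`, and `indexMove` are hypothetical names, and the actual transaction (read task, write JSON, append event) is left out.

```go
package main

import "fmt"

// TaskChange describes a requested task update (hypothetical type).
type TaskChange struct {
	ContentChanged bool   // title or description is being edited
	NewStatus      string // "" when the status is not being changed
}

// reasonRequired reports whether -r/--reason must be supplied: for content
// edits and for transitions to "cancelled" or "failed".
func reasonRequired(c TaskChange) bool {
	if c.ContentChanged {
		return true
	}
	switch c.NewStatus {
	case "cancelled", "failed":
		return true
	}
	return false
}

// statusKey builds s/{sid}/idx/status/{status}/{task_id}.
func statusKey(sid, status, id string) string {
	return "s/" + sid + "/idx/status/" + status + "/" + id
}

// indexMove returns the index key to delete and the index key to put when a
// task changes status. changed is false when no index write is needed.
func indexMove(sid, id, oldStatus, newStatus string) (del, put string, changed bool) {
	if newStatus == "" || newStatus == oldStatus {
		return "", "", false
	}
	return statusKey(sid, oldStatus, id), statusKey(sid, newStatus, id), true
}

func main() {
	fmt.Println(reasonRequired(TaskChange{NewStatus: "completed"}))
	fmt.Println(reasonRequired(TaskChange{NewStatus: "failed"}))

	del, put, changed := indexMove("01EXAMPLESID", "a1b2c3", "pending", "in_progress")
	fmt.Println(changed)
	fmt.Println("delete:", del)
	fmt.Println("put:   ", put)
}
```

Both the delete and the put would go into the same transaction as the task document update, so the index never disagrees with the document.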
+
+#### Archive session (np a)
+1) Read s/{sid}/meta:
+   - If already archived (state="archived"), return 0 (idempotent operation).
+2) If active, txn:
+   - Delete dir/{dir_hash}/active.
+   - Delete idx/active/{sid}.
+   - Update s/{sid}/meta.state="archived", archived_at=now, last_updated_at.
+   - Put idx/archived/{ts_be}/{sid} -> {dir_hash}.
+   - Put dir/{dir_hash}/archived/{ts_be} -> {sid}.
+   - Increment evt_seq, append session_archived event.
+
+### Query patterns
+
+- **Active session for current dir:**
+  - Get dir/{dir_hash}/active -> sid; if absent, parent-walk by recomputing dir_hash for parents until found or root
+- **List sessions (for TUI picker, not real-time):**
+  - One-time fetch: iterate idx/active/ for active sessions and idx/archived/ (descending) for archived sessions, limited to {terminal_height} total entries
+  - For each sid, read s/{sid}/meta for dir_path/timestamps and s/{sid}/goal for title
+  - Support paging through additional results if needed
+- **List tasks by status quickly:**
+  - Iterate s/{sid}/idx/status/{status}/, collect ids
+  - For each id, Get s/{sid}/task/{id}
+- **List all tasks:**
+  - Iterate s/{sid}/task/ prefix; sort by created_seq ascending to preserve insertion/provided order
+- **Events chronologically for a session:**
+  - Iterate s/{sid}/evt/ prefix ascending; {seq_be} sorts by event time order
+- **Resume after interruption (np r):**
+  - Use dir/{dir_hash}/active -> sid
+  - Read s/{sid}/goal and s/{sid}/task/* to render plan
+  - Optionally stream recent events from s/{sid}/evt/
+
+### Real-time monitoring via Badger subscribe
+
+#### Session view TUI (viewing a single session) subscribes to these prefixes:
+- s/{sid}/goal
+- s/{sid}/task/
+- s/{sid}/evt/
+- s/{sid}/meta (for last_updated_at/state)
+
+On receiving a change event, re-fetch the affected document(s) and redraw. Keep all write ops in a single txn so subscribers see atomic changes.
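One way to decide what to re-fetch on a change event is to classify the changed key first. The Go sketch below assumes the subscription callback hands back full keys as strings; `Change` and `classify` are hypothetical names introduced for this example, and no Badger API is called. Because a plain prefix match on s/{sid}/meta would also match s/{sid}/meta/evt_seq, the sketch matches the meta document exactly and ignores the counter.

```go
package main

import (
	"fmt"
	"strings"
)

// Change says which document a changed key belongs to (hypothetical type).
type Change struct {
	Kind string // "goal", "task", "evt", "meta", or "" when the key is ignored
	ID   string // task id or hex seq; empty for goal and meta
}

// classify maps a changed key to the session document it affects.
// Keys outside s/{sid}/, index keys, and the evt_seq counter yield Change{}.
func classify(sid, key string) Change {
	prefix := "s/" + sid + "/"
	if !strings.HasPrefix(key, prefix) {
		return Change{}
	}
	rest := strings.TrimPrefix(key, prefix)
	switch {
	case rest == "goal":
		return Change{Kind: "goal"}
	case rest == "meta":
		return Change{Kind: "meta"}
	case strings.HasPrefix(rest, "task/"):
		return Change{Kind: "task", ID: strings.TrimPrefix(rest, "task/")}
	case strings.HasPrefix(rest, "evt/"):
		return Change{Kind: "evt", ID: strings.TrimPrefix(rest, "evt/")}
	}
	return Change{}
}

func main() {
	sid := "01EXAMPLESID" // placeholder, not a real ULID
	keys := []string{
		"s/" + sid + "/goal",
		"s/" + sid + "/task/a1b2c3",
		"s/" + sid + "/evt/0000000000000002",
		"s/" + sid + "/meta",
		"s/" + sid + "/meta/evt_seq",
	}
	for _, k := range keys {
		fmt.Printf("%s -> %+v\n", k, classify(sid, k))
	}
}
```

A redraw loop could collect the resulting `Change` values over a short window and fetch each affected document once, which also covers the debouncing concern for bursts of events.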
+
+#### Session list (NOT real-time)
+- Listing sessions is a one-time fetch operation, not real-time.
+- Fetch the most recent {terminal_height} sessions (active and archived combined) and allow paging.
+- Query pattern: iterate idx/active/ for active sessions, iterate idx/archived/ (descending by timestamp) for archived sessions.
+- For each session, read s/{sid}/meta and s/{sid}/goal to display directory, goal title, and timestamps.
+- No subscriptions needed for the list view; refresh only when the user explicitly opens/refreshes the picker.
+
+### Value shapes (JSON)
+
+#### s/{sid}/meta:
+```json
+{
+  "sid": "01JP…",
+  "dir_path": "/abs/path",
+  "dir_hash": "…",
+  "state": "active",
+  "created_at": "RFC3339",
+  "archived_at": null,
+  "last_updated_at": "RFC3339"
+}
+```
+
+#### s/{sid}/goal:
+```json
+{
+  "title": "…",
+  "description": "…",
+  "updated_at": "RFC3339"
+}
+```
+
+#### s/{sid}/task/{id}:
+```json
+{
+  "id": "a1b2c3",
+  "title": "…",
+  "description": "…",
+  "status": "pending",
+  "created_at": "RFC3339",
+  "updated_at": "RFC3339",
+  "created_seq": 1
+}
+```
+
+#### s/{sid}/evt/{seq_be}:
+```json
+{
+  "seq": 1,
+  "at": "RFC3339Nano",
+  "type": "task_added",
+  "reason": null,
+  "cmd": "np t a …",
+  "payload": {}
+}
+```
+
+Note on `reason` field:
+- The `reason` field is optional (null or string) in all events
+- Reason is REQUIRED (via -r/--reason flag) only when:
+  - Updating existing goal or task content (title/description) - explain the change (e.g., "clarified scope", "fixed typo")
+  - Changing task status to "cancelled" or "failed" - explain why (e.g., "no longer needed", "blocked by missing dependency")
+- Reason is NOT required (and should be omitted/null) for expected operations:
+  - Setting goal initially
+  - Adding tasks
+  - Changing task status to "pending", "in_progress", or "completed"
+
+### Concurrency and safety
+
+- Always perform multi-key updates in a single Update txn:
+  - Example: task status change must atomically update task doc, move index keys, and append an event.
+  - When adding multiple tasks in a single CLI invocation, use one transaction that appends all events in order by incrementing evt_seq within the transaction.
+- Event sequencing:
+  - Keep s/{sid}/meta/evt_seq as a u64 counter (stored as 8-byte big-endian bytes, not decimal string, to avoid parse overhead). In txn:
+    - Read current value, increment for each event in the transaction, write back the final value, and write each event at s/{sid}/evt/{seq_be}.
+  - If txn conflict occurs (e.g., concurrent processes updating the same session), retry with short exponential backoff.
+  - This guarantees no gaps or overwrites in the event sequence; sequence numbers strictly increase.
+- Concurrent writers across multiple processes:
+  - Use Badger's optimistic concurrency control with retry logic on transaction conflicts.
+  - Keep transactions small and focused to minimize conflict probability.
+  - For batch operations (multiple tasks added by one command), use a single transaction to ensure atomicity and preserve ordering.
+- Enforce one active session per dir:
+  - np s checks if dir/{dir_hash}/active exists before creating a new session (idempotent behavior documented above).
+- Deterministic task IDs:
+  - Use blake3(normalized(title)+"|"+normalized(description)+"|"+sid), take the first 6 hex chars (matching the task_id definition in the keyspace layout).
+  - Never recompute on updates; the id is "creation id," not a content hash of current state.
+  - Re-adding the same task (same title+description) resolves to the same id and is treated as a no-op.
+
+### Denormalization (minimal, targeted)
+
+- Status index per session (s/{sid}/idx/status/{status}/{task_id}) to filter efficiently by status.
+- Active sessions index (idx/active/{sid}) for quick listing and subscribe.
+- Archive indexes for chronological browsing and per-dir history.
+- Everything else is normalized to keep writes simple and atomic.
+- Task ordering: tasks are displayed sorted by created_seq (not created_at timestamp) to preserve the exact order they were provided via command-line flags, avoiding issues with clock precision and race conditions.
+
+### Key naming conventions
+
+- Lowercase ASCII, "/" as namespace delimiter, fixed segment order.
+- Use big-endian (zero-padded hex) counters/timestamps in keys to preserve sort order.
+- Keep values small; JSON for human-inspectable records; empty values for set-like indexes.
+
+### How subscriptions map to UI updates
+
+- **Session pane (single session view):** Subscribe to s/{sid}/task/ and s/{sid}/goal; when a mutation arrives, read the document(s) and refresh.
+- **Event log pane (single session view):** Subscribe to s/{sid}/evt/; stream latest events as they're appended by iterating from the last seen {seq_be}.
+- **Session list picker:** No subscriptions. One-time fetch on open/refresh: scan idx/active/ and idx/archived/ (limited by terminal height), read s/{sid}/meta and s/{sid}/goal for each session to display directory, title, and timestamps. Support paging through results.
+
+### Effort/scope
+
+- Schema + helpers (hashing, key builders, JSON structs): S (≤1–3h).
+- Implement start, goal set/update, task add/update, archive with atomic txns + events: M (1–2d to get right and tested).
+- TUI subscriptions wired to these prefixes: M (1–2d) once data model is in place.
+
+## Rationale and trade-offs
+
+- Prefix-first design matches Badger's strengths: fast iterators, prefix scans, and subscriptions.
+- Using per-session monotonic seq keys for events is simpler and more reliable than time-sorted keys, and you can still store timestamps in the value.
+- Minimal denormalization: we only add what's necessary for speed (status index, active session index). Everything else is derivable.
+- Hashing dir paths keeps keys safe/short and avoids platform-specific path issues while keeping full path in values for display.
+- JSON values keep early development simple and debuggable; you can binary-encode later if needed, without changing keys.
+- GOOS/GOARCH were removed from dir_hash computation to support cross-platform shared home directories (e.g., NFS, Dropbox, synced directories). The same working directory should map to the same session regardless of which OS accesses it.
+
+## Risks and guardrails
+
+- **Concurrent writers:** Use retries on txn conflicts (especially when incrementing evt_seq). Keep transactions small and focused.
+- **Clock weirdness:** Event order is based on seq, not time, so your chronology is stable even if system clock changes; human-readable timestamps live in the value.
+- **Task ID determinism vs edits:** Because id is based on initial content + sid, it won't change during edits. If you attempt to re-add the same task, you'll get the same id; treat it as idempotent add.
+- **Orphaned indexes:** Always mutate doc + index keys in the same txn. Careful txns should make repair unnecessary, but you can still offer a "repair" command (run on startup or on demand) to reconcile status indexes.
+- **Path hashing collisions:** With the full 256-bit blake3 digest (64 hex chars) used for dir_hash, collision risk is negligible; still store full dir_path in meta for verification.
+- **Subscription storms:** Subscribe narrowly (per-session) and debounce UI redraws.
+
+## When to consider the advanced path
+
+- **Thousands of tasks/events per session and UI becomes slow scanning JSON:**
+  - Switch task/event values to a compact binary encoding (e.g., flatbuffers/msgpack) to reduce I/O.
+- **Cross-session global event feed (optional):**
+  - Maintain a global index per event: idx/events/{ts_be}/{sid}/{seq_be} -> "" pointing to s/{sid}/evt/{seq_be}.
+  - Pros: simple "recent activity" TUI across sessions, single subscription to idx/events/, efficient time-range scans.
+  - Cons: extra write per event (write amplification), hot global prefix under high throughput, time-ordering relies on event timestamps (minor clock skew acceptable), large histories require pagination/retention.
+- **Need ordering of tasks:**
+  - Add s/{sid}/idx/task_order/{seq_be} -> {task_id} and maintain it on inserts/moves.
+- **Multi-tenant or remote DB:**
+  - Add tenant prefixes or shard DBs; preserve the same key layout.
+
+## Optional advanced path (brief)
+
+- Use ULID for sids to naturally order sessions by creation time (helps chronological archive listings without extra metadata).
+- Maintain s/{sid}/stats -> {pending, in_progress, …, updated_at} and update counts transactionally on task status changes; lets the session list show counts without reading all tasks.
+- Add CAS-like guards using a version field inside task JSON (optimistic concurrency within your app domain), rejecting stale updates gracefully.
+
+## Summary
+
+This schema gives you:
+- O(1) active-session lookup by working directory when the session was started in that directory (one Get per ancestor level when parent-walk is needed).
+- O(#tasks with status) retrieval for status-filtered views via a single prefix iteration.
+- Stable, strictly-ordered per-session event logs.
+- Simple, efficient prefixes to subscribe for real-time TUI updates.
+- Clean archival with preserved queryability and chronological browsing.
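The active-session lookup with parent-walk could look roughly like the Go sketch below. It is a sketch under stated assumptions rather than the project's code: `findActive` is a hypothetical name, the `get` callback stands in for a Badger Get on dir/{dir_hash}/active, and `standInHash` uses SHA-256 from the standard library purely as a placeholder for blake3-256 so that the example has no external dependency. The input path is assumed to be canonicalized already (absolute, symlinks resolved, forward slashes).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
)

// standInHash is a placeholder for blake3-256 hex (assumption: the real
// implementation would use a blake3 library instead of SHA-256).
func standInHash(p string) string {
	sum := sha256.Sum256([]byte(p))
	return hex.EncodeToString(sum[:])
}

// findActive walks from canonical up to the root and returns the first
// active sid found, together with the directory that owns the session.
// The visited set stops the loop once path.Dir stops making progress.
func findActive(
	canonical string,
	hash func(string) string,
	get func(key string) (string, bool),
) (sid, dir string, ok bool) {
	visited := map[string]bool{}
	for cur := path.Clean(canonical); !visited[cur]; cur = path.Dir(cur) {
		visited[cur] = true
		if s, found := get("dir/" + hash(cur) + "/active"); found {
			return s, cur, true
		}
	}
	return "", "", false
}

func main() {
	// In-memory stand-in for the database.
	store := map[string]string{
		"dir/" + standInHash("/home/u/proj") + "/active": "01EXAMPLESID",
	}
	get := func(key string) (string, bool) {
		v, ok := store[key]
		return v, ok
	}

	sid, dir, ok := findActive("/home/u/proj/src/pkg", standInHash, get)
	fmt.Println(sid, dir, ok)

	_, _, ok = findActive("/tmp/elsewhere", standInHash, get)
	fmt.Println(ok)
}
```

Each loop iteration costs one hash and one point lookup, so the walk is bounded by the directory depth of the starting path.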