# Data model

If you are not familiar with [git internals](https://git-scm.com/book/en/v1/Git-Internals),
you might want to read about them first,
as the `git-bug` data model is built on top of them.

The biggest problem when creating a distributed bug tracker is that there is no central authoritative server (doh!). This implies some constraints.

## Anybody can create and edit bugs at the same time as you

To deal with this problem, you need a way to merge these changes in a meaningful way.

Instead of storing the final bug data directly, we store a series of edit `Operation`s. One such operation could look like this:

```json
{
  "type": "SET_TITLE",
  "author": {
    "id": "5034cd36acf1a2dadb52b2db17f620cc050eb65c"
  },
  "timestamp": 1533640589,
  "title": "This title is better"
}
```

Note: JSON is used here for readability. Internally, it's a Go struct.

These `Operation`s are aggregated in an `OperationPack`, a simple array. An `OperationPack` represents one edit session of a bug. We store this pack in git as a git `Blob`, which can hold arbitrary serialized data.
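
Git names a blob by the SHA-1 of a short header (`blob <size>` followed by a NUL byte) plus the raw content. The sketch below makes that concrete. It serializes a simplified, hypothetical `OperationPack` and computes the object ID git would assign to the resulting blob. JSON as the serialization format is an assumption of this sketch, and the `Operation` fields are trimmed down for illustration.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Operation is a simplified, hypothetical stand-in for git-bug's operations.
type Operation struct {
	Type      string `json:"type"`
	Timestamp int64  `json:"timestamp"`
	Title     string `json:"title,omitempty"`
}

// OperationPack is the ordered list of operations from one edit session.
type OperationPack []Operation

// gitBlobID computes the object ID git gives a blob:
// SHA-1 over the header "blob <size>\x00" followed by the raw content.
func gitBlobID(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	pack := OperationPack{
		{Type: "CREATE", Timestamp: 1533640000, Title: "Crash on startup"},
		{Type: "SET_TITLE", Timestamp: 1533640589, Title: "This title is better"},
	}
	// JSON is assumed as the serialization format for this sketch.
	data, err := json.Marshal(pack)
	if err != nil {
		panic(err)
	}
	fmt.Println(gitBlobID(data)) // object ID of this OperationPack blob
	fmt.Println(gitBlobID(nil))  // e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the empty blob
}
```

Identical content always yields the same ID, so git stores and transfers each distinct pack only once.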

To reference our `OperationPack`, we create a git `Tree`, that is, a tree of references (to `Blob`s or sub-`Tree`s). If our edit operations include media (for instance, in a message), we can store that media as a `Blob` and reference it here under `"/media"`.
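
A git `Tree` is itself an object. Each entry is written as `<mode> <name>`, then a NUL byte, then the 20-byte binary ID of the referenced object. Entries are sorted by name, and the whole tree is hashed with a `tree` header. The sketch below illustrates git's generic tree format and is not `git-bug` code. The `TreeEntry` type is hypothetical. The `ops` and `root` IDs reuse the blob ID from the example trees at the end of this page, and the `media` ID is only a placeholder (git's empty tree).

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sort"
)

// TreeEntry is one named reference inside a git tree (hypothetical helper type).
type TreeEntry struct {
	Mode string // "100644" for a regular blob, "40000" for a sub-tree
	Name string // e.g. "ops", "root", "media"
	ID   string // 40-character hex object ID of the referenced blob or tree
}

// gitTreeID serializes entries in git's tree format and returns the tree's object ID.
func gitTreeID(entries []TreeEntry) (string, error) {
	sorted := append([]TreeEntry(nil), entries...)
	// git sorts entries by name, comparing sub-trees as if their name ended with "/".
	key := func(e TreeEntry) string {
		if e.Mode == "40000" {
			return e.Name + "/"
		}
		return e.Name
	}
	sort.Slice(sorted, func(i, j int) bool { return key(sorted[i]) < key(sorted[j]) })

	var body bytes.Buffer
	for _, e := range sorted {
		raw, err := hex.DecodeString(e.ID)
		if err != nil || len(raw) != sha1.Size {
			return "", fmt.Errorf("invalid object id %q for entry %q", e.ID, e.Name)
		}
		fmt.Fprintf(&body, "%s %s\x00", e.Mode, e.Name)
		body.Write(raw)
	}

	h := sha1.New()
	fmt.Fprintf(h, "tree %d\x00", body.Len())
	h.Write(body.Bytes())
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	id, err := gitTreeID([]TreeEntry{
		{Mode: "100644", Name: "ops", ID: "a020a85baa788e12699a4d83dd735578f0d78c75"},
		{Mode: "100644", Name: "root", ID: "a020a85baa788e12699a4d83dd735578f0d78c75"},
		{Mode: "40000", Name: "media", ID: "4b825dc642cb6eb9a060e54bf8d69288fbee4904"}, // placeholder
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

Because entries are sorted before hashing, the same set of references always produces the same tree ID, whatever order they were added in.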

To complete the picture, we create a git `Commit` that references our `Tree`. Each time we add more `Operation`s to our bug, we add a new `Commit` with the same data structure, forming a chain of `Commit`s.

This chain of `Commit`s is made available as a git `Reference` under `refs/bugs/<bug-id>`. We can later use this reference to push our data to a git remote. As git pushes every object reachable from that reference, everything will be pushed to the remote, including the media.

For convenience and performance, each `Tree` references the very first `OperationPack` of the bug under `"/root"`. That way we can easily access the very first `Operation`, the `CREATE` operation. This operation contains important data about the bug, such as its author.

Here is the complete picture:

```
 refs/bugs/<bug-id>
       |
       |
       |
 +-----------+          +-----------+             "ops"    +-----------+
 |  Commit   |---------->   Tree    |---------+------------|   Blob    | (OperationPack)
 +-----------+          +-----------+         |            +-----------+
       |                                      |
       |                                      |
       |                                      |   "root"   +-----------+
 +-----------+          +-----------+         +------------|   Blob    | (OperationPack)
 |  Commit   |---------->   Tree    |-- ...   |            +-----------+
 +-----------+          +-----------+         |
       |                                      |
       |                                      |   "media"  +-----------+        +-----------+
       |                                      +------------|   Tree    |---+--->|   Blob    | bug.jpg
 +-----------+          +-----------+                      +-----------+   |    +-----------+
 |  Commit   |---------->   Tree    |-- ...                                |
 +-----------+          +-----------+                                      |    +-----------+
                                                                           +--->|   Blob    | demo.mp4
                                                                                +-----------+
```

Now that we have this, we can easily merge our bugs without conflicts. When pulling a bug's updates from a remote, we simply add our new operations (that is, new `Commit`s), if any, at the end of the chain. In git terms, it's just a `rebase`.

## You can't have a simple consecutive index for your bugs

Just as git can't use a simple counter as the identifier for its commits the way SVN does, we can't have consecutive identifiers for bugs: without a central server, there is nobody to hand out the next number.

`git-bug` uses the hash of the first commit in the bug's chain of commits as its identifier. As this hash is ultimately computed from the content of the `CREATE` operation, which includes the title, the message and a timestamp, it is effectively unique, which prevents collisions.

As git does, `git-bug` displays this hash to users truncated to a 7-character string. Note that when specifying a bug id in a command, you can enter as few characters as you want, as long as there is no ambiguity. If multiple bugs match your prefix, `git-bug` will complain and display the potential matches.
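
Resolving a user-supplied prefix amounts to a scan over the known ids. The sketch below is a hypothetical illustration: `shortID`, `resolvePrefix` and the sample ids are not `git-bug`'s own, and the error messages only mimic the behavior described above.

```go
package main

import (
	"fmt"
	"strings"
)

// shortID truncates a full bug id to the 7 characters shown to users.
func shortID(id string) string {
	if len(id) <= 7 {
		return id
	}
	return id[:7]
}

// resolvePrefix returns the only id starting with prefix, or an error that
// lists the candidates when the prefix matches no bug or several bugs.
func resolvePrefix(ids []string, prefix string) (string, error) {
	var matches []string
	for _, id := range ids {
		if strings.HasPrefix(id, prefix) {
			matches = append(matches, id)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no bug matches %q", prefix)
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("prefix %q is ambiguous, candidates: %s", prefix, strings.Join(matches, ", "))
	}
}

func main() {
	ids := []string{ // hypothetical bug ids
		"5034cd36acf1a2dadb52b2db17f620cc050eb65c",
		"50a1b2c3d4e5f60718293a4b5c6d7e8f90a1b2c3",
		"a020a85baa788e12699a4d83dd735578f0d78c75",
	}
	fmt.Println(shortID(ids[0])) // 5034cd3
	if id, err := resolvePrefix(ids, "a0"); err == nil {
		fmt.Println(id) // a020a85baa788e12699a4d83dd735578f0d78c75
	}
	if _, err := resolvePrefix(ids, "50"); err != nil {
		fmt.Println(err) // both ids starting with "50" are listed
	}
}
```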

## You can't rely on the time provided by other people (their clock might be off) for anything other than just display

Within a single bug, events are already ordered without the need for a timestamp: an `OperationPack` is an ordered array of operations, and a chain of commits orders the `OperationPack`s relative to each other.

Now, to be able to order bugs by creation or last edit time, `git-bug` uses a [Lamport logical clock](https://en.wikipedia.org/wiki/Lamport_timestamps). A Lamport clock is a simple counter of events. When a new bug is created, its creation time will be the highest time value we are aware of, plus one. This establishes a causal order between events and allows bugs to be ordered.
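
A minimal Lamport clock fits in a few lines. The sketch below is a generic illustration of the technique, not `git-bug`'s own type. The method names `Witness` and `Increment` are choices made here.

```go
package main

import "fmt"

// LamportClock is a minimal logical clock: a counter of events.
type LamportClock struct {
	time uint64
}

// Witness records a time seen elsewhere (e.g. in a pulled bug) so that the
// clock never falls behind any value we are aware of.
func (c *LamportClock) Witness(t uint64) {
	if t > c.time {
		c.time = t
	}
}

// Increment returns a new time strictly greater than anything witnessed so far.
func (c *LamportClock) Increment() uint64 {
	c.time++
	return c.time
}

func main() {
	var c LamportClock
	c.Witness(14)              // a pulled bug was created at logical time 14
	c.Witness(9)               // older values never move the clock back
	fmt.Println(c.Increment()) // 15: the creation time of our new bug
}
```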

When bugs are pushed to or pulled from a git remote, some bugs may end up with the same logical time. This means that they were created or edited concurrently. In this case, `git-bug` uses the timestamp as a second layer of sorting. While the timestamp might be incorrect due to a badly set clock, the drift in sorting is bounded by the first sort on the logical clock. This means that if users synchronize their bugs regularly, the timestamp will rarely be used, and should still provide a reasonably accurate ordering when needed.
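
The two-level ordering can be expressed as a single comparison function. It compares logical times first and falls back to wall-clock timestamps only on a tie. `BugSummary` and its fields are hypothetical names for this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// BugSummary carries the values used to order bugs (hypothetical fields).
type BugSummary struct {
	ID         string
	CreateTime uint64 // Lamport clock value
	UnixTime   int64  // wall-clock timestamp, possibly skewed
}

// sortByCreation orders by logical time, using the timestamp only to break ties.
func sortByCreation(bugs []BugSummary) {
	sort.SliceStable(bugs, func(i, j int) bool {
		if bugs[i].CreateTime != bugs[j].CreateTime {
			return bugs[i].CreateTime < bugs[j].CreateTime
		}
		return bugs[i].UnixTime < bugs[j].UnixTime
	})
}

func main() {
	bugs := []BugSummary{
		{ID: "a", CreateTime: 3, UnixTime: 1533640900},
		{ID: "b", CreateTime: 2, UnixTime: 1999999999}, // clock far in the future
		{ID: "c", CreateTime: 3, UnixTime: 1533640500},
	}
	sortByCreation(bugs)
	for _, b := range bugs {
		fmt.Println(b.ID)
	}
	// Prints b, c, a.
}
```

Bug `b` stays first despite its skewed timestamp because its logical time is lower. The timestamp only decides between `a` and `c`, which share logical time 3.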

These clocks are stored in each bug's chain of commits, as entries in each main git `Tree`. The first commit has both a creation time clock and an edit time clock, while later commits only have an edit time clock. A naive approach would be to serialize the clock in a git `Blob` and reference it in the `Tree` as `"create-clock"`, for example. The problem is that this would generate a lot of blobs that would need to be exchanged later for what is basically just a number.

Instead, the clock value is serialized directly in the `Tree` entry name (for example: `"create-clock-4"`). As a `Tree` entry needs to reference something, we reference the git `Blob` with empty content. As all of these entries reference the same `Blob`, no network transfer is needed for it as long as you already have any bug in your repository.
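
Encoding and decoding a clock in an entry name is plain string handling. The helpers below are a hypothetical sketch. The `create-clock` and `edit-clock` prefixes and the empty blob ID come from the examples on this page, but the function names do not.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// emptyBlobID is git's object ID for a blob with no content; every clock
// entry points at it, so the value lives only in the entry name.
const emptyBlobID = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"

// clockEntryName encodes a clock value in a tree entry name, e.g. "create-clock-4".
func clockEntryName(prefix string, value uint64) string {
	return prefix + "-" + strconv.FormatUint(value, 10)
}

// parseClockEntry decodes a name such as "edit-clock-137" for the given prefix.
// The boolean is false when the name is not a clock entry of that kind.
func parseClockEntry(prefix, name string) (uint64, bool) {
	if !strings.HasPrefix(name, prefix+"-") {
		return 0, false
	}
	value, err := strconv.ParseUint(strings.TrimPrefix(name, prefix+"-"), 10, 64)
	if err != nil {
		return 0, false
	}
	return value, true
}

func main() {
	name := clockEntryName("create-clock", 4)
	fmt.Println(name, "->", emptyBlobID) // create-clock-4 -> e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
	if v, ok := parseClockEntry("edit-clock", "edit-clock-137"); ok {
		fmt.Println(v) // 137
	}
}
```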

Example of the `Tree` of the first commit of a bug:
```
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	create-clock-14
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	edit-clock-137
100644 blob a020a85baa788e12699a4d83dd735578f0d78c75	ops
100644 blob a020a85baa788e12699a4d83dd735578f0d78c75	root
```
Note that both the `"ops"` and `"root"` entries reference the same `OperationPack`, as this is the first commit in the chain.

Example of the `Tree` of a later commit of a bug:
```
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	edit-clock-154
100644 blob 68383346c1a9503f28eec888efd300e9fc179ca0	ops
100644 blob a020a85baa788e12699a4d83dd735578f0d78c75	root
```
Note that the `"root"` entry still references the same root `OperationPack`. Also, all the clocks reference the same empty `Blob`.