From 33c670271c4d02816f9052852bf802789d7a7f64 Mon Sep 17 00:00:00 2001
From: Robin Vobruba
Date: Wed, 4 May 2022 12:01:22 +0200
Subject: [PATCH 1/7] model: strict Markdown requires empty lines before (and after) lists

---
 doc/model.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/doc/model.md b/doc/model.md
index da76761c202324f56de462282d73c264b5fac488..f0d39f762f5cf08b1ae94424e2287c81db79e852 100644
--- a/doc/model.md
+++ b/doc/model.md
@@ -16,6 +16,7 @@ To get the final state of an entity, we apply these `Operation`s in the correct
 ## Entities are stored in git objects

 An `Operation` is a piece of data including:
+
 - a type identifier
 - an author (a reference to another entity)
 - a timestamp (there is also 1 or 2 Lamport time that we will describe later)
@@ -64,6 +65,7 @@ Here is the complete picture:
 It would be very tempting to use the `Operation`'s timestamp to give us the order to compile the final state. However, you can't rely on the time provided by other people (their clock might be off) for anything other than just display. This is a fundamental limitation of distributed system, and even more so when actors might want to game the system.

 Instead, we are going to use [Lamport logical clock](https://en.wikipedia.org/wiki/Lamport_timestamps). A Lamport clock is a simple counter of events. This logical clock gives us a partial ordering:
+
 - if L1 < L2, L1 happened before L2
 - if L1 > L2, L1 happened after L2
 - if L1 == L2, we can't tell which happened first: it's a concurrent edition
@@ -98,6 +100,7 @@ The same way as git does, this hash is displayed truncated to a 7 characters str
 ## Entities support conflict resolution

 Now that we have all that, we can finally merge our entities without conflict and collaborate with other users. Let's start by getting rid of two simple scenario:
+
 - if we simply pull updates, we move forward our local reference. We get an update of our graph that we read as usual.
 - if we push fast-forward updates, we move forward the remote reference and other users can update their reference as well.

@@ -106,6 +109,7 @@ The tricky part happens when we have concurrent edition. If we pull updates whil
 As we don't have a purely linear series of commits/`Operations`s, we need a deterministic ordering to always apply operations in the same order.

 git-bug apply the following algorithm:
+
 1. load and read all the commits and the associated `OperationPack`s
 2. make sure that the Lamport clocks respect the DAG structure: a parent commit/`OperationPack` (that is, towards the head) cannot have a clock that is higher or equal than its direct child. If such a problem happen, the commit is refused/discarded.
 3. individual `Operation`s are assembled together and ordered given the following priorities:
@@ -115,6 +119,7 @@ git-bug apply the following algorithm:
 Step 2 is providing and enforcing a constraint over the `Operation`'s logical clocks. What that means is that we inherit the implicit ordering given by the DAG. Later, logical clocks refine that ordering. This, coupled with signed commit has the nice property of limiting how this data model can be abused.

 Here is an example of such an ordering. We can see that:
+
 - Lamport clocks respect the DAG structure
 - the final `Operation` order is [A,B,C,D,E,F], according to those clocks

@@ -124,4 +129,4 @@ When we have a concurrent edition, we apply a secondary ordering based on the `O

 ![merge scenario 2](merge2.png)

-This secondary ordering doesn't carry much meaning, but it's unbiased and hard to abuse.
\ No newline at end of file
+This secondary ordering doesn't carry much meaning, but it's unbiased and hard to abuse.
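
Reviewer note on the algorithm text touched by this patch: the ordering rule described in `doc/model.md` (Lamport edit clock first, then the lexicographic order of the `OperationPack` identifier when clocks are equal, i.e. concurrent) can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not git-bug's actual implementation: `Pack`, `pack_id`, `edit_clock` and `order_operations` are hypothetical names, and the sketch assumes step 2 (clocks respecting the DAG) has already been checked.

```python
from dataclasses import dataclass, field


@dataclass
class Pack:
    """Hypothetical, simplified stand-in for an OperationPack."""
    pack_id: str      # identifier (a hash in the real model)
    edit_clock: int   # Lamport edit clock attached to the pack
    ops: list = field(default_factory=list)


def order_operations(packs):
    """Flatten packs into a single deterministic sequence of operations.

    Primary key: the Lamport edit clock.
    Secondary key (equal clocks, i.e. concurrent editions): the
    lexicographic order of the pack identifier.
    """
    ordered = sorted(packs, key=lambda p: (p.edit_clock, p.pack_id))
    return [op for p in ordered for op in p.ops]


# Two packs share clock 3 (concurrent), so their identifiers break the tie.
packs = [
    Pack("b7", 3, ["D"]),
    Pack("a1", 1, ["A", "B"]),
    Pack("9c", 3, ["C"]),
]
print(order_operations(packs))
```

Every replica that holds the same set of packs computes the same sequence, whatever order the packs were fetched in; that is the property the document relies on.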
From 543e7b78f5eb443a80e0c4e8edd4a954646f9663 Mon Sep 17 00:00:00 2001
From: Robin Vobruba
Date: Wed, 4 May 2022 12:03:49 +0200
Subject: [PATCH 2/7] model: Adds link explaining nounce (wikipedia)

---
 doc/model.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/model.md b/doc/model.md
index f0d39f762f5cf08b1ae94424e2287c81db79e852..94528bc397be4595ef4dab307a3220f6ecac5891 100644
--- a/doc/model.md
+++ b/doc/model.md
@@ -21,7 +21,7 @@ An `Operation` is a piece of data including:
 - an author (a reference to another entity)
 - a timestamp (there is also 1 or 2 Lamport time that we will describe later)
 - all the data required by that operation type (a message, a status ...)
-- a random nonce to ensure we have enough entropy, as the operation identifier is a hash of that data (more on that later)
+- a random [nonce](https://en.wikipedia.org/wiki/Cryptographic_nonce) to ensure we have enough entropy, as the operation identifier is a hash of that data (more on that later)

 These `Operation`s are aggregated in an `OperationPack`, a simple array. An `OperationPack` represents an edit session of a bug. As the operation's author is the same for all the `OperationPack` we only store it once.

From 2a0331e2ddc147b8616caa52d8a7c0432e56af1c Mon Sep 17 00:00:00 2001
From: Robin Vobruba
Date: Wed, 4 May 2022 12:11:11 +0200
Subject: [PATCH 3/7] model: Moves example description after the example

---
 doc/model.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/doc/model.md b/doc/model.md
index 94528bc397be4595ef4dab307a3220f6ecac5891..4cadfc9a8071cf03515f48dd53b26159fab76d5a 100644
--- a/doc/model.md
+++ b/doc/model.md
@@ -118,13 +118,15 @@ git-bug apply the following algorithm:

 Step 2 is providing and enforcing a constraint over the `Operation`'s logical clocks. What that means is that we inherit the implicit ordering given by the DAG. Later, logical clocks refine that ordering. This, coupled with signed commit has the nice property of limiting how this data model can be abused.

-Here is an example of such an ordering. We can see that:
+Here is an example of such an ordering:
+
+![merge scenario 1](merge1.png)
+
+We can see that:

 - Lamport clocks respect the DAG structure
 - the final `Operation` order is [A,B,C,D,E,F], according to those clocks

-![merge scenario 1](merge1.png)
-
 When we have a concurrent edition, we apply a secondary ordering based on the `OperationPack`'s identifier:

 ![merge scenario 2](merge2.png)

From e652eb6f5b4a4349a24adbcd99088ed1ab690a8e Mon Sep 17 00:00:00 2001
From: Robin Vobruba
Date: Wed, 4 May 2022 12:11:43 +0200
Subject: [PATCH 4/7] model: Links to a section further down

---
 doc/model.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/model.md b/doc/model.md
index 4cadfc9a8071cf03515f48dd53b26159fab76d5a..afde4d84508439a48b50527e4d79d48c4353b9b8 100644
--- a/doc/model.md
+++ b/doc/model.md
@@ -19,7 +19,7 @@ An `Operation` is a piece of data including:

 - a type identifier
 - an author (a reference to another entity)
-- a timestamp (there is also 1 or 2 Lamport time that we will describe later)
+- a timestamp (there is also 1 or 2 [Lamport time](#time-is-unreliable))
 - all the data required by that operation type (a message, a status ...)
 - a random [nonce](https://en.wikipedia.org/wiki/Cryptographic_nonce) to ensure we have enough entropy, as the operation identifier is a hash of that data (more on that later)

From 00fb4bc098732bbeb0764b75355e64c835b4ad8a Mon Sep 17 00:00:00 2001
From: Robin Vobruba
Date: Wed, 4 May 2022 12:16:40 +0200
Subject: [PATCH 5/7] model: Highlight some words with special meaning

---
 doc/model.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/model.md b/doc/model.md
index afde4d84508439a48b50527e4d79d48c4353b9b8..cd3b6eb93db51069540f9663cdf56ed19d0f4e89 100644
--- a/doc/model.md
+++ b/doc/model.md
@@ -71,11 +71,11 @@ Instead, we are going to use [Lamport logical clock](https://en.wikipedia.org/wi
 - if L1 == L2, we can't tell which happened first: it's a concurrent edition


-Each time we are appending something to the data (create an Entity, add an `Operation`) a logical time will be attached, with the highest time value we are aware of plus one. This declares a causality in the event and allows ordering entities and operations.
+Each time we are appending something to the data (create an `Entity`, add an `Operation`) a logical time will be attached, with the highest time value we are aware of plus one. This declares a causality in the event and allows ordering entities and operations.

-The first commit of an Entity will have both a creation time and edit time clock, while a later commit will only have an edit time clock. These clocks value are serialized directly in the `Tree` entry name (for example: `"create-clock-4"`). As a Tree entry needs to reference something, we reference the git `Blob` with an empty content. As all of these entries will reference the same `Blob`, no network transfer is needed as long as you already have any entity in your repository.
+The first commit of an `Entity` will have both a creation time and edit time clock, while a later commit will only have an edit time clock. These clocks value are serialized directly in the `Tree` entry name (for example: `"create-clock-4"`). As a `Tree` entry needs to reference something, we reference the git `Blob` with an empty content. As all of these entries will reference the same `Blob`, no network transfer is needed as long as you already have any entity in your repository.

-Example of Tree of the first commit of an entity:
+Example of a `Tree` of the first commit of an entity:
 ```
 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 create-clock-14
 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 edit-clock-137
@@ -83,7 +83,7 @@ Example of Tree of the first commit of an entity:
 ```
 Note that both `"ops"` and `"root"` entry reference the same OperationPack as it's the first commit in the chain.

-Example of Tree of a later commit of an entity:
+Example of a `Tree` of a later commit of an entity:
 ```
 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 edit-clock-154
 100644 blob 68383346c1a9503f28eec888efd300e9fc179ca0 ops

From 9b871c6114c714e086f02f929a5a9bb6d33bdb91 Mon Sep 17 00:00:00 2001
From: Robin Vobruba
Date: Wed, 4 May 2022 12:18:50 +0200
Subject: [PATCH 6/7] model: Removes now outdated statement about ops and root

---
 doc/model.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/doc/model.md b/doc/model.md
index cd3b6eb93db51069540f9663cdf56ed19d0f4e89..8403e9c06da1b64c7e868ff971c85fd056e5f6a8 100644
--- a/doc/model.md
+++ b/doc/model.md
@@ -81,7 +81,6 @@ Example of a `Tree` of the first commit of an entity:
 100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 edit-clock-137
 100644 blob a020a85baa788e12699a4d83dd735578f0d78c75 ops
 ```
-Note that both `"ops"` and `"root"` entry reference the same OperationPack as it's the first commit in the chain.

 Example of a `Tree` of a later commit of an entity:
 ```

From 75ca2ce7da13f692a10ff15a70e594e9ca210e68 Mon Sep 17 00:00:00 2001
From: Robin Vobruba
Date: Wed, 4 May 2022 12:19:50 +0200
Subject: [PATCH 7/7] model: Multiple, minor readability and language improvements

---
 doc/model.md | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/doc/model.md b/doc/model.md
index 8403e9c06da1b64c7e868ff971c85fd056e5f6a8..de0de42adab36e0032146ff6395519c703f62399 100644
--- a/doc/model.md
+++ b/doc/model.md
@@ -3,19 +3,19 @@ Entities data model

 If you are not familiar with [git internals](https://git-scm.com/book/en/v1/Git-Internals), you might first want to read about them, as the `git-bug` data model is built on top of them.

-## Entities (bugs, ...) are a series of edit operations
+## Entities (bug, author, ...) are a series of edit operations

-As entities are stored and edited in multiple process at the same time, it's not possible to store the current state like it would be done in a normal application. If two process change the same entity and later try to merge the states, we wouldn't know which change takes precedence or how to merge those states.
+As entities are stored and edited in multiple processes at the same time, it's not possible to store the current state like it would be done in a normal application. If two processes change the same entity and later try to merge the states, we wouldn't know which change takes precedence or how to merge those states.

 To deal with this problem, you need a way to merge these changes in a meaningful way. Instead of storing the final bug data directly, we store a series of edit `Operation`s. This is a common idea, notably with [Operation-based CRDTs](https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type#Operation-based_CRDTs).

 ![ordered operations](operations.png)

-To get the final state of an entity, we apply these `Operation`s in the correct order on an empty state to compute ("compile") our view.
+To get the final state of an entity, we apply these `Operation`s in the correct order on an empty state, to compute ("compile") our view.

 ## Entities are stored in git objects

-An `Operation` is a piece of data including:
+An `Operation` is a piece of data, including:

 - a type identifier
 - an author (a reference to another entity)
@@ -90,7 +90,7 @@ Example of a `Tree` of a later commit of an entity:

 ## Entities and Operation's ID

-`Operation`s can be referenced in the data model or by users with an identifier. This identifier is computed from the `Operation`'s data itself, with a hash of that data: `id = hash(json(op))`
+`Operation`s can be referenced - in the data model or by users - with an identifier. This identifier is computed from the `Operation`'s data itself, with a hash of that data: `id = hash(json(op))`

 For entities, `git-bug` uses as identifier the hash of the first `Operation` of the entity, as serialized on disk.

@@ -98,24 +98,24 @@ The same way as git does, this hash is displayed truncated to a 7 characters str

 ## Entities support conflict resolution

-Now that we have all that, we can finally merge our entities without conflict and collaborate with other users. Let's start by getting rid of two simple scenario:
+Now that we have all that, we can finally merge our entities without conflict, and collaborate with other users. Let's start by getting rid of two simple scenarios:

 - if we simply pull updates, we move forward our local reference. We get an update of our graph that we read as usual.
 - if we push fast-forward updates, we move forward the remote reference and other users can update their reference as well.

-The tricky part happens when we have concurrent edition. If we pull updates while we have local changes (non-straightforward in git term), git-bug create the equivalent of a merge commit to merge both branches into a DAG. This DAG has a single root containing the first operation, but can have branches that get merged back into a single head pointed by the reference.
+The tricky part happens when we have concurrent editions. If we pull updates while we have local changes (non-straightforward in git term), git-bug creates the equivalent of a merge commit to merge both branches into a DAG. This DAG has a single root containing the first operation, but can have branches that get merged back into a single head pointed by the reference.

 As we don't have a purely linear series of commits/`Operations`s, we need a deterministic ordering to always apply operations in the same order.

-git-bug apply the following algorithm:
+git-bug applies the following algorithm:

 1. load and read all the commits and the associated `OperationPack`s
-2. make sure that the Lamport clocks respect the DAG structure: a parent commit/`OperationPack` (that is, towards the head) cannot have a clock that is higher or equal than its direct child. If such a problem happen, the commit is refused/discarded.
+2. make sure that the Lamport clocks respect the DAG structure: a parent commit/`OperationPack` (that is, towards the head) cannot have a clock that is higher or equal than its direct child. If such a problem happens, the commit is refused/discarded.
 3. individual `Operation`s are assembled together and ordered given the following priorities:
    1. the edition's lamport clock if not concurrent
    2. the lexicographic order of the `OperationPack`'s identifier

-Step 2 is providing and enforcing a constraint over the `Operation`'s logical clocks. What that means is that we inherit the implicit ordering given by the DAG. Later, logical clocks refine that ordering. This, coupled with signed commit has the nice property of limiting how this data model can be abused.
+Step 2 is providing and enforcing a constraint over the `Operation`'s logical clocks. What that means, is that we inherit the implicit ordering given by the DAG. Later, logical clocks refine that ordering. This - coupled with signed commits - has the nice property of limiting how this data model can be abused.

 Here is an example of such an ordering:

@@ -126,7 +126,7 @@ We can see that:
 - Lamport clocks respect the DAG structure
 - the final `Operation` order is [A,B,C,D,E,F], according to those clocks

-When we have a concurrent edition, we apply a secondary ordering based on the `OperationPack`'s identifier:
+When we have concurrent editions, we apply a secondary ordering, based on the `OperationPack`'s identifier:

 ![merge scenario 2](merge2.png)