Data model: Why not store an `OperationPack` as a `Tree` instead of as a `Blob`?

hoijui (hoijui) opened

(somewhat related to #226)

Is it for performance reasons?

Michael Muré (MichaelMure) commented

With git you can't store data in a Tree; you need a Blob for that. You might have seen that the data model abuses the Tree to store the Lamport clock values directly in file names, but even then, a Blob has to be attached to each entry. In this case, the same empty Blob is used for all those entries.

Michael Muré (MichaelMure) closed the bug

hoijui (hoijui) commented

model.md states that OperationPack is a Go array, stored as serialized data in a single Blob in git. With this question I meant to ask: why not store each Operation as a Blob, and have the OperationPack be a Tree of these Blobs?

hoijui (hoijui) commented

Now that I know it is stored as JSON, the question becomes: why is the pack stored as a single JSON Blob instead of a Tree of Blobs, with each Operation being a separate Blob?

Michael Muré (MichaelMure) commented

> model.md states that OperationPack is a Go array, stored as serialized data in a single Blob in git.

Ha, yes. It's correct but misleading: it's a Go array ... in memory. On disk, it's serialized JSON.

> With this question I meant to ask: why not store each Operation as a Blob, and have the OperationPack be a Tree of these Blobs?

Performance and storage. Fewer git objects means less work to store them or to move them around over the network. Having one operation per Blob would lead to a lot of tiny objects; grouping them in an OperationPack reduces that. Also, once commit signing is implemented, a single signature will be necessary instead of one per operation.

Michael Muré (MichaelMure) commented

Your solution would work as well; it's just a tradeoff.

hoijui (hoijui) commented

> It's a Go array ... in memory. On disk, it's serialized JSON.

OK, thanks! :-)

> Performance and storage. Fewer git objects means less work to store them or to move them around over the network. Having one operation per Blob would lead to a lot of tiny objects; grouping them in an OperationPack reduces that. Also, once commit signing is implemented, a single signature will be necessary instead of one per operation.

Ahh, that makes sense. If, for example, there were ... one HTTP request for each Operation instead of just one per OperationPack, that could add quite some overhead. Thanks! :-)