CIP-69: Batched Anchor Data Structure
Author | Joel Thorstensson |
---|---|
Discussions-To | https://github.com/ceramicnetwork/CIP/issues/69 |
Status | Final |
Category | Core |
Created | 2020-11-02 |
Table of Contents
Simple Summary
This CIP describes the data structure of a blockchain anchor that batches multiple document updates together.
Abstract
An anchor service accepts document update anchor request from clients and batches them together in a single merkle tree, which is then anchored into a specific blockchain. The anchor service also sorts all of the requests and creates a bloom filter which describes which documents where updated in this batch, which is also included in the blockchain transaction. This allows external observers to more effectively determine if a document which they are interested in was updated.
Motivation
There are two main motivations for this CIP. Firstly having an agreed upon spec for the structure of the batched anchor merkle tree makes it easier for all consumers of the ceramic protocol to know what to expect when parsing an anchor record. Secondly the sorting of updates and the inclusion of a bloom filter allows indexing services to effectively determine if an update was made which is relevant, and find this update in the tree effectively.
Specification
First lets define the basic IPLD data structures used.
type MerkleNode [Link]
type BloomMetadata struct {
type String
data {String:any}
}
type TreeMetadata struct {
numEntries Int
bloomFilter BloomMetadata
}
Here the MerkleNode is simply an array of CIDs, which is used as the main building block for the merkle tree. The TreeMetadata contains some metadata about the tree, namely the number of entries and a bloom filter. The bloomFilter.type
is used to describe which specific type of bloom filter is being used and is there for upgradeability purposes.
Tree structure
The structure of the merkle tree is very straight forward. Starting from the bottom each individual update is paired with another update and a MerkleNode is created which has the leftmost update in position 0
, and the other update at position 1
. MerkleNodes are then created on each level until we reach the root. The root MerkleNode has an additional entry at position 2
which simply points to a TreeMetadata object. In the figure below we see a simple tree with four leafs.
In this example an IPLD Path can be used to reach any of the leafs. For example, to reach Update 3 we would use the path <root-cid>/1/0
. Similarly we can always get the TreeMetadata using the path <root-cid>/2
.
The merkle tree should be constructed as a balanced tree. See the implementation for reference.
Leaf sorting
Before creating the tree the leafs (updates) should be sorted. The sorting should be based on a few different properties of the Ceramic document that is being updated:
family
- sort by the family in the document metadataschema
- if multiple documents have the same topic, sort by the schemacontrollers
- if multiple documents have the same topic and schema, sort by the first controller, then subsequent onesStreamID
- finally sort by the StreamID
Bloom filter
The bloom filter should include the following data for each document that is being updated:
-
The
family
, prepend each string withfamily-
- The first 5
tags
, prepend each tag string withtag-
- The
schema
, prepend the schema StreamID string withschema-
- All DID strings in the
controllers
array, prepend each DID withcontroller-
- The StreamID string of the document, prepend with
streamid-
The bloom filter is created using the javascript bloom-filters library. Specifically using the Classic Bloom Filter. An example for how to create the filter can be observed below.
const { BloomFilter } = require('bloom-filters')
const filterEntries = [...] // An array of strings
const errorRate = 0.0001 // .01 % error rate
const filter = BloomFilter.from(items, errorRate)
const exported = filter.saveAsJSON()
Now we can create the TreeMetadata object, which can be added into ipfs and the CID of which is added to position 2
in the root MerkleNode.
{
numEntries: <number-of-leafs>,
bloomFilter: { type: 'jsnpm_bloom-filters', data: exported }
}
Rationale
This CIP provides the most efficient merkle tree implementation possible in IPLD DAG-CBOR. The more interesting aspects of this CIP are however the sorting algorithm and the bloom filter implementation.
The sorting allows for more efficient lookup in the merkle tree, primarily based on the family of the updated Ceramic document. Further sorting is also provided by the additional properties. If an implementer knows the family of a document they can do a binary search though the tree to find the correct update.
The bloom filter allows an observer to efficiently determine if any of the given properties are most likely in the merkle tree. Since the filter itself is stored in a separate IPLD object the filter doesn’t add any overhead when verifying a merkle witness for a document update. The bloomfilter implementation that was chosen is currently only implemented in javascript, we failed to find an implementation that is compatible with multiple languages. However, since we introduce a type
property we can easily determine which implementation was used. This also allows us to change the implementation used in the future.
Backwards Compatibility
These changes should be fully backwards compatible with the current implementation of Ceramic.
Implementation
- Ceramic Anchor Service - Not started yet.
Security Considerations
The main consideration to take into account when implementing the Batched Anchor Data Structure is to have a limit on the number of leafs, i.e. tree depth. If the tree is too large it will become more expensive to verify the merkle witness for any individual update.
In the bloom filter we limit the tags
inclusion to the first 5 tags. The reason for this is simply to limit the size of the generated bloom filter.
Copyright
Copyright and related rights waived via CC0.
Citation
Please cite this document as:
Joel Thorstensson, "CIP-69: Batched Anchor Data Structure," Ceramic Improvement Proposals, no. 69, November 2020. [Online serial]. Available: https://github.com/ceramicnetwork/CIPs/blob/main/CIPs/cip-69.md