5.3 Data Availability Sampling
Danksharding proposes a solution, Data Availability Sampling (DAS), to reduce node burdens while ensuring data availability.
The idea behind Data Availability Sampling (DAS) is to split the data in a Blob into data shards, so that instead of downloading the entire Blob, each node randomly samples individual Blob data shards. The shards end up scattered across every node in Origins, while the complete Blob data is held collectively across the Origins ledger, assuming a decentralized network with a sufficiently large number of nodes.
For example, if Blob data is divided into 10 shards and there are 100 nodes in the network, each node randomly samples and downloads one shard and submits that shard's identifier to the block. As long as the block contains all the identifiers needed to reconstruct the original data, Origins assumes the Blob data is available. There is still an extremely small probability that no node sampled some particular shard, in which case that data is lost; this slight reduction in security is considered acceptable from a probabilistic standpoint.
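To make this probabilistic argument concrete, here is a minimal Python sketch using the figures from the example above (10 shards, 100 nodes; these numbers and the helper names are purely illustrative, not protocol parameters). It estimates the chance that at least one shard is never sampled, once with a union-bound calculation and once by simulation.

```python
import random

NUM_SHARDS = 10   # shards per Blob in the example above (illustrative only)
NUM_NODES = 100   # sampling nodes in the example above (illustrative only)

def prob_some_shard_missed_bound() -> float:
    """Union bound: P(a given shard is sampled by nobody) = (1 - 1/S)^N,
    and P(at least one of the S shards is missed) <= S * that value."""
    p_single_missed = (1 - 1 / NUM_SHARDS) ** NUM_NODES
    return NUM_SHARDS * p_single_missed

def prob_some_shard_missed_simulated(trials: int = 200_000) -> float:
    """Monte Carlo check: each node independently samples one shard at random."""
    missed = 0
    for _ in range(trials):
        sampled = {random.randrange(NUM_SHARDS) for _ in range(NUM_NODES)}
        if len(sampled) < NUM_SHARDS:
            missed += 1
    return missed / trials

if __name__ == "__main__":
    print(f"union-bound estimate: {prob_some_shard_missed_bound():.6f}")
    print(f"simulated frequency : {prob_some_shard_missed_simulated():.6f}")
```

Both numbers come out around 0.0003 for these figures, which is why the residual risk is treated as acceptable; combined with Erasure Coding (next section), it shrinks further still.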
Danksharding employs two technologies to implement Data Availability Sampling (DAS): Erasure Coding and KZG Polynomial Commitment.
5.3.1 Erasure Coding
Erasure Coding is an error-correcting encoding technique that, when applied to data slicing, allows Origins nodes to reconstruct the original data from just over 50% of the data fragments. This significantly reduces the probability of data loss. The implementation details are complex, but here is a brief explanation using a simple linear function as an example (a code sketch follows below): [2]
- Start by constructing a linear function f(x) = ax + b and choose four x values (here 0, 1, 2, and 3).
- Set m = f(0) = b and n = f(1) = a + b. This gives b = m and a = n - b = n - m.
- Define p = f(2) and q = f(3), resulting in p = 2a + b = 2n - m and q = 3a + b = 3n - 2m.
- Scatter the m, n, p, and q fragments across the network nodes.
- Because all four fragments lie on the same line f(x), finding any two of them allows the calculation of the other two.
- If n and m are found, q = 3n - 2m and p = 2n - m can be directly computed.
- If q and p are found, computing 2p - q = (4n - 2m) - (3n - 2m) = n recovers n, and then m = 2n - p follows directly.
In simple terms, Erasure Coding leverages these mathematical properties to split Blob data into numerous fragments. Origins nodes do not need to collect all of them; just over 50% is sufficient to reconstruct the original Blob data, which makes the probability of failing to gather enough fragments negligible.
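The same toy example can be written out as code. The following sketch (plain Python; the function names are ours and purely illustrative) encodes two values m and n into the four fragments m, n, p, q using f(x) = ax + b, then reconstructs the originals from any two surviving fragments.

```python
from fractions import Fraction

def encode(m: int, n: int) -> dict[int, Fraction]:
    """Toy erasure code from the example: store m = f(0) and n = f(1) plus the
    redundant fragments p = f(2) and q = f(3), where f(x) = a*x + b."""
    b = Fraction(m)       # b = f(0) = m
    a = Fraction(n) - b   # a = f(1) - f(0) = n - m
    return {x: a * x + b for x in (0, 1, 2, 3)}

def reconstruct(fragments: dict[int, Fraction]) -> tuple[Fraction, Fraction]:
    """Recover (m, n) = (f(0), f(1)) from any two surviving fragments:
    two points determine the line f(x) = a*x + b uniquely."""
    (x1, y1), (x2, y2) = list(fragments.items())[:2]
    a = (y2 - y1) / (x2 - x1)
    b = y1 - a * x1
    return b, a + b       # (f(0), f(1))

if __name__ == "__main__":
    frags = encode(m=7, n=11)            # {0: 7, 1: 11, 2: 15, 3: 19}
    # Pretend only the redundant fragments p = f(2) and q = f(3) survived.
    survivors = {x: frags[x] for x in (2, 3)}
    print(reconstruct(survivors))        # -> (Fraction(7, 1), Fraction(11, 1))
```

Production erasure codes operate over a finite field with much higher-degree polynomials, but the principle is the same: enough evaluations of the polynomial determine it uniquely, so the missing fragments can always be recomputed.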
5.3.2 KZG Commitment
KZG Commitment (KZG) is a cryptographic technique used to address the data-integrity concern left open by Erasure Coding. Because nodes only sample the data fragments produced by Erasure Coding, they cannot tell on their own whether those fragments genuinely originate from the original Blob data. The party responsible for encoding therefore generates a KZG polynomial commitment, which serves as proof that each Erasure Coding fragment is indeed part of the original Blob data. KZG plays a role somewhat similar to a Merkle tree but with a different structure, and all proofs in KZG relate to the same polynomial.
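A real KZG commitment lives on an elliptic curve and is verified with pairings, which is beyond the scope of this section. The sketch below is our own simplification over a small prime field (the modulus, variable names, and example polynomial are all illustrative): it only demonstrates the algebraic identity the proof rests on, namely that if the claimed value y really equals p(z), then p(x) - y is divisible by (x - z), so the verifier can check p(tau) - y = q(tau) * (tau - z) at a point tau that, in the real scheme, is hidden inside the trusted-setup group elements and never revealed.

```python
# Toy illustration of the algebra behind a KZG evaluation proof, over a small
# prime field. In the real scheme p(tau) and q(tau) are hidden "in the
# exponent" of elliptic-curve points and the final check uses a pairing;
# here we evaluate them directly just to show the identity.
MODULUS = 2**31 - 1  # a small prime; real KZG uses the BLS12-381 scalar field

def poly_eval(coeffs: list[int], x: int) -> int:
    """Evaluate p(x) = sum(coeffs[i] * x**i) over the field (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % MODULUS
    return acc

def quotient(coeffs: list[int], z: int, y: int) -> list[int]:
    """Compute q(x) = (p(x) - y) / (x - z) by synthetic division.
    The division is exact precisely when p(z) == y."""
    shifted = coeffs[:]
    shifted[0] = (shifted[0] - y) % MODULUS
    q = [0] * (len(shifted) - 1)
    carry = 0
    for i in range(len(shifted) - 1, 0, -1):
        q[i - 1] = (shifted[i] + carry) % MODULUS
        carry = (q[i - 1] * z) % MODULUS
    assert (shifted[0] + carry) % MODULUS == 0, "p(z) != y, no valid proof"
    return q

if __name__ == "__main__":
    p = [5, 3, 0, 7]        # p(x) = 5 + 3x + 7x^3, standing in for Blob data
    tau = 123456789         # secret point from the trusted setup (never public)
    z = 42                  # the position a sampler asks about
    y = poly_eval(p, z)     # claimed value of the data at z
    q = quotient(p, z, y)   # the prover's witness polynomial
    # Verifier-side identity (checked via pairings in real KZG):
    lhs = (poly_eval(p, tau) - y) % MODULUS
    rhs = (poly_eval(q, tau) * (tau - z)) % MODULUS
    print(lhs == rhs)       # -> True
```

In the actual protocol the verifier never learns tau; it holds only group elements derived from tau during the trusted setup and performs the same check with a single pairing equation, so one constant-size proof attests that a sampled fragment really is an evaluation of the committed Blob polynomial.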
Danksharding achieves Data Availability Sampling (DAS) through Erasure Coding and KZG Commitment, substantially reducing node burden while expanding Blob's additional data capacity to 16MB~32MB. The Origins community has also proposed a 2D KZG scheme to further cut data fragments and reduce bandwidth and computational requirements; the row-and-column layout behind this idea is sketched below. However, the specific algorithms, including the design of DAS, are still under active discussion and refinement within the community.
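The full 2D construction is, as noted, still being refined, but the row-and-column layout behind it can be illustrated with the same toy linear code from 5.3.1. The sketch below (helper names are ours; a production scheme would use a proper finite-field erasure code together with KZG commitments, not this toy code) extends a 2x2 block of data both row-wise and column-wise into a 4x4 grid.

```python
from fractions import Fraction

def extend_line(v0, v1) -> list[Fraction]:
    """Extend two values to four fragments with the toy code from 5.3.1:
    f(x) = a*x + b with f(0) = v0 and f(1) = v1, returning [f(0), f(1), f(2), f(3)]."""
    b = Fraction(v0)
    a = Fraction(v1) - b
    return [a * x + b for x in range(4)]

def extend_2d(block: list[list[int]]) -> list[list[Fraction]]:
    """Extend a 2x2 block to a 4x4 grid: first every row, then every column of
    the widened rows. Each row and each column of the result is itself a valid
    codeword, so it can be rebuilt from any half of its cells."""
    rows = [extend_line(r[0], r[1]) for r in block]                  # 2 rows x 4 cols
    cols = [extend_line(rows[0][c], rows[1][c]) for c in range(4)]   # 4 cols x 4 rows
    return [[cols[c][r] for c in range(4)] for r in range(4)]        # back to row-major

if __name__ == "__main__":
    grid = extend_2d([[1, 2],
                      [3, 4]])
    for row in grid:
        print([int(v) for v in row])
    # The top-left 2x2 corner is the original data; everything else is redundancy.
```

Because every row and every column of the extended grid is erasure-coded, a sampler can probe a handful of small cells chosen at random, rather than whole rows, and still gain high confidence that the full Blob is available, which is where the bandwidth savings come from.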