Historical storage endgame bottleneck: how far can we push?

The endgame bottleneck for data layers is going to be historical storage. You can read more about it in detail here. Since then, Vitalik has offered some comments on the matter:

At much higher levels of history storage (eg. 500 TB per year), the risk that some data will be forgotten becomes higher (and additionally, the data availability verification system becomes more strained). This is likely the true limit of sharded blockchain scalability. However, all current proposed parameters are very far from reaching this point.

The current EIP-4844 spec calls for 2.5 TB/year of historical storage, while for the preliminary danksharding spec this is 42 TB/year. DataLayr, a DA layer based on danksharding, is already targeting 315 TB/year on testnet, and even mentions that with 10,000 nodes it could reach 31.5 PB/year. For context, what kind of throughput will this enable on rollups? Something on the order of 10-100 million TPS. Yeah, we are a long, long way away from requiring this sort of throughput, and the bottlenecks will be decidedly on the execution layers.
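As a rough sanity check on those TPS figures, here’s a back-of-envelope conversion from history growth (TB/year) to rollup throughput. The ~12 bytes per compressed rollup transaction is my assumption for illustration, not a number from any spec; real transaction sizes vary by rollup.

```python
# Back-of-envelope: convert a DA layer's history growth rate into rollup TPS.
# bytes_per_tx is an illustrative assumption; compressed rollup transactions
# are often quoted in the ~10-16 byte range, but this varies by rollup.
SECONDS_PER_YEAR = 365 * 24 * 3600
BYTES_PER_TB = 10**12

def rollup_tps(tb_per_year: float, bytes_per_tx: float = 12) -> float:
    """Rough TPS that a given DA throughput could support on rollups."""
    bytes_per_second = tb_per_year * BYTES_PER_TB / SECONDS_PER_YEAR
    return bytes_per_second / bytes_per_tx

for label, tb_year in [("EIP-4844", 2.5),
                       ("danksharding", 42),
                       ("DataLayr testnet", 315),
                       ("DataLayr, 10,000 nodes", 31_500)]:
    print(f"{label}: ~{rollup_tps(tb_year):,.0f} TPS")
```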

In this post, I’ll assume other bottlenecks (distributed builders, P2P, fast KZG proving, sampling, etc.) are alleviated, and I’ll explore the feasibility of historical storage. Please note this is highly speculative - I haven’t read anything on the matter before. Just another reminder: for historical storage, you only need one copy of the data to survive to be safe.

Let’s start with danksharding. This is very simple. You can easily do this with a desktop PC or a consumer NAS. Today, it costs roughly $800 for >1 year of historical storage, using 3x16 TB hard drives + a 3-bay NAS. By the time danksharding ships, I’d expect this to cost only $500, judging from the long-term decline in hard drive prices (Wright’s Law).
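A minimal sketch of that math, assuming 16 TB drives at roughly $200 each plus a basic 3-bay NAS at a similar price (all assumed figures; actual prices vary by market):

```python
# Rough hardware estimate for holding one year of danksharding history
# (42 TB/year). Drive capacity, drive price and NAS price are assumptions.
import math

history_tb_per_year = 42
drive_capacity_tb = 16
drive_price_usd = 200   # assumed street price of a 16 TB drive
nas_price_usd = 200     # assumed price of a basic 3-bay NAS

drives = math.ceil(history_tb_per_year / drive_capacity_tb)   # -> 3
total_usd = drives * drive_price_usd + nas_price_usd

print(f"{drives} drives, ~${total_usd} for >1 year of history")
```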

Let’s ramp things up. Our next target is 1 PB/year, which is 2x Vitalik’s risky threshold and roughly 3x the DataLayr testnet target. How difficult is it? This target is enough for millions of TPS on rollups. Turns out, even this is easier than you might think. Linus Tech Tips has an entertaining video about how they achieved it 5 years ago. Today, this is dramatically easier. Using enterprise-grade hard drives + a storage server + RAID5 redundancy, this costs less than $30,000. Expensive, to be sure, but well within the means of a crypto enthusiast (I mean, it costs less than JPEGs…), and trivially cheap for large infrastructure providers like Etherscan, Cloudflare, Alchemy, TheGraph, etc. Do note that half of the cost is the storage server, which can be upgraded over time with higher-capacity hard drives.
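Here’s a parametric sketch of where a figure in that range comes from; the drive size, drive price, RAID5 group size and server budget below are my assumptions, chosen to be roughly plausible at the time of writing - plug in current prices to check it yourself.

```python
# Rough cost sketch for 1 PB/year of history on enterprise hard drives with
# RAID5 redundancy (one parity drive per group). Every price and capacity
# here is an illustrative assumption, not a vendor quote.
import math

target_tb = 1_000           # 1 PB of usable capacity per year
drive_capacity_tb = 20      # assumed enterprise drive size
drive_price_usd = 280       # assumed price per drive
raid_group_size = 10        # RAID5: usable capacity is (N-1)/N of raw
server_budget_usd = 14_000  # assumed chassis/server cost (~half the total)

usable_fraction = (raid_group_size - 1) / raid_group_size
drives = math.ceil(target_tb / usable_fraction / drive_capacity_tb)
total_usd = drives * drive_price_usd + server_budget_usd

print(f"{drives} x {drive_capacity_tb} TB drives, ~${total_usd:,} all-in")
```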

What about DataLayr’s 10,000-node target, enough for 100 million TPS on rollups? Now, this is definitely very difficult to achieve, and you’ll need expensive rack-mounted storage servers. Nevertheless, it is possible, and can be achieved for less than $1 million today. As hard drives and servers get cheaper over time, this could very well be affordable if and when we actually need this type of massive scale.
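One way to sanity-check the sub-$1 million figure is to look at the all-in budget per TB it implies, compared against an assumed raw drive cost (the drive cost per TB below is my assumption):

```python
# Sanity check: what all-in cost per TB does "<$1 million for 31.5 PB/year"
# imply, and how does it compare to an assumed raw drive cost per TB?
budget_usd = 1_000_000
history_tb_per_year = 31_500            # 31.5 PB/year

implied_usd_per_tb = budget_usd / history_tb_per_year
assumed_drive_usd_per_tb = 14           # assumed: ~$280 per 20 TB drive

print(f"Implied all-in budget: ~${implied_usd_per_tb:.0f}/TB")
print(f"Assumed raw drive cost: ~${assumed_drive_usd_per_tb}/TB")
# The gap leaves headroom for RAID overhead, servers, racks and power.
```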

But here’s a trick: you don’t need one entity to hold all historical storage. Just have a dedicated, incentivized history layer that’s separate from the consensus layer. It could even be the same validators participating in both. For one, you can have an incentivized BitTorrent-like network where you simply pay people to store a small portion of the data, with adequate redundancy. Today, even budget laptops ship with 1 TB of storage, and 16 TB hard drives cost $200 for desktop PCs. Even if we go with our 1 PB/year target, you just need a few hundred people with desktop PCs to spend $200 on a hard drive every year for adequate redundancy and replicability guarantees. Indeed, Filecoin has successfully incentivized 16 exabytes of capacity - which is 16,000x the yearly target above - and this is despite their terrible economic model and negligible usage. It’ll be trivial for an actually in-demand DA layer to incentivize this. As a reminder once again, this will enable enough throughput for millions of TPS.
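To put a number on “a few hundred people”, here’s a sketch assuming each participant contributes one 16 TB drive per year and the network keeps, say, five full copies of the data. The replication factor is my assumption; erasure coding would cut the overhead further, but full copies keep the arithmetic simple.

```python
# How many hobbyist participants would an incentivized history network need
# for 1 PB/year? The per-person contribution and the replication factor are
# illustrative assumptions.
import math

history_tb_per_year = 1_000   # 1 PB/year target
tb_per_participant = 16       # one ~$200 16 TB drive per person per year
replication_factor = 5        # assumed number of full copies kept

per_copy = math.ceil(history_tb_per_year / tb_per_participant)
total_participants = per_copy * replication_factor

print(f"{per_copy} people per copy, "
      f"{total_participants} people for {replication_factor}x replication")
```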

The other trick is: in a rollup-centric world, the rollup/volition can simply be responsible for custodying its own relevant data. Rollup/volition sequencers can be mandated to hold all data relevant to their specific protocol, enforced with something proof-of-custody-like. The DA layer then does not need to bother with it at all.

For something as widely decentralized and conservative as Ethereum, this may not be the ideal solution, because danksharding will be used for more than just rollups. But there are two possible resolutions:

  • A secondary, rollup-only track that gives no historical guarantees whatsoever and continues to scale up with sampling nodes, assuming there are no P2P/builder bottlenecks.
  • Let protocols like DataLayr, Celestia, zkPorter or Polygon Avail handle this overflow. To be sure, this will be significantly less secure, but some of the low-value overflow may not require that much security. DataLayr does not need to offer any historical guarantees, and the validiums/volitions settling there will know this and mandate their own sequencers/validators to custody historical data.

The general takeaway here is that there will be a massive abundance of data availability within the next couple of years, and the bottlenecks will be squarely on the execution layer (i.e. rollups, volitions, etc.). But before that, the ultimate bottleneck is the application layer - will we even have enough novel applications that require this type of hyperscale to begin with? That very much remains to be seen. For those who will point out that scale comes first - this is not quite true. Today, all rollups run with mostly empty blocks, and they are quite cheap nowadays too: e.g. at the time of writing this post, a token transfer on zkSync costs $0.02 and a swap $0.06, while a swap is $0.12 on Optimism. Stuff like Immutable X has literally $0.00 NFT minting and trading. So, as an industry, we must heavily focus on applications right now, so that by the time DA layers scale up massively, there will be some justification for it.

PS: Experimenting with Writing NFTs once again. Pledge to contribute all proceeds to Gitcoin.
