Storage and Blockchain

We are nearing an important moment in the future of the internet. In the internet’s current form, you are viewing this post on a website, downloading the content from my server after your browser looked up the domain name. Back in the early days of the web this was well and good, but as we enter what may even be the final form of the internet, we cannot shackle ourselves to the methods that have worked up to now. Indeed, the internet’s very ethos is to evolve constantly towards decentralization, towards more user control and anonymity, and above all towards universal treatment and access online. One of the major hurdles standing in the way has been storage.

Before getting into storage, we must consider what storage online actually is. For centuries, all we had to preserve long-form information were paper and stone. By applying a blemish to an otherwise perfect material, we could encode information in such a way that a future person could understand and derive meaning from it. Computers work exactly the same way: you magnetize tiny regions of a hard drive platter or trap electrical charge in the cells of a solid-state drive so that meaning can be derived from it at a later date. People tend to think the internet is incorporeal when in fact it is every bit as physical as all the other information around them. The only difference between the Pony Express and email is that email sends electrical signals or pulses of light from point A to point B instead of paper. Those patterns are recorded in real space, the same space you occupy.

With this in mind, let’s consider another point: while far more information-dense than books, computers face the same problems (if not more) when it comes to storing that information. Much like books in their early days, storage drives for computers are fairly expensive. Drives in their current form are prone to data loss and can even fail outright after as little as a decade of use. While prices have certainly improved over the last decade, at the time of writing most drives above a terabyte are still not economically viable for most people to purchase. These technological shortcomings are what make the internet so hostile to “hosts” and “hosting providers”. It’s why most people don’t want to run a website. It’s why most of the biggest platforms on the web remain unchallenged: their economies of scale are formidable.

Enter blockchain technology. At its core, a blockchain is quite literally a group of computers that all share the same database (an append-only ledger) and update it together under agreed-upon rules. It is decentralized, scalable, and robust. Seems like the perfect fit, right? Why have millions of servers hosting websites when you can host every single one of them on one collective server? Information would be preserved eternally and accessible free of censorship. Having postulated such a concept and laid out the barriers current technology presents, I can now elaborate on how it might come to be. Note that our goal here is not temporary storage but permanent storage, and that idea is what the rest of this post strives toward.
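To make “a shared database that everyone updates together” a little more concrete, here is a minimal sketch of the data structure at the heart of a blockchain: each block stores the hash of the block before it, so any change to past data becomes detectable. It is a toy under stated assumptions. The `Block` and `Ledger` names are hypothetical, and it leaves out the networking and the consensus mechanism (such as proof of work or proof of stake) that real chains use to agree on who appends the next block.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Block:
    index: int
    data: str
    prev_hash: str  # hash of the previous block: this link is the "chain"

    def digest(self) -> str:
        # Hash this block's contents together with the previous block's hash,
        # so every block commits to the entire history before it.
        payload = json.dumps([self.index, self.data, self.prev_hash])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class Ledger:
    """Hypothetical minimal ledger; every node would keep an identical copy."""
    blocks: list = field(default_factory=list)

    def append(self, data: str) -> Block:
        prev = self.blocks[-1].digest() if self.blocks else "0" * 64
        block = Block(len(self.blocks), data, prev)
        self.blocks.append(block)
        return block

    def is_valid(self) -> bool:
        # Each block must reference the current hash of its predecessor.
        return all(
            cur.prev_hash == prev.digest()
            for prev, cur in zip(self.blocks, self.blocks[1:])
        )


ledger = Ledger()
ledger.append("alice pays bob 5")
ledger.append("bob pays carol 2")
print(ledger.is_valid())  # True

ledger.blocks[0].data = "alice pays bob 500"  # quietly rewrite history
print(ledger.is_valid())  # False: block 1 no longer matches block 0's hash
```

Because every node holds the same chain, a node that tampers with its copy is immediately out of step with everyone else. That is what makes the shared database hard to censor or rewrite.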

Considering how servers work in their present form, what would be a first step towards such a bold proposal? Enter the InterPlanetary File System, also known as IPFS. Unlike the modern web, where content is location-based (you look up an IP address), IPFS is content-based (you look up a hash of the file). You can point a blockchain domain at this hash and, in essence, have a fully decentralized website up and running instantly. This system is incredible because it not only allows content to be stored securely across many nodes but also allows content to be updated via versioning. However, for all its benefits, there are major drawbacks. The biggest one is incentive. IPFS is a protocol, not a magical server. What incentive do you have to run a node if you know other people will? Another danger is that a file can still disappear from the network forever if nobody runs a node holding that file. This lack of incentive to operate a node, compounded by the danger of not at least hosting your own content on the network, makes for a glass cannon. You are better off running a centralized server than helping the network.
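The difference between location-based and content-based addressing can be sketched in a few lines. In this toy model (the `Node`, `content_id`, and `fetch` names are hypothetical), content is requested by its hash rather than by a server address. Any peer holding the bytes can serve them, and the requester can verify what it receives. The sketch also shows the danger described above: once no node holds the file, it is simply gone. Real IPFS differs in the details. It splits files into blocks arranged as a Merkle DAG and uses a self-describing CID format rather than a bare SHA-256 hex string.

```python
import hashlib
from typing import Optional


def content_id(data: bytes) -> str:
    # Simplified stand-in for an IPFS CID: a plain SHA-256 hex digest.
    return hashlib.sha256(data).hexdigest()


class Node:
    """A hypothetical peer that stores ("pins") content keyed by its hash."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: dict[str, bytes] = {}

    def pin(self, data: bytes) -> str:
        cid = content_id(data)
        self.store[cid] = data
        return cid

    def unpin(self, cid: str) -> None:
        self.store.pop(cid, None)


def fetch(cid: str, peers: list[Node]) -> Optional[bytes]:
    # Ask for content by *what it is*, not *where it lives*.
    for peer in peers:
        data = peer.store.get(cid)
        # Any peer can serve it; rehashing verifies it wasn't tampered with.
        if data is not None and content_id(data) == cid:
            return data
    return None  # no node holds the file: it has vanished from the network


alice, bob = Node("alice"), Node("bob")
page = b"<html>hello, decentralized web</html>"
cid = alice.pin(page)

print(fetch(cid, [bob, alice]) == page)  # True: bob lacks it, alice serves it
alice.unpin(cid)                         # the only host stops pinning
print(fetch(cid, [bob, alice]))          # None
```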

Even with these drawbacks, the IPFS protocol is genius and most likely what will be used in the future, even if only by centralized services. So how can we build on it in a way that makes it valuable? The first problem is the incentive to host. Hosting requires both energy and storage devices. Outside of goodwill, nobody would willingly help this network, since they would be worse off for it. So how about we merge it with blockchain? Let’s consider this prospect. First, what if we incentivize our node operators by having a blockchain pay them to run nodes? One project built on this idea is Filecoin. Filecoin seeks to fix the incentive flaw by allowing clients to pay the network to host their files. This ensures that, so long as clients keep up with their payments, their files are always hosted by the network. The shortcoming, of course, is that once someone stops paying the network, there is no longer any incentive to keep the file, and it will be dropped from existence.
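The pay-to-host model, and its weakness, can be illustrated with a toy simulation. This is a hypothetical sketch, not Filecoin’s actual deal mechanics, which also involve cryptographic proofs that storage providers continue to hold the data. A client prepays a balance, and the host deducts a fee each epoch. The moment the balance can no longer cover an epoch, the host has no reason to keep the file and drops it. The `StorageDeal` and `PaidHost` names, the epoch-based pricing, and the token amounts are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class StorageDeal:
    """Hypothetical prepaid hosting deal (not Filecoin's real deal format)."""
    cid: str
    price_per_epoch: int  # tokens owed to the host for each epoch of storage
    balance: int = 0      # tokens the client has deposited and not yet spent

    def deposit(self, amount: int) -> None:
        self.balance += amount


class PaidHost:
    """A node that keeps a file only for as long as someone pays for it."""

    def __init__(self) -> None:
        self.deals: dict[str, StorageDeal] = {}
        self.earned = 0

    def accept(self, deal: StorageDeal) -> None:
        self.deals[deal.cid] = deal

    def tick(self) -> None:
        # One epoch passes: charge every deal, drop files that can't pay.
        for cid in list(self.deals):
            deal = self.deals[cid]
            if deal.balance >= deal.price_per_epoch:
                deal.balance -= deal.price_per_epoch
                self.earned += deal.price_per_epoch
            else:
                del self.deals[cid]  # no payment, no incentive: file dropped

    def is_hosting(self, cid: str) -> bool:
        return cid in self.deals


host = PaidHost()
deal = StorageDeal(cid="example-cid", price_per_epoch=2)
deal.deposit(6)  # enough for exactly three epochs
host.accept(deal)

for _ in range(3):
    host.tick()
print(host.is_hosting("example-cid"), host.earned)  # True 6

host.tick()  # fourth epoch: the balance is empty
print(host.is_hosting("example-cid"))  # False
```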

Okay, so now what? While IPFS is cool, what about storing everything ON the blockchain? Funnily enough, this runs into a problem that many chains will have to reckon with in the not-too-distant future: decentralization and the entry cost of starting a node. At the time of writing, the Ethereum network is adding roughly 200GB of new data each year that must be stored. While that may not seem like a lot, consider how expensive ETH is to transact with, how low global adoption is, and that this is before the ETH2.0 merge. As Ethereum’s planned scaling upgrades raise the maximum transaction throughput, the size of the chain will grow dramatically as well. So how could we store data forever using blockchain?
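A quick back-of-the-envelope calculation shows why the entry cost of running a node keeps rising. It takes the roughly 200GB per year figure above and assumes, as a simplification, that chain growth scales linearly with transaction throughput. Under those assumptions, the hypothetical helper below estimates how much new data a node must absorb over a decade. The throughput multipliers are illustrative, not predictions.

```python
# Figure from the text: roughly 200 GB of new chain data per year.
BASELINE_GB_PER_YEAR = 200.0


def added_storage_gb(years: int, throughput_multiplier: float = 1.0) -> float:
    """New data (GB) a node must keep after `years` years.

    Assumes (as a simplification) that chain growth scales linearly with
    transaction throughput and that throughput stays constant over the period.
    """
    return BASELINE_GB_PER_YEAR * throughput_multiplier * years


for multiplier in (1, 10, 100):  # hypothetical throughput increases
    tb = added_storage_gb(10, multiplier) / 1000
    print(f"{multiplier:>3}x throughput -> {tb:,.0f} TB added over 10 years")
```

At today’s rate that is about 2TB per decade. A tenfold increase in throughput would make it about 20TB, the kind of cost that pushes ordinary users out of running nodes.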

The answer, of course, is incentive. For the blockchain to be stored and preserved over the long term, the reward must be greater than the cost. So how about this: anyone can add information to the chain, but to do so they must pay the network a hefty fee equal to ten years of regular cloud storage for the file’s size. This fee would be split among all the participating nodes of the network, with a small cut going to a DAO that supports the project and onboards more people. To avoid every node storing a full copy, we fragment the file into thousands of pieces and spread them across all nodes (much like IPFS splits files into blocks). Unlike with IPFS, once a file has been uploaded to the blockchain, a node cannot drop its pieces without being kicked from the network. All nodes are given data to handle, and since they are paid gradually over the decade they hold the file, there is now an incentive to provide the service.
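The fee scheme above can be sketched as follows. Everything here is hypothetical: the function names, the per-GB monthly cloud price used as the pricing input, the 5% DAO cut, and the simple round-robin placement of fragments are assumptions chosen to illustrate the idea, not parameters of any existing network.

```python
from dataclasses import dataclass

YEARS_PREPAID = 10  # from the text: the fee covers ten years of cloud storage


@dataclass
class FeeSplit:
    dao_total: float          # one-off cut for the (hypothetical) DAO
    per_node_per_year: float  # what each node earns yearly for holding data


def upload_fee(size_gb: float, cloud_price_per_gb_month: float) -> float:
    # Ten years of regular cloud storage for a file of this size.
    return size_gb * cloud_price_per_gb_month * 12 * YEARS_PREPAID


def split_fee(fee: float, num_nodes: int, dao_cut: float = 0.05) -> FeeSplit:
    # A small share goes to the DAO; the rest is paid out evenly to nodes,
    # a little each year, over the ten-year holding period.
    dao_total = fee * dao_cut
    per_node_per_year = (fee - dao_total) / num_nodes / YEARS_PREPAID
    return FeeSplit(dao_total, per_node_per_year)


def fragment(data: bytes, num_pieces: int) -> list[bytes]:
    # Split the file into at most `num_pieces` roughly equal pieces
    # so that no single node must hold all of it.
    if not data:
        return []
    size = -(-len(data) // num_pieces)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def assign(pieces: list[bytes], nodes: list[str]) -> dict[str, list[int]]:
    # Spread piece indices across nodes round-robin.
    placement: dict[str, list[int]] = {n: [] for n in nodes}
    for i in range(len(pieces)):
        placement[nodes[i % len(nodes)]].append(i)
    return placement


fee = upload_fee(size_gb=1.0, cloud_price_per_gb_month=0.02)  # hypothetical price
split = split_fee(fee, num_nodes=1000)
pieces = fragment(b"example file contents", num_pieces=4)
print(assign(pieces, ["node-a", "node-b", "node-c"]))
# {'node-a': [0, 3], 'node-b': [1], 'node-c': [2]}
```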

What about later on, after the file is no longer paying dividends? This is where the elusive variable of technology comes into play. Storage has advanced a lot. Just over a decade ago, people were flexing name-brand phones with a massive 16GB of storage. A decade before that, nobody would have known what to do with so much storage. A decade from the writing of this post, storage in the single-terabyte range will likely be considered uncomfortably small for daily life. Knowing this, we could bet that the revenue from hosting new data will outpace the cost of hosting and maintaining the data already on the network.
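One way to make this bet concrete requires a simplifying assumption of our own: suppose the yearly cost of keeping a file stored falls by a constant fraction d every year. Then storing it forever costs c + c(1 - d) + c(1 - d)² + …, a geometric series that sums to c / d. Since the upload fee equals ten years of storage at today’s price (10c), it would cover the file forever exactly when d ≥ 0.10, that is, when storage costs fall by at least 10% per year. The sketch below checks this numerically. Its function names are hypothetical, and it ignores interest, electricity costs, and the overhead of migrating data to new hardware.

```python
def perpetual_cost(first_year_cost: float, annual_decline: float) -> float:
    # Closed form of c + c(1-d) + c(1-d)^2 + ... = c / d, valid for 0 < d <= 1.
    if not 0 < annual_decline <= 1:
        raise ValueError("annual_decline must be in (0, 1]")
    return first_year_cost / annual_decline


def cost_over_years(first_year_cost: float, annual_decline: float, years: int) -> float:
    # Finite sum of the same series, to check that it approaches the closed form.
    return sum(first_year_cost * (1 - annual_decline) ** t for t in range(years))


def ten_year_fee_covers_forever(annual_decline: float) -> bool:
    # The fee is ten years of storage at today's price: 10c. Taking c = 1,
    # it covers perpetual storage when 1 / d <= 10, i.e. when d >= 0.10.
    return perpetual_cost(1.0, annual_decline) <= 10.0


for d in (0.05, 0.10, 0.20):  # illustrative yearly price declines
    print(f"d={d:.2f}: forever costs {perpetual_cost(1.0, d):.1f} years' worth "
          f"at today's price; fee suffices: {ten_year_fee_covers_forever(d)}")
```

If storage prices stall, d shrinks and the cost of keeping old data grows without bound. That is exactly the risk of leaning on technological progress.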

On paper, this works. It could work for blockchains in the future, it could work for file storage, and it may even work for other applications. However, it has plenty of holes, too many to make a sound argument for it. The biggest one is that we bet on the dark horse of technological advancement to keep our incentive in place. While the bet might pay off, it is unwise to stake a system’s permanence on a variable. The true future of a fully decentralized web has not yet been realized. Blockchains are still new and niche. Technologies such as DNA-based data storage are not yet available to consumers. Electricity costs for storage are still too high. The list goes on. For at least the near future, the best solution I see is the adoption of IPFS by major organizations (such as the Internet Archive or Wikipedia) to preserve long-term access to data. By using IPFS, their existing infrastructure would become much more robust and transparent while remaining centralized. It would be a bridge to web3 and greater decentralization in the future. Will such organizations start moving to more robust protocols? Only time will tell.