Storing Data on Blockchain

Although we are in the middle of a crypto winter, with major coins losing more than 80% of their value in 2018, the underlying blockchain technology remains exciting. Blockchain provides a democratized, distributed protocol for trust and validation that has already disrupted banking and financial services and is on the verge of overhauling other industries such as healthcare, supply chain, and HR.

Despite the hype and its promising future, blockchain still has shortcomings, and data storage is one of them. Transactions on blockchains that use proof-of-work (PoW) consensus, such as Bitcoin and Ethereum, are slow to confirm, which makes these networks unsuitable for storing large amounts of data. For example, the launch of the dApp CryptoKitties nearly crippled the Ethereum network.

The main problems with storing data on a blockchain are the protocol's limit on how much data fits in a block and the high transaction costs. A block can typically hold from a few kilobytes to a few megabytes of data. For example, Bitcoin's block size is only 1 MB. This limit has a serious impact on the scalability of most cryptocurrencies, and the Bitcoin community is debating whether to increase it.

The other issue is the high cost of transactions. Why is storing data on the blockchain so expensive? Because the data has to be stored by every full node on the network. When storing data on the blockchain, we pay a base price for the transaction itself plus an amount per byte we want to store. If smart contracts are involved, we also pay for the computation the contract performs. This is why storing even a few kilobytes of data on the blockchain is costly, and storing megabytes can cost a fortune.
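To make the "base price plus an amount per byte" idea concrete, here is a rough back-of-the-envelope sketch in Python. It uses Ethereum's gas model as the example. The gas figures are approximations of the protocol's charges, and the gas price and ETH price are purely illustrative assumptions, so treat the output as an order-of-magnitude estimate rather than a quote.

```python
# Rough cost estimate for writing raw bytes into Ethereum contract storage.
# Every constant below is an illustrative assumption, not a live network value.
BASE_TX_GAS = 21_000            # approximate base gas charged for any transaction
GAS_PER_32_BYTE_WORD = 20_000   # approximate gas to write one new 32-byte storage slot
GAS_PRICE_GWEI = 20             # assumed gas price
ETH_PRICE_USD = 150             # assumed ETH price


def storage_cost_usd(num_bytes: int) -> float:
    """Estimate the USD cost of storing num_bytes in contract storage."""
    words = -(-num_bytes // 32)                     # round up to whole 32-byte words
    gas = BASE_TX_GAS + words * GAS_PER_32_BYTE_WORD
    eth = gas * GAS_PRICE_GWEI / 1e9                # 1 ETH = 1e9 gwei
    return eth * ETH_PRICE_USD


for label, size in [("1 KB", 1024), ("1 MB", 1024 ** 2)]:
    print(f"{label}: ~${storage_cost_usd(size):,.2f}")
```

Under these assumptions a kilobyte costs roughly two dollars and a megabyte roughly two thousand. The sketch also ignores the per-block gas limit, which in practice would force a megabyte of data to be split across many transactions.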

Therefore, it is not viable to store large files such as images and videos on the blockchain. Is there a way around the storage issue? Yes, there are quite a few, but the most promising one is IPFS.

What is IPFS?

IPFS, the InterPlanetary File System, is an innovative open-source project created by the developers at Protocol Labs. It is a peer-to-peer file-sharing system that aims to change the way information is distributed across wide area networks. IPFS builds on innovations in communication protocols and distributed systems and combines them into a unique file-sharing system.

The current HTTP client-server model uses location-based addressing, which has some serious drawbacks. First of all, every client has to fetch content from the location that hosts it, even when a copy exists closer by, which consumes a huge amount of bandwidth and costs a lot of money and time. On top of that, HTTP downloads a file from a single server at a time, which can be slow if the file is big. In addition, it creates a single point of failure: if the web server is down or has been hacked, the content becomes unreachable, and if the file has been moved or deleted you will get a 404 Not Found error. Besides that, it allows powerful entities such as governments to block access to certain locations.

IPFS, on the other hand, uses content-based addressing. It is a decentralized way of storing files, similar to BitTorrent. In the IPFS network, every node stores a collection of hashed files, and users refer to files by their hashes. To store a file, a user uploads it to IPFS, where it is stored in the node's working directory, a hash is generated for it, and the file becomes available on the IPFS network. A user who wants to retrieve a file simply requests it by its hash. IPFS then searches the nodes in the network and delivers the file to the user when it is found.
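The core idea of content-based addressing can be sketched in a few lines of Python. This is a toy model, not the IPFS implementation: the `ContentStore` class is hypothetical, and it uses a plain SHA-256 hex digest as the address, whereas real IPFS splits files into chunks and identifies them with content identifiers (CIDs).

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: each file's key is the hash of its bytes."""

    def __init__(self):
        self._blobs = {}

    def add(self, data: bytes) -> str:
        # The address is derived from the content, not from where it is stored.
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Anyone can check that the returned bytes match the requested address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
addr = store.add(b"hello ipfs")
assert store.get(addr) == b"hello ipfs"
assert store.add(b"hello ipfs") == addr   # same content, same address
assert store.add(b"hello IPFS") != addr   # any change yields a new address
```

Because the address is a function of the bytes themselves, identical files are stored once, and a reader can always verify that what came back is what was asked for.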

IPFS overcomes the HTTP weaknesses described above. Because files are stored on the decentralized IPFS network, they remain available on other nodes when one node goes down, so there is no single point of failure. Data transfer is cheaper and faster because you can get the files from the nearest node that has them. On top of that, it is much harder for powerful entities to block access to the files because the network is decentralized.
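The "no single point of failure" property can be illustrated with a small simulation. The `Node`, `publish`, and `retrieve` names below are hypothetical, and the linear scan over nodes is a simplification: real IPFS locates providers through a distributed hash table rather than by asking every peer in turn.

```python
import hashlib


class Node:
    """Hypothetical peer holding files keyed by their SHA-256 hash."""

    def __init__(self, name, online=True):
        self.name = name
        self.online = online
        self.files = {}

    def fetch(self, address):
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        return self.files[address]  # raises KeyError if this node lacks the file


def publish(nodes, data):
    """Replicate data to every given node and return its content address."""
    address = hashlib.sha256(data).hexdigest()
    for node in nodes:
        node.files[address] = data
    return address


def retrieve(nodes, address):
    """Return (data, node name) from the first reachable node with valid content."""
    for node in nodes:
        try:
            data = node.fetch(address)
        except (ConnectionError, KeyError):
            continue  # try the next peer
        if hashlib.sha256(data).hexdigest() == address:
            return data, node.name
    raise LookupError("no reachable node has this content")


nodes = [Node("a"), Node("b"), Node("c")]
address = publish(nodes, b"cat video")
nodes[0].online = False                     # one node goes down
data, served_by = retrieve(nodes, address)
print(served_by)                            # prints "b"
```

The request succeeds even though the first node is offline, because any node holding the content can serve it and the hash lets the client confirm it received the right bytes.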

The following figure shows the difference between the centralized client-server protocol (HTTP) and the peer-to-peer IPFS protocol.


 [Source: https://www.maxcdn.com/one/visual-glossary/interplanetary-file-system/]

Blockchain and IPFS

IPFS is a natural match for the blockchain. As mentioned above, the blockchain is inefficient at storing large amounts of data in a block because all the hashes need to be calculated and verified to preserve the integrity of the chain. Therefore, instead of storing the data itself on the blockchain, we simply store the hash of the IPFS file. In this way, only a small, fixed-size piece of data goes on the blockchain, yet we still get the file storage and decentralized peer-to-peer properties of IPFS.
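A minimal sketch of this pattern in Python follows. Both stores are stand-ins: `off_chain_store` plays the role of IPFS and `chain` plays the role of the blockchain, and the record layout and function names are assumptions made for illustration only.

```python
import hashlib
import json

off_chain_store = {}   # stands in for IPFS: maps content hash to file bytes
chain = []             # stands in for the blockchain: a list of small records


def store_document(data: bytes, owner: str) -> dict:
    """Put the file off-chain and append only its hash to the chain."""
    digest = hashlib.sha256(data).hexdigest()
    off_chain_store[digest] = data
    record = {"owner": owner, "content_hash": digest}
    chain.append(record)
    return record


def load_document(record: dict) -> bytes:
    """Fetch the file off-chain and verify it against the on-chain hash."""
    data = off_chain_store[record["content_hash"]]
    if hashlib.sha256(data).hexdigest() != record["content_hash"]:
        raise ValueError("off-chain data does not match the on-chain hash")
    return data


image = bytes(5 * 1024 * 1024)              # 5 MB placeholder file
record = store_document(image, owner="alice")
on_chain_bytes = len(json.dumps(record).encode())
print(on_chain_bytes, "bytes on chain for", len(image), "bytes of data")
assert load_document(record) == image
```

The on-chain record stays the same size no matter how large the file is, and the stored hash still lets anyone detect whether the off-chain copy has been altered.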

One real-world use case that combines blockchain and IPFS is Nebulis, a new project exploring the concept of a distributed DNS that is meant to keep working even under an overwhelming volume of requests. Nebulis uses the Ethereum blockchain and IPFS, as a distributed alternative to HTTP, to register and resolve domain names. We expect to see more integrations of blockchain and IPFS in the future.

References