← Will Donnelly

It would be really neat if it were possible to have a fully decentralized "cloud" ecosystem in which providers of computing resources were fully fungible. Sadly, we'll need some staggering leaps in homomorphic encryption before that can be done for CPU time.

However, there is no reason in principle that bulk data storage can't be decentralized. Encrypting the data on the client (and adding a MAC) means that no third party can read the actual data or tamper with it undetected, and storing redundant copies with several providers mitigates the risk that any one of them might disappear.

Proof of Storage

Let's start with a much simpler problem: How can I verify that the storage provider is actually storing my entire 1 TiB upload, with a minimum of actual data transfer?

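The answer turns out to be a nearly trivial challenge-response protocol. The data must be encrypted and MAC'd in smallish chunks, perhaps in the 4 KiB to 64 KiB range. The client can then periodically challenge the storage provider to return a specified chunk, and use the MAC to verify that the response payload is indeed the requested chunk. For that check to mean anything, each MAC should cover the chunk's index (and an identifier for the upload) as well as its contents; otherwise the provider could answer with some other chunk it happens to have kept.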
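A minimal sketch of the chunk tagging and the client's check, in Python. It assumes the data has already been encrypted (the encryption step itself is left out), uses HMAC-SHA256 from the standard library as the MAC, and binds each tag to a fixed-length upload identifier and the chunk index. The names `tag_chunk`, `prepare_upload`, `verify_response`, `CHUNK_SIZE`, and `FILE_ID_LEN` are hypothetical illustrations, not part of any existing protocol.

```python
import hashlib
import hmac
import os

CHUNK_SIZE = 4096   # assumption: 4 KiB chunks (the post suggests 4 KiB to 64 KiB)
FILE_ID_LEN = 16    # assumption: fixed-length random identifier per upload


def tag_chunk(key: bytes, file_id: bytes, index: int, body: bytes) -> bytes:
    """HMAC-SHA256 over (upload id, chunk index, encrypted chunk body)."""
    if len(file_id) != FILE_ID_LEN:
        raise ValueError("file_id must be exactly FILE_ID_LEN bytes")
    msg = file_id + index.to_bytes(8, "big") + body
    return hmac.new(key, msg, hashlib.sha256).digest()


def prepare_upload(key: bytes, file_id: bytes, ciphertext: bytes):
    """Split already-encrypted data into chunks and attach a tag to each.

    Returns a list of (body, tag) pairs; this is what the provider stores.
    """
    chunks = []
    for offset in range(0, len(ciphertext), CHUNK_SIZE):
        body = ciphertext[offset:offset + CHUNK_SIZE]
        chunks.append((body, tag_chunk(key, file_id, offset // CHUNK_SIZE, body)))
    return chunks


def verify_response(key: bytes, file_id: bytes, index: int, response) -> bool:
    """Client-side check that `response` really is chunk `index` of this upload.

    A missing response, a tampered body, or a different chunk all fail.
    """
    if response is None:
        return False
    body, tag = response
    expected = tag_chunk(key, file_id, index, body)
    return hmac.compare_digest(tag, expected)


if __name__ == "__main__":
    key = os.urandom(32)                            # never leaves the client
    file_id = os.urandom(FILE_ID_LEN)
    ciphertext = os.urandom(3 * CHUNK_SIZE + 100)   # stand-in for encrypted data
    stored = prepare_upload(key, file_id, ciphertext)
    print(verify_response(key, file_id, 1, stored[1]))  # True
    print(verify_response(key, file_id, 2, stored[1]))  # False: wrong chunk
```

Keeping the upload identifier at a fixed length and encoding the index in a fixed 8 bytes keeps the MAC input unambiguous. An authenticated cipher such as AES-GCM, with the index passed as associated data, could do the encryption and tagging in one step instead.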

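Assuming that the client picks the chunks to request uniformly at random, the storage provider can pass any single challenge with probability equal to the fraction F of the original uploaded chunks it actually stored. Repeating the challenge N times with independently chosen chunks drives the provider's chance of passing every one down to F^N, which is exponentially harder for only a linear increase in data transfer. For example, a provider that kept 90% of the chunks survives 50 challenges with probability 0.9^50, roughly 0.5%.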
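The arithmetic is easy to check numerically. This sketch computes F^N, solves for the smallest N that pushes a cheating provider's odds below a chosen target, and runs a small Monte Carlo simulation of a provider that kept only part of the chunks. The helper names are hypothetical, and it assumes each challenge picks a chunk uniformly at random and independently (with replacement).

```python
import math
import random


def pass_probability(stored_fraction: float, n_challenges: int) -> float:
    """Chance that a provider holding only `stored_fraction` of the chunks
    answers all `n_challenges` independent, uniformly random challenges."""
    return stored_fraction ** n_challenges


def challenges_needed(stored_fraction: float, target: float) -> int:
    """Smallest N with F**N <= target, from N >= ln(target) / ln(F).

    Assumes 0 < stored_fraction < 1 and 0 < target < 1.
    """
    return math.ceil(math.log(target) / math.log(stored_fraction))


def simulate(total_chunks: int, stored_fraction: float, n_challenges: int,
             trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same pass probability."""
    rng = random.Random(seed)
    kept = set(rng.sample(range(total_chunks), int(total_chunks * stored_fraction)))
    passed = 0
    for _ in range(trials):
        # The provider passes a round only if every requested chunk was kept.
        if all(rng.randrange(total_chunks) in kept for _ in range(n_challenges)):
            passed += 1
    return passed / trials
```

For instance, `challenges_needed(0.9, 1e-6)` returns 132, so at 4 KiB per chunk about 528 KiB of responses would catch a provider missing a tenth of the data with overwhelming probability.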

A Brief Sketch of Everything Else

The system only needs to interact with a "blockchain" insofar as payment is exchanged for storage services. Discovering storage providers, uploading data to them, and retrieving it again should all take place "off the chain" anyway.

If clients pay any nonzero amount at upload time, there's a real risk that the market will be overrun by "storage providers" who never bother to store anything. It is possible that some sort of reputation mechanism could mitigate this, but of course then you have to worry about Sybil attacks on the reputation system itself.

If clients pay for retrieval instead, things work out much better. However, the storage providers are then placed in the unenviable position of having to predict whether the client will ever actually retrieve their uploaded data in full. On the other hand, there is some interesting potential in this vein for the system to double as a sort of "CDN" for publishing content.

If individual chunks of data are constructed such that third parties can verify a challenge-response pair (without access to the underlying plaintext, of course), then it becomes possible to run some sort of "dispute resolution" process on a blockchain. However, that process must be biased towards the client: only the storage provider can settle a dispute by saying "No, I totally still have the chunk, see, here it is!", while the client has no way to prove that a chunk is missing. A provider that fails to produce a valid chunk therefore has to lose by default.
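One way to make challenge-response pairs checkable by a third party is a Merkle tree over the encrypted chunks: the client publishes only the root, and the provider answers a challenge with the chunk plus the sibling hashes along its path. This is a swapped-in technique rather than something the design above specifies, and `resolve_dispute` is a hypothetical rule illustrating the client-biased default, where the provider wins only by producing a valid chunk and proof. The sketch assumes at least one chunk, duplicates the last node on odd-length levels, and assumes the verifier knows the chunk count.

```python
import hashlib
from typing import List, Optional, Tuple


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(chunk: bytes) -> bytes:
    # Domain-separate leaves from interior nodes with a one-byte prefix.
    return _sha256(b"\x00" + chunk)


def node_hash(left: bytes, right: bytes) -> bytes:
    return _sha256(b"\x01" + left + right)


def build_levels(chunks: List[bytes]) -> List[List[bytes]]:
    """All tree levels, leaves first; levels[-1][0] is the published root."""
    level = [leaf_hash(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pair the odd node with itself
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Sibling hashes from leaf `index` up to (not including) the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        # A missing right sibling means the node was paired with itself.
        proof.append(level[sibling] if sibling < len(level) else level[index])
        index //= 2
    return proof


def verify_proof(root: bytes, index: int, chunk: bytes, proof: List[bytes]) -> bool:
    """Anyone holding only the root can run this; no plaintext is needed."""
    h = leaf_hash(chunk)
    for sibling in proof:
        h = node_hash(h, sibling) if index % 2 == 0 else node_hash(sibling, h)
        index //= 2
    return h == root


def resolve_dispute(root: bytes, n_chunks: int, index: int,
                    response: Optional[Tuple[bytes, List[bytes]]]) -> str:
    """Hypothetical rule: the client wins unless the provider proves possession."""
    if not 0 <= index < n_chunks:
        raise ValueError("challenge refers to a chunk that was never uploaded")
    if response is None:
        return "client"
    chunk, proof = response
    if len(proof) != (n_chunks - 1).bit_length():  # expected tree depth
        return "client"
    return "provider" if verify_proof(root, index, chunk, proof) else "client"
```

Because the root is the only thing that has to live on the chain, the verification cost of a dispute grows with the logarithm of the number of chunks rather than with the size of the upload.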

Really it seems pretty obvious that a solid reputation system is the big piece missing from this design, but I have no idea how such a magical thing would even work.