
IPFS - Content Addressed, Versioned, P2P File System
(DRAFT 3)
Juan Benet
juan@benet.ai
ABSTRACT
The InterPlanetary File System (IPFS) is a peer-to-peer distributed file system that seeks to connect all computing devices with the same system of files. In some ways, IPFS is similar to the Web, but IPFS could be seen as a single BitTorrent swarm, exchanging objects within one Git repository. In other words, IPFS provides a high-throughput content-addressed block storage model, with content-addressed hyperlinks. This forms a generalized Merkle DAG, a data structure upon which one can build versioned file systems, blockchains, and even a Permanent Web. IPFS combines a distributed hash table, an incentivized block exchange, and a self-certifying namespace. IPFS has no single point of failure, and nodes do not need to trust each other.
1. INTRODUCTION
There have been many attempts at constructing a global distributed file system. Some systems have seen significant success, and others failed completely. Among the academic attempts, AFS [6] has succeeded widely and is still in use today. Others [7, ?] have not attained the same success. Outside of academia, the most successful systems have been peer-to-peer file-sharing applications primarily geared toward large media (audio and video). Most notably, Napster, KaZaA, and BitTorrent [2] deployed large file distribution systems supporting over 100 million simultaneous users. Even today, BitTorrent maintains a massive deployment where tens of millions of nodes churn daily [16]. These applications saw greater numbers of users and files distributed than their academic file system counterparts. However, the applications were not designed as infrastructure to be built upon. While there have been successful repurposings (for example, Linux distributions use BitTorrent to transmit disk images, and Blizzard, Inc. uses it to distribute video game content), no general file system has emerged that offers global, low-latency, and decentralized distribution.
Perhaps this is because a “good enough” system for most use cases already exists: HTTP. By far, HTTP is the most successful “distributed system of files” ever deployed. Coupled with the browser, HTTP has had enormous technical and social impact. It has become the de facto way to transmit files across the internet. Yet, it fails to take advantage of dozens of brilliant file distribution techniques invented in the last fifteen years. From one perspective, evolving Web infrastructure is near-impossible, given the number of backwards-compatibility constraints and the number of strong parties invested in the current model. But from another perspective, new protocols have emerged and gained wide use since the emergence of HTTP. What is lacking is an upgrade path: a way to enhance the current HTTP web and introduce new functionality without degrading user experience.
Industry has gotten away with using HTTP this long because moving small files around is relatively cheap, even for small organizations with lots of traffic. But we are entering a new era of data distribution with new challenges: (a) hosting and distributing petabyte datasets, (b) computing on large data across organizations, (c) high-volume high-definition on-demand or real-time media streams, (d) versioning and linking of massive datasets, (e) preventing accidental disappearance of important files, and more. Many of these can be boiled down to “lots of data, accessible everywhere.” Pressed by critical features and bandwidth concerns, we have already given up HTTP for different data distribution protocols. The next step is making them part of the Web itself.
Orthogonal to efficient data distribution, version control systems have managed to develop important data collaboration workflows. Git, the distributed source code version control system, developed many useful ways to model and implement distributed data operations. The Git toolchain offers versatile versioning functionality that large file distribution systems severely lack. New solutions inspired by Git are emerging, such as Camlistore [?], a personal file storage system, and Dat [?], a data collaboration toolchain and dataset package manager. Git has already influenced distributed filesystem design [9], as its content-addressed Merkle DAG data model enables powerful file distribution strategies. What remains to be explored is how this data structure can influence the design of high-throughput file systems, and how it might upgrade the Web itself.
This paper introduces IPFS, a novel peer-to-peer version-controlled filesystem seeking to reconcile these issues. IPFS synthesizes learnings from many past successful systems. Careful interface-focused integration yields a system greater than the sum of its parts. The central IPFS principle is modeling all data as part of the same Merkle DAG.
2. BACKGROUND
This section reviews important properties of successful peer-to-peer systems, which IPFS combines.
2.1 Distributed Hash Tables
Distributed Hash Tables (DHTs) are widely used to coordinate and maintain metadata about peer-to-peer systems.
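As a rough sketch of the service a DHT provides, the hypothetical Go interface below (the names are illustrative, not any particular DHT's API) shows the operations peers cooperate to implement: a distributed key-value store in which each node is responsible for the keys closest to its own ID in the key space.

  package dht

  // PeerID identifies a node in the overlay; in most DHTs it is
  // drawn from the same space as keys (e.g. the hash of a public key).
  type PeerID [20]byte

  // DHT is a minimal view of what Kademlia-style DHTs provide.
  // (Hypothetical interface, for illustration.)
  type DHT interface {
      // PutValue stores a small value at the peers whose IDs are
      // closest to key under the overlay's distance metric.
      PutValue(key string, value []byte) error

      // GetValue retrieves a value by iteratively querying peers
      // ever closer to key.
      GetValue(key string) ([]byte, error)

      // FindProviders returns peers that have announced they can
      // serve the object named by key; this is how trackerless
      // BitTorrent locates swarm members, for example.
      FindProviders(key string) ([]PeerID, error)
  }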