This repository has been archived by the owner on May 30, 2019. It is now read-only.

Use JS-IPFS to deliver ImageNet (and other datasets, but first ImageNet) #417

Open
ry opened this issue Mar 22, 2018 · 6 comments

Comments

@ry
Contributor

ry commented Mar 22, 2018

CIFAR10, MNIST, and other smaller datasets can be downloaded directly from propelml.org
https://github.com/ipfs/js-ipfs

Lots to be worked out here. Opening this as an umbrella issue.

cc @diasdavid @ajbouh

ry changed the title from "Use IPFS-JS to deliver ImageNet (and other datasets, but first ImageNet)" to "Use JS-IPFS to deliver ImageNet (and other datasets, but first ImageNet)" on Mar 22, 2018
@daviddias

Hi Ryan! We are downloading, untarring, and pinning it to IPFS nodes as I write this post. I hope to have the hash of the dataset for you tomorrow (timezones), plus a visualization using graphmd (e.g., https://ipfs.io/docs/examples/example-viewer/example#../graphmd/README.md) for fun :).

@ajbouh

ajbouh commented Mar 26, 2018 via email

@daviddias

I forgot to post an update here, so here it goes.

We have successfully added and pinned ImageNet to IPFS. Its CID (Content Identifier) is QmXNHWdf9qr7A67FZQTFVb6Nr1Vfp4Ct3HXLgthGG61qy1, and it is being pinned by the node /ip4/145.239.144.121/tcp/4001/ipfs/QmY9BWdiEn43iNx6nJEvgwrioJUYPQnUAoqvqKVRjavK4h. You can run ipfs swarm connect against that address directly to speed up discovery.
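For anyone who wants to script this, here is a minimal Python sketch that assembles the two CLI calls (ipfs swarm connect, then ipfs get) from the CID and node address quoted above. The helper names and the `imagenet` output directory are assumptions made for illustration, and nothing is executed unless you call `run_commands` yourself with a local daemon running.

```python
import subprocess

# Values quoted from the comment above.
IMAGENET_CID = "QmXNHWdf9qr7A67FZQTFVb6Nr1Vfp4Ct3HXLgthGG61qy1"
PINNING_NODE = (
    "/ip4/145.239.144.121/tcp/4001/ipfs/"
    "QmY9BWdiEn43iNx6nJEvgwrioJUYPQnUAoqvqKVRjavK4h"
)


def build_commands(peer_addr, cid, output_dir="imagenet"):
    """Return the argv lists for connecting to the peer and fetching the CID.

    `output_dir` is a hypothetical destination chosen for this sketch.
    """
    return [
        ["ipfs", "swarm", "connect", peer_addr],
        ["ipfs", "get", "--output", output_dir, cid],
    ]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Only print the commands; call run_commands(...) to actually execute them.
    for argv in build_commands(PINNING_NODE, IMAGENET_CID):
        print(" ".join(argv))
```

Connecting first is optional. It just saves the node from having to discover the pinning peer on its own before the download can start.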

Important note: Because ImageNet has a very large number of files inside each directory, we had to use UnixFS sharding (our term for sharding the directory links in an IPLD graph). You need to enable this feature on your IPFS node by setting the corresponding flag in the config to true.

// in ~/.ipfs/config, change
// ...
"Experimental": {
  "FilestoreEnabled": false,
  "ShardingEnabled": true, // this is false by default, you need to set it to true
  "Libp2pStreamMounting": false
}
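Note that the config file itself is plain JSON, so the // comments in the snippet above are annotations only and should not be pasted in. As a rough sketch, the same change can be made programmatically. The function names below are made up for this example, and the file path is the default location mentioned above. I believe `ipfs config --json Experimental.ShardingEnabled true` achieves the same thing from the CLI, but treat that as an assumption and check it against your go-ipfs version.

```python
import json
from pathlib import Path


def enable_sharding(config):
    """Set Experimental.ShardingEnabled to true in a parsed config dict."""
    experimental = config.setdefault("Experimental", {})
    experimental["ShardingEnabled"] = True
    return config


def update_config_file(path="~/.ipfs/config"):
    """Rewrite the config file in place with sharding enabled.

    Stop the daemon first and keep a backup; this sketch does neither.
    """
    config_path = Path(path).expanduser()
    config = json.loads(config_path.read_text())
    config_path.write_text(json.dumps(enable_sharding(config), indent=2) + "\n")


# Example on an in-memory copy of the Experimental section shown above.
example = {
    "Experimental": {
        "FilestoreEnabled": False,
        "ShardingEnabled": False,
        "Libp2pStreamMounting": False,
    }
}
updated = enable_sharding(example)
```

Restart the daemon after changing the file so that the new setting is picked up.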

When running ipfs get on a sharded directory, we found an issue that has been fixed on go-ipfs master thanks to @Stebalien (ipfs/kubo#4871). The fix will ship in the next go-ipfs release. Meanwhile, you can compile from master.

On my TODO list is fixing the same issue in js-ipfs (ipfs/js-ipfs#1279). I plan to fix it and then write a small tutorial on how to explore this dataset with js-ipfs.

Important note 2: You will notice that fetching ImageNet is currently not very fast. The reason is that Bitswap (the block exchange) has no way to signal the intent to fetch an entire graph; instead, it keeps asking for more nodes as it traverses the graph. Since ImageNet consists of a great many files and folders, and since we had to shard the directories, the resulting graph is very large. A solution to this problem is in the works, which we call IPLD Selectors. @vmx and @Stebalien are working on it.
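To make the cost concrete, here is a toy Python model of block-at-a-time traversal. It is not Bitswap's actual implementation, and the graph and names are invented for illustration. The point is only that a block's children are unknown until the block itself has arrived, so the number of sequential rounds grows with the depth of the graph and the number of requests grows with its size.

```python
def fetch_block_by_block(dag, root):
    """Simulate fetching a DAG when links are only discovered on arrival.

    `dag` maps a block id to the list of block ids it links to.
    Returns (rounds, requests): sequential waves of requests, and the
    total number of blocks asked for.
    """
    fetched = set()
    frontier = [root]
    rounds = 0
    requests = 0
    while frontier:
        rounds += 1
        discovered = []
        for block_id in frontier:
            if block_id in fetched:
                continue  # same block linked twice in this wave
            fetched.add(block_id)
            requests += 1
            # Children become known only now that the parent has arrived.
            discovered.extend(dag.get(block_id, []))
        frontier = [b for b in discovered if b not in fetched]
    return rounds, requests


# Hypothetical sharded directory: root -> shards -> files.
TOY_DAG = {
    "root": ["shard0", "shard1"],
    "shard0": ["a.jpg", "b.jpg"],
    "shard1": ["c.jpg"],
}

rounds, requests = fetch_block_by_block(TOY_DAG, "root")
print(rounds, requests)  # prints "3 6"
```

A mechanism that can describe the whole graph in a single request, which is the intent behind selectors as described above, would avoid waiting one round trip per level. How the real design works is not something this sketch tries to capture.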

@ajbouh

ajbouh commented Mar 26, 2018 via email

@daviddias

daviddias commented Mar 26, 2018

@ajbouh see my notes above. You need to use the latest go-ipfs master to get the fix @Stebalien shipped. Here is a screenshot of it working.

image

The gateways won't be able to access it until go-ipfs 0.4.15 has been released and we have updated them.

@ajbouh

ajbouh commented Mar 27, 2018 via email
