Use JS-IPFS to deliver ImageNet (and other datasets, but first ImageNet) #417
Comments
Hi Ryan! We are in the process of downloading, un'tar'ing, and pinning it to IPFS nodes as I write this post. I hope to have the hash of the dataset for you tomorrow (timezones), plus a visualization using graphmd (e.g. https://ipfs.io/docs/examples/example-viewer/example#../graphmd/README.md) for fun :). |
David, where'd you end up posting these?
|
I forgot to send an update here, so here it goes. We have successfully added and pinned ImageNet to IPFS. The CID (Content Identifier) is QmXNHWdf9qr7A67FZQTFVb6Nr1Vfp4Ct3HXLgthGG61qy1, and it is being pinned by this node: /ip4/145.239.144.121/tcp/4001/ipfs/QmY9BWdiEn43iNx6nJEvgwrioJUYPQnUAoqvqKVRjavK4h. You can do a direct `ipfs swarm connect` to it to speed up discovery.
Important note: Because ImageNet has a very large number of files inside each directory, we had to use UnixFS sharding (our term for sharding directory links in an IPLD graph). You need to enable this feature on your IPFS node by setting the flag in the config to true.
// in ~/.ipfs/config, change
// ...
"Experimental": {
  "FilestoreEnabled": false,
  "ShardingEnabled": true, // this is false by default, you need to set it to true
  "Libp2pStreamMounting": false
}
When running `ipfs get` on a sharded directory, we found an issue that has been fixed on go-ipfs master thanks to @Stebalien: ipfs/kubo#4871. It will be released in the next go-ipfs release. Meanwhile, you can compile from master.
On my TODO is fixing the same issue for js-ipfs (ipfs/js-ipfs#1279). I plan to fix this and then create a tiny tutorial for js-ipfs on how to explore this dataset with it.
Important note 2: You will notice that fetching ImageNet isn't currently the fastest thing in the world. The reason is that Bitswap (the block exchange) has no way to signal the intent to fetch an entire graph; instead, it keeps asking for more nodes as it traverses the graph. Since ImageNet is by itself lots of files and folders, and given that we had to shard, it makes for a giant graph. A solution for this problem is in the works, and we call it IPLD Selectors. @vmx and @Stebalien are working on this. |
Thanks, David.
I'm trying to use go-ipfs to navigate this object and it appears to just hang. This is after enabling sharding and running `ipfs swarm connect ...` (which succeeded). I'm running go-ipfs 0.4.13. I tried `ipfs ls`, `ipfs object get`, `ipfs object links`, and `ipfs dag resolve`. All of them just hung, with nothing interesting in the IPFS daemon stderr.
I also can't load the CID via gateway.ipfs.io.
Is it possible to navigate this dataset via go-ipfs at all? Perhaps I'm doing something obviously wrong?
|
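For anyone who wants to script the sharding flag mentioned above rather than edit the file by hand, here is a minimal Python sketch. It assumes the config is plain JSON at `~/.ipfs/config` and that the flag is `Experimental.ShardingEnabled`, as in the snippet earlier in this thread. The helper names `enable_sharding` and `enable_sharding_in_file` are made up for illustration and are not part of any IPFS tooling.

```python
import json
from pathlib import Path


def enable_sharding(config: dict) -> dict:
    """Return a copy of an IPFS config dict with directory sharding turned on."""
    updated = dict(config)
    # Copy the Experimental section so the caller's dict is not mutated.
    experimental = dict(updated.get("Experimental", {}))
    experimental["ShardingEnabled"] = True  # false by default
    updated["Experimental"] = experimental
    return updated


def enable_sharding_in_file(path: Path) -> None:
    """Rewrite the JSON config at `path` with ShardingEnabled set to true."""
    config = json.loads(path.read_text())
    path.write_text(json.dumps(enable_sharding(config), indent=2) + "\n")


# Example call (assumed default location; adjust if IPFS_PATH points elsewhere):
# enable_sharding_in_file(Path.home() / ".ipfs" / "config")
```

The daemon most likely needs a restart before it picks up the change, and it is worth keeping a backup of the config before rewriting it.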
@ajbouh see my notes above. You need to use the latest go-ipfs master to get the fix @Stebalien shipped. Here is a screenshot of it working: https://user-images.githubusercontent.com/1211152/37935968-76daf978-3108-11e8-9dc5-8c0be46ca496.png
The gateways can't access it until go-ipfs 0.4.15 has been released and we have updated them. |
I see, I took the title of that PR too literally.
It appears that `ipfs ls` doesn't support directory sharding at all. This gist shows an example of how I manually sharded the ImageNet validation set: https://gist.github.com/ajbouh/80cf1e2c87fff205283e527568da4ce4
Will an IPFS-directory-sharded version of ImageNet be performant any time soon? Perhaps it would be better to shard it by hand, as I've done for the validation set, so that existing tools like `ipfs ls` work today?
|
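To make the "shard it by hand" idea concrete, here is a rough Python sketch of one way to split a flat directory into smaller subdirectories before adding it. This is an assumption for illustration only: it buckets names by a hash prefix and does not reproduce the scheme used in the gist above. The function name `shard_names` and the `prefix_len` parameter are hypothetical.

```python
import hashlib
from collections import defaultdict


def shard_names(names, prefix_len=2):
    """Group file names into buckets keyed by a hex prefix of their SHA-256.

    With prefix_len=2 there are at most 256 buckets, so each bucket can be
    added as an ordinary (unsharded) subdirectory.
    """
    buckets = defaultdict(list)
    for name in names:
        digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
        buckets[digest[:prefix_len]].append(name)
    return {prefix: sorted(members) for prefix, members in buckets.items()}


# Illustrative file names in the style of the ImageNet validation set.
names = [f"ILSVRC2012_val_{i:08d}.JPEG" for i in range(1, 50001)]
layout = shard_names(names)
largest = max(len(members) for members in layout.values())
print(len(layout), "buckets; largest holds", largest, "files")
```

Each bucket would then become its own subdirectory, which keeps every directory small enough for tools like `ipfs ls` that do not understand sharded directories. The cost is that paths change, so consumers need to know the bucketing rule to locate a file.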
CIFAR10, MNIST, and other smaller datasets can be downloaded directly from propelml.org
https://github.com/ipfs/js-ipfs
Lots to be worked out here. Opening this as an umbrella issue.
cc @diasdavid @ajbouh