Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/real rabin #1545

Merged
merged 2 commits into from
Aug 11, 2015
Merged

Feat/real rabin #1545

merged 2 commits into from
Aug 11, 2015

Conversation

whyrusleeping
Copy link
Member

This PR changes the chunking interface to allow it to return errors and also implements rabin chunking using a modified version of the implementation here: https://github.com/restic/chunker

@jbenet jbenet added the status/in-progress In progress label Jul 30, 2015
@rht
Copy link
Contributor

rht commented Jul 31, 2015

Is the block splitter to be used for parallel GetDag as well? (ah ic, GetBlocks already does every blocks concurrently)

@jbenet
Copy link
Member

jbenet commented Aug 2, 2015

@jbenet
Copy link
Member

jbenet commented Aug 2, 2015

soooooo excited for this.

@jbenet
Copy link
Member

jbenet commented Aug 2, 2015

@whyrusleeping we should drastically improve docker container space + transport usage after this lands :) ❤️ high priority.

@jbenet
Copy link
Member

jbenet commented Aug 7, 2015

This LGTM. trying it out

@jbenet
Copy link
Member

jbenet commented Aug 7, 2015

We should probably add a flag to the commands that use importer to try out rabin

@jbenet
Copy link
Member

jbenet commented Aug 7, 2015

still to do:

  • change the interface to take a min and max sizes. this avg thing is unwieldy.

@jbenet
Copy link
Member

jbenet commented Aug 7, 2015

it's not wired up anywhere yet...

if this is already changing all of the interfaces, we should go for the io.Reader one. just do:

r := bytes.NewReader(buf)
return r, nil

for now, but that way we dont have to change the interfaces a ton of times.

@whyrusleeping
Copy link
Member Author

the io.Reader interface made less and less sense the more i tried working it into the code. Every single place we would use it, we would just do an ioutil.ReadAll on the given reader, and assign those bytes to a dag.Node to be stored. and ReadAll overallocates memory so we dont want to do that.

@whyrusleeping
Copy link
Member Author

I added an option to add that lets us specify a chunker:

ipfs add -s rabin              # default rabin parameters (256k average size)
ipfs add -s rabin-1000         # rabin with 256k avg block size
ipfs add -s rabin-100-500-1000 # rabin with 100 min blocksize, 500 average, 1000 max
ipfs add -s size-4000          # standard size chunking, 4000 byte blocks

}
default:
return nil, fmt.Errorf("unrecognized chunker option: %s", chunker)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is pretty cool, i like being able to specify options. should it be on the chunker or separate? i.e.:

ipfs add -s=rabin-1000-10000
ipfs add -s=rabin --rabin-min=1000 --rabin-max=10000

less typing, but also more clear.

i do like being able to express it in one string, but dont like the non-self-description. it may be worth expressing all chunckers in one string:

ipfs add -s=rabin-min:1000-max:10000

not sure. maybe the min: and max: parts could be optional?

btw, we can probably put the "string -> splitter with options" conversion in the splitter package, and use it in other tools. easy place to plug in other splitters over time.

also, "splitter" or "chunker" ? (last chance to rename). i like both.

@jbenet
Copy link
Member

jbenet commented Aug 8, 2015

@whyrusleeping could you:

  • consider making opts more expressive Feat/real rabin #1545 (you decide)
  • consider splitter vs chunker (you decide)
  • squash 408550e-3c0fbd7 as the tests dont pass before 3c0fbd7 ?

This LGTM and RFM! solid work! 👍

@jbenet
Copy link
Member

jbenet commented Aug 8, 2015

@whyrusleeping also:

  • rebase on latest master

License: MIT
Signed-off-by: Jeromy <[email protected]>

implement rabin fingerprinting as a chunker for ipfs

License: MIT
Signed-off-by: Jeromy <[email protected]>

vendor correctly

License: MIT
Signed-off-by: Jeromy <[email protected]>

refactor chunking interface a little

License: MIT
Signed-off-by: Jeromy <[email protected]>

work chunking interface changes up into importer

License: MIT
Signed-off-by: Jeromy <[email protected]>

move chunker type parsing into its own file in chunk

License: MIT
Signed-off-by: Jeromy <[email protected]>
@jbenet
Copy link
Member

jbenet commented Aug 8, 2015

@whyrusleeping
Copy link
Member Author

@jbenet oh, thats just the random orientation of the data causing bad chunking boundaries, i can stop that from being a failure.

@jbenet
Copy link
Member

jbenet commented Aug 11, 2015

@jbenet oh, thats just the random orientation of the data causing bad chunking boundaries, i can stop that from being a failure.

will randomness still make it fail sometimes?

@jbenet
Copy link
Member

jbenet commented Aug 11, 2015

@jbenet
Copy link
Member

jbenet commented Aug 11, 2015

LGTM RFM

whyrusleeping added a commit that referenced this pull request Aug 11, 2015
@whyrusleeping whyrusleeping merged commit e8d6f79 into master Aug 11, 2015
@jbenet jbenet removed the status/in-progress In progress label Aug 11, 2015
@whyrusleeping whyrusleeping deleted the feat/real-rabin branch August 11, 2015 15:15
hacdias pushed a commit to ipfs/boxo that referenced this pull request Jan 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants