Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tiger: implement TTH algorithm #494

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

Conversation

baodrate
Copy link

@baodrate baodrate commented Jul 4, 2023

Implements the Tiger Tree Hash algorithm as mentioned here.

I'm a rust newbie, so implementation (and general Rust) advice is welcome.

The digest crate doesn't seem to be happy with block sizes > 128, but TTH operates on 1024K data blocks, so I emulated it internally. Doesn't feel like the best way to do this but I couldn't figure out a more elegant method.

I didn't attempt any optimization so performance isn't great (although much better than I expected), approx 20-30% slower than https://github.com/rhash/RHash.

Copy link
Member

@newpavlov newpavlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked into the implementation and the algorithm itself yet, so for now I have only the following comments.

IIUC the main reason to use TTH is to be able to parallelize processing of data blocks. It may be worth to use rayon for it. Also I think it should be possible to implement this algorithm without storing all leaves, i.e. you could collapse the tree and only store number of nodes proportional to the tree height.

tiger/src/lib.rs Outdated
@@ -33,6 +33,8 @@
#![forbid(unsafe_code)]
#![warn(missing_docs, rust_2018_idioms)]

extern crate alloc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If TTH can not be implemented without allocations, then it should be feature-gated, i.e. you should introduce a new feature in Cargo.toml (e.g. tree-hash) and add #[cfg(feature = "tree-hash")] where appropriate.


#[test]
fn tth_two_blocks() {
let content = hex!("deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to write it like this:

let s = hex!("deadbeef");
let mut content = Vec::with_capacity(s.len() * N);
for _ in 0..N {
    content.extend(&s);
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's embarrassing, I meant to clean that up before submitting the PR. Will do

@baodrate
Copy link
Author

baodrate commented Jul 5, 2023

I think it should be possible to implement this algorithm without storing all leaves, i.e. you could collapse the tree and only store number of nodes proportional to the tree height.

That's a good point. I opted for the easier implementation to get some feedback, but I think the constant-memory version shouldn't be too hard. I would have to give it a input limit. Assuming 2^64B is a reasonable size limit, that's gives a max tree height of 54 (with 1024B blocks) and a minimum space requirement of Tiger::OutputSize * 54 ~= 1.3KiB. Which seems reasonable?

IIUC the main reason to use TTH is to be able to parallelize processing of data blocks. It may be worth to use rayon for it.

I'm a little unsure about this suggestion. Do you mean this library should depend on rayon? Would multi-threading behavior be configurable by the user? Or is this something we would allow the user to handle by e.g. exposing hash_nodes, so the user can calculate the partial trees in parallel and pass the collection of hashes to hash_nodes to get the hash of the full tree?

@newpavlov
Copy link
Member

I think the constant-memory version shouldn't be too hard.

You can continue to use dynamic allocation, the point is to reduce amount of used memory.

Do you mean this library should depend on rayon?

Yes, but rayon can be optional, disabled-by-default dependency.

Would multi-threading behavior be configurable by the user?

No need for separate configuration knobs, rayon defaults should be fine.

BTW, can you add link(s) for TTH test vector source?

@baodrate
Copy link
Author

baodrate commented Jul 5, 2023

You can continue to use dynamic allocation, the point is to reduce amount of used memory

I've updated the algorithm to reduce memory usage. I've also swapped out Vec for arrayvec as per my previous comment. If this isn't acceptable I can switch back to using Vec.

BTW, can you add link(s) for TTH test vector source?

TTH tests were calculated by hand by hashing individual blocks with the tiger implementation in this repo, and confirmed with rhash and tthsum. I can look for more authoritative tests. But TTH doesn't seem to be well-documented (rather, it seems to be the consensus of various P2P file-sharing software).

Instead of storing every leaf and processing the tree once at the end,
process the tree every time a leaf is added, so that a maximum of N
nodes need to be stored, for a tree of height N. Then a final walk of
the nodes is performed at the end to promote and merge any single-child
leaves.
@baodrate
Copy link
Author

baodrate commented Jul 5, 2023

I gave integrating rayon a shot, but I couldn't get a measurable difference in performance. edit: I've profiled the hasher and it does seem like there should be room for rayon to do its thing, I just can't seem to get it to parallelize properly. Will keep working on it

note to self: the blake3 crate (https://github.com/BLAKE3-team/BLAKE3) implements digest and supports (optionally) rayon multithreading

edit: added rayon-multithreading modeled after the blake3 crate (-SIMD stuff) and I still can't beat the performance of the naive implementation. I'm so bad at this dudes 😭

TTH/reference/TigerTreeCore
	time:   [3.7648 µs 3.7824 µs 3.8038 µs]
	thrpt:  [514.47 MiB/s 517.37 MiB/s 519.81 MiB/s]
TTH/serial/compress_subtree<Serial>
	time:   [3.8919 µs 3.9154 µs 3.9416 µs]
	thrpt:  [496.48 MiB/s 499.81 MiB/s 502.83 MiB/s]
TTH/rayon/compress_subtree<Rayon>/1
	time:   [20.298 µs 20.656 µs 21.018 µs]
	thrpt:  [93.110 MiB/s 94.742 MiB/s 96.408 MiB/s]
TTH/rayon/compress_subtree<Rayon>/2
	time:   [14.996 µs 15.198 µs 15.420 µs]
	thrpt:  [126.91 MiB/s 128.77 MiB/s 130.49 MiB/s]
TTH/rayon/compress_subtree<Rayon>/4
	time:   [3.6958 µs 3.7084 µs 3.7219 µs]
	thrpt:  [525.79 MiB/s 527.71 MiB/s 529.51 MiB/s]
TTH/rayon/compress_subtree<Rayon>/8
	time:   [3.7162 µs 3.7411 µs 3.7685 µs]
	thrpt:  [519.29 MiB/s 523.10 MiB/s 526.59 MiB/s]
TTH/rayon/compress_subtree<Rayon>/16
	time:   [3.6799 µs 3.6958 µs 3.7153 µs]
	thrpt:  [526.73 MiB/s 529.50 MiB/s 531.80 MiB/s]

@baodrate baodrate marked this pull request as draft July 18, 2023 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants