* Fix `from_str` overflow rounding issue

  The culprit was `maybe_round`, which prevented the number from
  overflowing and so ended up returning an `Ok`. The fix is to round a
  number only after we have passed the decimal point.

* Remove coeff ArrayVec from `parse_str_radix_10`

  This commit flattens `add_by_internal`, since it is only ever called
  with a fixed-size array plus one u32. In addition, instead of
  accumulating the coefficients in an `ArrayVec` and processing them
  later, we compute the output number in one pass. This also removes the
  manual overflow rounding on the `coeff` array and replaces it with a
  plain arithmetic +1.

  cargo bench --bench lib_benches -- decimal_from_str

  Baseline:
  decimal_from_str ... bench: 566 ns/iter (+/- 21) [M1 Pro]
  decimal_from_str ... bench: 554 ns/iter (+/- 21) [Ryzen 3990x]

  Current:
  decimal_from_str ... bench: 193 ns/iter (+/- 1) [M1 Pro]
  decimal_from_str ... bench: 320 ns/iter (+/- 3) [Ryzen 3990x]

* Use 128 bit integer to speed up `parse_str_radix_10`

  Instead of manually doing 96-bit integer operations, use a 128-bit
  integer and check whether the value overflows 96 bits of state.

  cargo bench --bench lib_benches -- decimal_from_str

  Baseline:
  decimal_from_str ... bench: 566 ns/iter (+/- 21) [M1 Pro]
  decimal_from_str ... bench: 554 ns/iter (+/- 21) [Ryzen 3990x]

  Current:
  decimal_from_str ... bench: 182 ns/iter (+/- 3) [M1 Pro]
  decimal_from_str ... bench: 299 ns/iter (+/- 4) [Ryzen 3990x]

* Use 64 bit integer with 128 bit fallback to speed up further

  Accumulate parse state in a 64-bit integer until the next step would
  overflow it, then fall back to maintaining 128-bit state.

  cargo bench --bench lib_benches -- decimal_from_str

  Baseline:
  decimal_from_str ... bench: 566 ns/iter (+/- 21) [M1 Pro]
  decimal_from_str ... bench: 554 ns/iter (+/- 21) [Ryzen 3990x]

  Current:
  decimal_from_str ... bench: 133 ns/iter (+/- 4) [M1 Pro]
  decimal_from_str ... bench: 190 ns/iter (+/- 5) [Ryzen 3990x]

* Convert into series of tail-calls

  This commit changes the parsing code to operate as a series of tail
  calls. This allows loop and overflow conditions to be checked only when
  they are actually needed, and makes it easy for the compiler to
  generate good code on the fast path. The parsing functions are also
  parameterized by const generics, which generates optimized code paths
  for each situation (i.e. has/hasn't seen digits yet, has/hasn't seen
  the decimal point) and removes carried state.

  cargo bench --bench lib_benches -- decimal_from_str

  Baseline:
  decimal_from_str ... bench: 566 ns/iter (+/- 21) [M1 Pro]
  decimal_from_str ... bench: 554 ns/iter (+/- 21) [Ryzen 3990x]

  Current:
  decimal_from_str ... bench: 85 ns/iter (+/- 2) [M1 Pro]
  decimal_from_str ... bench: 129 ns/iter (+/- 2) [Ryzen 3990x]

* Give each operation a dedicated dispatch

  This change gives each possible operation on the u64 accumulator its
  own dispatch site, so that each operation gets dedicated branch
  prediction.

  cargo bench --bench lib_benches -- decimal_from_str

  Baseline:
  decimal_from_str ... bench: 566 ns/iter (+/- 21) [M1 Pro]
  decimal_from_str ... bench: 554 ns/iter (+/- 21) [Ryzen 3990x]

  Current:
  decimal_from_str ... bench: 81 ns/iter (+/- 1) [M1 Pro]
  decimal_from_str ... bench: 120 ns/iter (+/- 2) [Ryzen 3990x]

* Move 'rare' digit handling into dedicated function

  Remove the up-front handling of characters like '+' and '-', and the
  inline handling of '_', and put all of it into a function dedicated to
  uncommon characters. This lowers the up-front cost for numbers without
  a prefix. For numbers that do contain rare characters, the startup cost
  stays similar to that of the old checking code.

  Baseline:
  decimal_from_str ... bench: 566 ns/iter (+/- 21) [M1 Pro]

  Current:
  decimal_from_str ... bench: 71 ns/iter (+/- 2) [M1 Pro]

* Remove unimportant const generic from digit dispatch

* Verify test cases in unpacked representation

  This makes sure that the parsing behaviour is the same as before.

Co-authored-by: Tomatillo <[email protected]>
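The overflow fix and the 64-bit/128-bit accumulation steps described above can be illustrated with a minimal sketch. This is not rust_decimal's `parse_str_radix_10`. The function `parse_decimal`, its `ParseError` type and the `(mantissa, scale)` return shape are hypothetical. The sketch assumes unsigned input with no `_` separators and no cap on the scale. It also assumes round-half-up when fractional digits exceed 96 bits; the commit does not state the rounding mode.

```rust
/// Largest mantissa that fits in 96 bits: 2^96 - 1.
const MAX_96: u128 = (1u128 << 96) - 1;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ParseError {
    NoDigits,
    InvalidDigit,
    Overflow,
}

/// Hypothetical helper: parses an unsigned decimal such as "123.45" into
/// (mantissa, scale), where the value is mantissa / 10^scale.
fn parse_decimal(s: &str) -> Result<(u128, u32), ParseError> {
    let bytes = s.as_bytes();
    let mut i = 0;
    let mut seen_digit = false;
    let mut seen_point = false;
    let mut scale = 0u32;

    // Fast path: accumulate in a u64 while one more digit cannot overflow it.
    let mut small: u64 = 0;
    while i < bytes.len() {
        match bytes[i] {
            b @ b'0'..=b'9' => {
                if small > (u64::MAX - 9) / 10 {
                    break; // leave this digit for the u128 loop below
                }
                small = small * 10 + u64::from(b - b'0');
                seen_digit = true;
                if seen_point {
                    scale += 1;
                }
            }
            b'.' if !seen_point => seen_point = true,
            _ => return Err(ParseError::InvalidDigit),
        }
        i += 1;
    }

    // Fallback: continue in a u128, checking against the 96-bit limit.
    let mut big = u128::from(small);
    while i < bytes.len() {
        match bytes[i] {
            b @ b'0'..=b'9' => {
                let digit = u128::from(b - b'0');
                // big <= MAX_96 here, so big * 10 + 9 cannot overflow a u128.
                let next = big * 10 + digit;
                if next > MAX_96 {
                    if !seen_point {
                        // The integer part itself needs more than 96 bits.
                        return Err(ParseError::Overflow);
                    }
                    // Past the decimal point: drop this and any later digits,
                    // rounding half up once with a plain +1.
                    if !bytes[i + 1..].iter().all(u8::is_ascii_digit) {
                        return Err(ParseError::InvalidDigit);
                    }
                    if digit >= 5 {
                        big += 1;
                    }
                    // If the +1 carries past 96 bits, this sketch reports overflow
                    // (a fuller implementation could rescale when scale > 0).
                    return if big > MAX_96 {
                        Err(ParseError::Overflow)
                    } else {
                        Ok((big, scale))
                    };
                }
                big = next;
                seen_digit = true;
                if seen_point {
                    scale += 1;
                }
            }
            b'.' if !seen_point => seen_point = true,
            _ => return Err(ParseError::InvalidDigit),
        }
        i += 1;
    }

    if seen_digit {
        Ok((big, scale))
    } else {
        Err(ParseError::NoDigits)
    }
}
```

Here `parse_decimal("79228162514264337593543950336")` (that is, 2^96) returns `Err(ParseError::Overflow)` because the excess lies in the integer part. By contrast, `parse_decimal("1.23456789012345678901234567895")` drops the last fractional digit and rounds, returning `Ok((12345678901234567890123456790, 28))`.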
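The tail-call, const-generic and rare-character steps can be sketched as follows. The names `parse`, `parse_rare_start`, `step`, `rare` and `Parsed` are hypothetical, and this is not the crate's actual code. Each character is handled by one call that ends in a call to a differently specialized copy of `step`. The two const parameters, `DIGIT` (a digit has been seen) and `POINT` (a '.' has been seen), replace runtime state. Sign and '_' handling live in `#[cold]` functions off the hot path. Rust does not guarantee tail-call elimination, so this structure relies on the optimizer turning the calls into jumps. For brevity, the sketch uses only a u64 accumulator and reports overflow instead of falling back to 128 bits.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ParseError {
    Invalid,
    Overflow,
}

/// Hypothetical result shape: (negative, mantissa, scale).
type Parsed = Result<(bool, u64, u32), ParseError>;

/// Entry point: a leading digit, the common case, goes straight to `step`.
fn parse(s: &str) -> Parsed {
    let bytes = s.as_bytes();
    match bytes.first().copied() {
        Some(b'0'..=b'9') => step::<false, false>(bytes, false, 0, 0),
        _ => parse_rare_start(bytes),
    }
}

/// Sign handling is kept out of line, so unprefixed numbers never test for it.
#[cold]
fn parse_rare_start(bytes: &[u8]) -> Parsed {
    match bytes {
        [b'-', tail @ ..] => step::<false, false>(tail, true, 0, 0),
        [b'+', tail @ ..] => step::<false, false>(tail, false, 0, 0),
        _ => step::<false, false>(bytes, false, 0, 0),
    }
}

/// Handles one character, then calls (in tail position) the specialization
/// for the next state. DIGIT: a digit was seen; POINT: a '.' was seen.
fn step<const DIGIT: bool, const POINT: bool>(
    rest: &[u8],
    neg: bool,
    acc: u64,
    scale: u32,
) -> Parsed {
    match rest {
        // End of input is valid only if at least one digit was seen.
        [] => {
            if DIGIT {
                Ok((neg, acc, scale))
            } else {
                Err(ParseError::Invalid)
            }
        }
        [d @ b'0'..=b'9', tail @ ..] => {
            let acc = acc
                .checked_mul(10)
                .and_then(|a| a.checked_add(u64::from(*d - b'0')))
                .ok_or(ParseError::Overflow)?;
            let scale = if POINT { scale + 1 } else { scale };
            step::<true, POINT>(tail, neg, acc, scale)
        }
        // A '.' is only accepted in the specializations that have not seen one.
        [b'.', tail @ ..] if !POINT => step::<DIGIT, true>(tail, neg, acc, scale),
        _ => rare::<DIGIT, POINT>(rest, neg, acc, scale),
    }
}

/// Uncommon characters: here only the '_' separator, allowed after a digit.
#[cold]
fn rare<const DIGIT: bool, const POINT: bool>(
    rest: &[u8],
    neg: bool,
    acc: u64,
    scale: u32,
) -> Parsed {
    match rest {
        [b'_', tail @ ..] if DIGIT => step::<DIGIT, POINT>(tail, neg, acc, scale),
        _ => Err(ParseError::Invalid),
    }
}
```

Each `(DIGIT, POINT)` combination compiles to its own copy of `step`. A question like "is a '.' still allowed here?" is therefore answered at compile time instead of by a runtime flag. For example, `parse("-1_000.5")` returns `Ok((true, 10005, 1))`.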