Altivec #59
A few figures, obtained with the bench tool: bench/b2bench lz4 bitshuffle suite 1 |
Hi @kif ! The PR looks good to me. It looks like there is a lot of duplicated code between Intel/SSE2 and PowerPC/ALTIVEC, but I kind of like it, because that would open the door to slowly migrate parts of the shuffle and bitshuffle to native ALTIVEC. Also, and just to make sure, can you send the complete output of the test suite on PowerPC for our inspection? Thanks! |
Indeed the code is fully duplicated from SSE2 with just a trick on the compiler side to make SSE2 look like ALTIVEC. |
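For readers wondering what such a compiler-side trick can look like, here is a minimal sketch. It rests on an assumption that is not stated in this thread: that GCC 8 or newer ships an SSE2 compatibility `<emmintrin.h>` for little-endian POWER, enabled by defining `NO_WARN_X86_INTRINSICS`. The function name `shuffle2_sketch` is hypothetical and this is not the code from the PR; it only shows one SSE2 source building on both architectures, with a scalar fallback elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__powerpc64__) && defined(__LITTLE_ENDIAN__)
/* Assumption: GCC >= 8 provides <emmintrin.h> on ppc64le and only accepts it
   when this macro is defined. */
#define NO_WARN_X86_INTRINSICS 1
#define HAVE_SSE2_API 1
#elif defined(__SSE2__)
#define HAVE_SSE2_API 1
#else
#define HAVE_SSE2_API 0
#endif

#if HAVE_SSE2_API
#include <emmintrin.h>
#endif

/* Byte-shuffle for 2-byte elements (hypothetical helper): dest receives the
   first byte of every element, then the second byte of every element.
   nelems must be a multiple of 8. */
static void shuffle2_sketch(uint8_t *dest, const uint8_t *src, size_t nelems)
{
  assert(nelems % 8 == 0);
#if HAVE_SSE2_API
  const __m128i low_mask = _mm_set1_epi16(0x00FF);
  for (size_t i = 0; i < nelems; i += 8) {
    /* Load 8 elements (16 bytes). */
    __m128i x = _mm_loadu_si128((const __m128i *)(src + 2 * i));
    /* Split each 16-bit lane into its first and second byte. */
    __m128i b0 = _mm_and_si128(x, low_mask);
    __m128i b1 = _mm_srli_epi16(x, 8);
    /* Pack: low 8 bytes hold the first bytes, high 8 bytes the second bytes. */
    __m128i packed = _mm_packus_epi16(b0, b1);
    _mm_storel_epi64((__m128i *)(dest + i), packed);
    _mm_storel_epi64((__m128i *)(dest + nelems + i), _mm_srli_si128(packed, 8));
  }
#else
  /* Scalar fallback for targets without the SSE2 API. */
  for (size_t i = 0; i < nelems; i++) {
    dest[i] = src[2 * i];
    dest[nelems + i] = src[2 * i + 1];
  }
#endif
}
```

If the assumption about the compatibility header holds, it would also explain why older compilers fail to build: the header simply is not there.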
Ok. Take your time fixing the details like ftbfs (btw, what's this?) for gcc7 or older, and the test suite. The reason why I am asking for the complete output is that we don't have CI on PowerPC. On the other hand, I suppose one can create a PowerPC virtual machine (e.g. using qemu) and run it on top of an Intel CPU. It would be great if you could provide that possibility in this PR. |
Ok. Take your time fixing the details like ftbfs (btw, what's this?) for gcc7 or older, and the test suite.
In Debian jargon it means "fails to build from source".
The reason why I am asking for the complete output is that we don't have CI on PowerPC. On the other hand, I suppose one can create a PowerPC virtual machine (e.g. using qemu) and run it on top of an Intel CPU. It would be great if you could provide that possibility in this PR.
We only have this computer for testing, so our time on it is limited.
Nevertheless, cloud access to such computers is often available for "free"
to open-source developers.
https://openpowerfoundation.org/minicloud-free-openpower-cloud/
|
I just tested on: I will provide one test and one benchmark for the logs... |
Benchmark
|
Test suite
|
Cool. Any reason you are using 1 single thread for the benchmark? Seeing scalability for different threads is always interesting. |
Also, it would be nice if you could also test with the blosclz codec, which is also meant for speed. BTW, you can use the plot-speeds.py script to produce nice plots out of the output above. |
Cool. Any reason you are using 1 single thread for the benchmark? Seeing scalability for different threads is always interesting.
The only reason was the size of the output.
I also managed to get hardware-accelerated gzip working within blosc on
this platform: it makes gzip run at the speed of LZ4 :)
https://github.com/abalib/power-gzip
|
Ok, seeing zlib compress faster than memcpy is certainly quite an achievement. Thanks for pointing out the power-gzip project. Also, in order to see whether the current shuffle/bitshuffle implementations for Altivec are a bottleneck or not, it would be instructive to see plots for decompression, as this is typically much faster than compression. I suppose that shuffle/bitshuffle would still take quite a bit of the decompression time though. Other than that, everything seems to be in good shape. It would have been nice if you had provided a qemu configuration for testing PowerPC/Altivec on Intel machines (similar to what the zstd project is doing), but I understand that this can take a while, and we can do that later anyway. |
Yeah, I was expecting something similar for bitshuffle because it is quite expensive. I am more curious about decompression with shuffle, which should be significantly faster. |
Hi @kif . Is this PR ready for merging, or do you still want to do something more? |
On Mon, 10 Jun 2019 00:37:14 -0700 Francesc Alted ***@***.***> wrote:
Hi @kif . Is this PR ready for merging, or do you still want to do something more?
Hi Francesc,
I believe it is good to merge:
I did not have time to investigate actual SIMD coding on that processor
and the results are not bad, so far. No regression, with new tests.
If we go on buying such hardware, I will definitely re-work the
shuffle/bitshuffle part and provide an additional PR, but the decision is
not mine.
|
Merged. Thanks! |
This PR implements the byte-shuffle and bit-shuffle for PPC processors by porting the SSE2 vector instructions to ALTIVEC.
Close #58
This is my first PR in "C" ... help with coding conventions, testing and other newbie bugs is welcome.
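For context on what the two filters compute, here is a plain scalar sketch of byte-shuffle and bit-shuffle. The function names are hypothetical, and the bit order inside each output byte is an assumption for illustration, not taken from the library sources. It is a reference for the data layout only, not the vectorized code of this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-shuffle: byte j of element i moves to dest[j * nelems + i], so bytes
   of the same significance end up contiguous. */
static void shuffle_ref(uint8_t *dest, const uint8_t *src,
                        size_t typesize, size_t nelems)
{
  for (size_t i = 0; i < nelems; i++)
    for (size_t j = 0; j < typesize; j++)
      dest[j * nelems + i] = src[i * typesize + j];
}

/* Inverse of shuffle_ref. */
static void unshuffle_ref(uint8_t *dest, const uint8_t *src,
                          size_t typesize, size_t nelems)
{
  for (size_t i = 0; i < nelems; i++)
    for (size_t j = 0; j < typesize; j++)
      dest[i * typesize + j] = src[j * nelems + i];
}

/* Bit-shuffle: bit b of element i moves to bit i of "bit plane" b.
   nelems must be a multiple of 8 so that each plane is a whole number of
   bytes. Assumed bit order: least significant bit first. */
static void bitshuffle_ref(uint8_t *dest, const uint8_t *src,
                           size_t typesize, size_t nelems)
{
  assert(nelems % 8 == 0);
  const size_t nbits = typesize * 8;
  const size_t plane_bytes = nelems / 8;
  memset(dest, 0, typesize * nelems);
  for (size_t i = 0; i < nelems; i++) {
    for (size_t b = 0; b < nbits; b++) {
      uint8_t bit = (uint8_t)((src[i * typesize + b / 8] >> (b % 8)) & 1u);
      dest[b * plane_bytes + i / 8] |= (uint8_t)(bit << (i % 8));
    }
  }
}
```

The bit-shuffle touches every bit individually in this form, which gives an intuition for why it is the more expensive of the two filters and why a native vector version matters more there.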