Xarray does not support full range of netcdf-python compression options #7388
Comments
Ouch, my bad. I think the existing test only makes sure it's passed through in encoding, but not that it is actually written to disk. cc @markelg, this would be a nice follow-on PR :)
I'll have a look. Sorry about that, I guess we assumed that encoding was passed with "**kwargs". I did not try it with netcdf-c 4.9.x since it is not yet available in conda-forge and I did not find the time to compile it.
With the PR the test above works, and so does bzip2. I can't get it to apply blosc filters for some reason: the write succeeds, but the filters are not actually applied. This is the full snippet I am using:
Also, I was not able to make the conda environment in ci/environment.yml resolve libnetcdf 4.9.1. I had to build an environment on my own. I also added the hdf5 filters.
Hi. I updated the branch and created a fresh python environment with the idea of writing another, final test for this.
I am not sure what is going on. It seems that the currently resolved netcdf4-hdf5 versions do not like the default parameters we are supplying. My environment is
Thanks. It looks like the errors are related to this bug: Unidata/netcdf-c#2674. The fix has been merged, so I hope they include it in the next netcdf-c release. For the moment I prefer not to merge this, as netcdf 4.9.2 and dask do not seem to play well together.
@markelg, regarding the blosc filters, this may have been a combination of the conda-forge build and an upstream netcdf-c quirk. In the conda-forge build, we did not install the necessary hdf5 plugins. I fixed that in conda-forge/libnetcdf-feedstock#172, where we also added blosc as a dependency, which was not present before. The netcdf-c quirk is this: if a compression library is not present at build time, support for it is not compiled in, and the call that sets the corresponding compression silently succeeds without doing anything. If the library is present at build time but the corresponding plugin is missing at runtime, the attempt to set compression errors out. Regarding the HDF5-diag errors, the upstream bug has already been mentioned. I am proposing to back-port the fix in conda-forge/libnetcdf-feedstock#175 for the conda-forge build, so hopefully that will work in libnetcdf-4.9.2 soon, too.
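Because a missing compression library can make the request succeed silently, one way to catch it is to read the filters back after writing and compare them with what was requested. The following is a minimal sketch, not code from xarray or netCDF4-python. The helper name `compression_was_applied` is hypothetical, and the layout of the filters dictionary (boolean or dict entries under keys such as "zlib", "zstd", "bzip2", "szip", "blosc") is an assumption based on what netCDF4-python's `Variable.filters()` reports.

```python
def compression_was_applied(requested, filters):
    """Return True if `filters` confirms the requested compression.

    `requested` is the value passed as compression= (or None).
    `filters` is assumed to look like the dict returned by
    netCDF4.Variable.filters(); its exact keys are an assumption here.
    """
    if requested is None:
        # Nothing was requested, so there is nothing to verify.
        return True
    # All blosc variants (e.g. "blosc_lz4") are assumed to be reported
    # under a single "blosc" key.
    family = "blosc" if requested.startswith("blosc") else requested
    # A missing key, False, or an empty dict all count as "not applied".
    return bool(filters.get(family))
```

In a test, the dictionary would come from reopening the written file, for example with `netCDF4.Dataset(path)["var"].filters()`, so that the check reflects what is on disk rather than what was passed in the encoding.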
@markelg, could you elaborate? Is this about the two issues discussed here, or are there more problems? I am asking because I am wondering whether to back-port the patches to 4.9.1.
I think it is about these two issues only, so backporting the fixes should work.
Do you need them in 4.9.1 then, or is updating to 4.9.2 an option?
Good question. Right now ci/requirements/environment.yml is resolving libnetcdf 4.9.1, so fixing 4.9.2 would not work. I am not sure why or how to change this, as few package versions are pinned.
Now the environment resolves libnetcdf 4.9.2 and the blosc filter seems to be working, although not with every chunk shape. I got weird errors when using a chunksize of (1, 10); it works well with (5, 10).
The error message does not turn up on Google. With this I can add some tests to the PR.
I think that is a fundamental blosc limitation, though it should possibly be handled more gracefully in libnetcdf. Probably @FrancescAlted knows better?
I found errors too with blosc_shuffle=0 and 2, only 1 seems to work. See the test added to #7551
Hi. I lack some context here, but when the buffer is incompressible, Blosc (both 1 and 2) returns a negative value (https://www.blosc.org/c-blosc2/reference/blosc1.html#c.blosc1_compress). You should check for that and act accordingly. On the other hand, if you provide an output buffer with enough room (i.e. input_size + BLOSC(2)_MAX_OVERHEAD), Blosc guarantees that compression always succeeds.
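The two rules in the comment above (size the output buffer generously, and check the return value) can be sketched in a few lines. This is only an illustration of the arithmetic, not code from netcdf-c or the HDF5 blosc plugin. The helper names are hypothetical, and the overhead constants (16 bytes for C-Blosc 1, 32 bytes for C-Blosc2) are assumptions taken from the Blosc headers, so they should be checked against the version actually in use.

```python
# Assumed header overheads; verify against blosc.h / blosc2.h.
BLOSC_MAX_OVERHEAD = 16
BLOSC2_MAX_OVERHEAD = 32


def safe_dest_size(nbytes, max_overhead=BLOSC_MAX_OVERHEAD):
    """Output buffer size that leaves room for incompressible input."""
    if nbytes < 0:
        raise ValueError("nbytes must be non-negative")
    return nbytes + max_overhead


def checked_compressed_size(ret):
    """Validate the integer returned by a Blosc compress call.

    A non-positive value is treated as failure here, covering both an
    error code and "did not fit in the destination buffer".
    """
    if ret <= 0:
        raise RuntimeError(f"Blosc compression did not succeed (returned {ret})")
    return ret
```

For a chunk of ten 8-byte values, `safe_dest_size(80)` gives 96 bytes, which is the size the destination buffer would need under the Blosc 1 assumption.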
Thank you Francesc. However, I don't think we can do such low-level checks here; we would need to move these issues downstream, to netCDF4-python or even to the HDF5 plugin.
For what it's worth, hdf5plugin (https://pypi.org/project/hdf5plugin/) allows using both Blosc and Blosc2 as HDF5 plugins.
Since conda-forge/libnetcdf-feedstock#172, conda-forge's libnetcdf comes with the blosc plugin. But maybe that is an outdated version? The source code is at https://github.com/Unidata/netcdf-c/blob/main/plugins/H5Zblosc.c.
This is due to pydata/xarray#7388 not being solved yet.
We are eagerly waiting for this issue to be solved :) Is there anything we can do to help?
Please check back on #7551. We've given it a push and at least it seems to work well on linux/macos. Doesn't work on windows, though.
Almost got this fixed within one year! 😆
What is your issue?
Summary
The netcdf4-python API docs say the following
Although `compression` is considered a valid encoding option by Xarray (xarray/xarray/backends/netCDF4_.py, lines 232 to 242 in bbe63ab), it appears that we silently ignore the `compression` option when creating new netCDF4 variables (xarray/xarray/backends/netCDF4_.py, lines 488 to 501 in bbe63ab).
Code example
In addition to showing that `compression` is ignored, this also reveals several other encoding options that are not available when writing data from xarray (`szip`, `zstd`, `bzip2`, `blosc`).
Proposal
We should align with the recommendation from the netcdf4 docs and support `compression=` style encoding in NetCDF. We should deprecate the `zlib=True` syntax.
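One way the proposal could look in practice is a small translation step that rewrites the legacy `zlib=True` flag into `compression="zlib"` and warns about it. This is a hypothetical sketch, not xarray's actual implementation: the function name `normalize_compression` and the exact warning and conflict behaviour are assumptions made for illustration.

```python
import warnings


def normalize_compression(encoding):
    """Return a copy of `encoding` that uses compression= instead of zlib=."""
    out = dict(encoding)
    zlib_flag = out.pop("zlib", None)
    if zlib_flag is None:
        # No legacy flag present; leave the encoding untouched.
        return out
    warnings.warn(
        "zlib= is deprecated in this sketch; use compression='zlib' instead",
        DeprecationWarning,
        stacklevel=2,
    )
    if zlib_flag:
        existing = out.get("compression")
        if existing not in (None, "zlib"):
            # zlib=True together with a different compressor is ambiguous.
            raise ValueError(
                f"zlib=True conflicts with compression={existing!r}"
            )
        out["compression"] = "zlib"
    return out
```

With this, `{"zlib": True, "complevel": 4}` becomes `{"compression": "zlib", "complevel": 4}`, while an encoding that already uses `compression=` passes through unchanged.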