Coordinates lost with GPM-IMERG file #342
Comments
That's weird but it's an upstream issue in kerchunk. EDIT: Possibly related to fsspec/kerchunk#177
This is almost certainly another symptom of the fact that VirtualiZarr does not yet understand how to decode according to CF conventions in the same way that xarray does. Some of those variables are present; they are just set as data variables instead of coordinates, which is the exact same bug as in issue #189. Other coordinates ( I have an in-progress PR to actually call xarray's internal CF decoding logic from within VirtualiZarr in #224.
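For context, here is a rough sketch of the part of CF decoding that matters for the "data variable instead of coordinate" symptom. Under the CF conventions a data variable can name its auxiliary coordinates in a space-separated `coordinates` attribute, and xarray also treats a 1-D variable named after its own dimension as a coordinate. The function `cf_coordinate_names` and the sample metadata are hypothetical illustrations in plain Python, not VirtualiZarr or xarray internals.

```python
def cf_coordinate_names(variables):
    """Return the names that CF-style decoding would treat as coordinates.

    `variables` maps name -> {"dims": tuple of dim names, "attrs": dict}.
    Illustrative stand-in only; this is not xarray's actual implementation.
    """
    coords = set()
    for name, var in variables.items():
        # A 1-D variable named after its own dimension is a dimension coordinate.
        if var["dims"] == (name,):
            coords.add(name)
        # Names listed in a space-separated `coordinates` attribute are
        # auxiliary coordinates, provided they exist as variables.
        for listed in var["attrs"].get("coordinates", "").split():
            if listed in variables:
                coords.add(listed)
    return coords


# Hypothetical metadata, loosely modelled on a lat/lon grid.
example = {
    "lat": {"dims": ("lat",), "attrs": {}},
    "lon": {"dims": ("lon",), "attrs": {}},
    "time": {"dims": ("time",), "attrs": {}},
    "height": {"dims": ("lat", "lon"), "attrs": {}},
    "precipitation": {
        "dims": ("time", "lon", "lat"),
        "attrs": {"coordinates": "height"},
    },
}
print(sorted(cf_coordinate_names(example)))  # ['height', 'lat', 'lon', 'time']
```

A reader that skips this step would leave `height` (and possibly the others) among the data variables, which matches the symptom described in this comment.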
This rings a bell but I defer to @sharkinsspatial
Also we should probably have some more tests that explicitly check that all the same variables and coordinates are present for a set of different example real-world files, whether opened with virtualizarr or xarray.
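A minimal sketch of what such a check could look like, assuming the test only needs the variable and coordinate names from each dataset. The helper `compare_variables` and the sample names are hypothetical; it works on plain collections of names so it does not depend on any particular reader.

```python
def compare_variables(ref_data_vars, ref_coords, virt_data_vars, virt_coords):
    """Compare variable names between a reference and a virtual dataset.

    Returns a dict with three sorted lists:
      missing    - names in the reference that are absent from the virtual dataset
      demoted    - reference coordinates that show up as data variables
      unexpected - names present only in the virtual dataset
    """
    ref_data_vars, ref_coords = set(ref_data_vars), set(ref_coords)
    virt_data_vars, virt_coords = set(virt_data_vars), set(virt_coords)
    ref_all = ref_data_vars | ref_coords
    virt_all = virt_data_vars | virt_coords
    return {
        "missing": sorted(ref_all - virt_all),
        "demoted": sorted(ref_coords & virt_data_vars),
        "unexpected": sorted(virt_all - ref_all),
    }


# Illustrative names only; not taken from an actual run.
report = compare_variables(
    ref_data_vars={"precipitation", "lat_bnds"},
    ref_coords={"time", "lon", "lat"},
    virt_data_vars={"precipitation", "lat"},
    virt_coords={"time"},
)
print(report)
# {'missing': ['lat_bnds', 'lon'], 'demoted': ['lat'], 'unexpected': []}
```

With two xarray datasets, `ds.data_vars` and `ds.coords` can be passed directly, since iterating them yields names. A test would then assert that all three lists are empty. Keeping "missing" and "demoted" separate distinguishes variables that were skipped entirely from ones that were only classified differently.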
I don't think that's the case here. The following variables are present in the underlying HDF5 dataset but don't appear anywhere in the virtualized dataset, either as coordinates or data variables:
The variables are just being skipped completely.
This actually seems to be an error in kerchunk? I made a kerchunk JSON of the file and the missing variables (lon, lat, lon_bnds, lat_bnds) do not even have a corresponding

    from kerchunk.hdf import SingleHdf5ToZarr
    import ujson
    import xarray as xr

    # Build kerchunk references for the HDF5 file and write them to JSON
    m = SingleHdf5ToZarr('3B-HHR.MS.MRG.3IMERG.19980101-S000000-E002959.0000.V07B.HDF5')
    m2 = m.translate()
    with open("3B-HHR.MS.MRG.3IMERG.19980101-S000000-E002959.0000.V07B.HDF5.json", "w") as f:
        f.write(ujson.dumps(m2))

    # Open the "Grid" group through the kerchunk engine
    print(xr.open_dataset("3B-HHR.MS.MRG.3IMERG.19980101-S000000-E002959.0000.V07B.HDF5.json", engine="kerchunk", group="Grid"))

    <xarray.Dataset> Size: 104MB
    Dimensions:                         (Grid/time: 1, Grid/lon: 3600,
                                         Grid/lat: 1800, Grid/nv: 2)
    Coordinates:
        time                            (Grid/time) object 8B ...
    Dimensions without coordinates: Grid/time, Grid/lon, Grid/lat, Grid/nv
    Data variables:
        precipitation                   (Grid/time, Grid/lon, Grid/lat) float32 26MB ...
        precipitationQualityIndex       (Grid/time, Grid/lon, Grid/lat) float32 26MB ...
        probabilityLiquidPrecipitation  (Grid/time, Grid/lon, Grid/lat) float32 26MB ...
        randomError                     (Grid/time, Grid/lon, Grid/lat) float32 26MB ...
        time_bnds                       (Grid/time, Grid/nv) object 16B ...
    Attributes:
        GridHeader:  BinMethod=ARITHMETIC_MEAN;\nRegistration=CENTER;\nLatitudeRe...
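One way to confirm which arrays a reference file actually describes, without going through xarray at all, is to scan the reference keys directly. This sketch assumes the version 1 kerchunk reference layout (a top-level `"refs"` mapping whose keys are Zarr v2 store keys, so each array contributes a `<path>/.zarray` entry). The helper `arrays_in_references` and the sample document are hypothetical and hand-written, not output from the file in this issue.

```python
def arrays_in_references(refs_doc):
    """List array paths described in a kerchunk-style reference document.

    Accepts either the full document ({"version": 1, "refs": {...}})
    or the bare refs mapping.
    """
    refs = refs_doc.get("refs", refs_doc)
    names = []
    for key in refs:
        # Zarr v2 stores array metadata under "<array path>/.zarray".
        parent, _, leaf = key.rpartition("/")
        if leaf == ".zarray":
            names.append(parent)
    return sorted(names)


# Hand-written sample in the assumed layout; file name and offsets are made up.
sample = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "Grid/.zgroup": '{"zarr_format": 2}',
        "Grid/precipitation/.zarray": "{}",
        "Grid/precipitation/.zattrs": "{}",
        "Grid/precipitation/0.0.0": ["file.HDF5", 1000, 2000],
        "Grid/time/.zarray": "{}",
        "Grid/time/0": ["file.HDF5", 3000, 8],
    },
}
print(arrays_in_references(sample))  # ['Grid/precipitation', 'Grid/time']
```

Loading the JSON written above with `json.load` and passing the result to this function would show whether `Grid/lon`, `Grid/lat`, and the bounds variables were ever written to the references.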
That definitely sounds wrong, and would throw virtualizarr's kerchunk translator off completely. But also this
FWIW we got this file working fairly well with @sharkinsspatial's HDF-native reader: https://github.com/earth-mover/icechunk-nasa/blob/main/notebooks/write_virtual.ipynb
I'm working with the GPM-IMERG files from NASA. Here's an example
When opening this dataset virtually, many of the coordinates are lost.
This spits out the following warning
and returns a dataset with most of the coordinates missing
I also tried with the new kerchunk-free backend and got an error