wrong time encoding after padding #4224

Closed
xzenggit opened this issue Jul 15, 2020 · 4 comments
Labels
topic-metadata Relating to the handling of metadata (i.e. attrs and encoding)

Comments

@xzenggit

What happened:

If I open a netCDF file with default settings (it contains a daily time dimension) and then pad it to hourly values, the padded dataset shows hourly time values in memory, but those hourly values are not saved correctly. I think this is due to the encoding, but I'm not sure how to fix it.

What you expected to happen:
I expected the final line of code to give me:

#array(['2000-01-01T00:00:00.000000000', '2000-01-01T01:00:00.000000000',
#      '2000-01-01T02:00:00.000000000', '2000-01-01T03:00:00.000000000',
#       '2000-01-01T04:00:00.000000000'], dtype='datetime64[ns]')

Instead, it outputs:

#array(['2000-01-01T00:00:00.000000000', '2000-01-01T00:00:00.000000000',
#      '2000-01-01T00:00:00.000000000', '2000-01-01T00:00:00.000000000',
#      '2000-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

Minimal Complete Verifiable Example:

import numpy as np
import pandas as pd
import xarray as xr

# daily time axis covering one year
time = pd.date_range("2000-01-01", freq="1D", periods=365)
ds = xr.Dataset({"foo": ("time", np.arange(365)), "time": time})
ds.to_netcdf('test5.nc')

ds = xr.open_dataset('test5.nc')
ds.time.encoding

# padding
ds_hourly = ds.resample(time='1h').pad()
ds_hourly.time.values[0:5]
#array(['2000-01-01T00:00:00.000000000', '2000-01-01T01:00:00.000000000',
#      '2000-01-01T02:00:00.000000000', '2000-01-01T03:00:00.000000000',
#       '2000-01-01T04:00:00.000000000'], dtype='datetime64[ns]')

ds_hourly.to_netcdf('test6.nc')

# load the padded data file
ds_hourly_load = xr.open_dataset('test6.nc')
ds_hourly_load.time.values[0:5]
#array(['2000-01-01T00:00:00.000000000', '2000-01-01T00:00:00.000000000',
#      '2000-01-01T00:00:00.000000000', '2000-01-01T00:00:00.000000000',
#      '2000-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

Anything else we need to know?:

Environment:
xarray version: '0.15.1'

Output of xr.show_versions()
@dcherian
Contributor

Hmm.. there is a similar issue with zarr on this repo. You probably need to del ds_hourly.time.encoding["units"]. We should issue a warning when the values cannot be roundtripped with the specified encoding.
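
To illustrate why stale units cannot roundtrip the padded values, here is a simplified model using only the standard library. It is not xarray's actual encoder: the helper names `encode_as_int` and `decode_from_int` are made up for this sketch, and it assumes times are stored as whole-number counts of a unit since a reference date, which matches the symptom reported above.

```python
from datetime import datetime, timedelta

REFERENCE = datetime(2000, 1, 1)  # plays the role of "... since 2000-01-01"


def encode_as_int(times, unit):
    """Store each time as a whole number of `unit` steps since REFERENCE."""
    # int() truncates, so anything finer than `unit` is dropped
    return [int((t - REFERENCE) / unit) for t in times]


def decode_from_int(counts, unit):
    """Rebuild timestamps from the stored integer counts."""
    return [REFERENCE + n * unit for n in counts]


# first five hourly timestamps, like ds_hourly.time.values[0:5]
hourly = [REFERENCE + timedelta(hours=h) for h in range(5)]

day = timedelta(days=1)
hour = timedelta(hours=1)

# stale "days" units: every hour within the first day collapses to midnight
as_days = decode_from_int(encode_as_int(hourly, day), day)

# "hours" units: the values survive the roundtrip
as_hours = decode_from_int(encode_as_int(hourly, hour), hour)

print(as_days)
print(as_hours)
```

With day units all five timestamps come back as the reference date, while hour units preserve them, which is why dropping the inherited `units` entry lets the writer pick a suitable unit.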

@xzenggit
Author

xzenggit commented Jul 15, 2020

Thanks a lot, @dcherian. del ds_hourly.time.encoding["units"] works!

We can also set ds_hourly.time.encoding = {} before saving the dataset to override previous encoding. Definitely agree with you. There should be a warning here. Thanks!
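
For reference, the workaround can be sketched end to end as below. This assumes xarray with a netCDF backend (netCDF4, h5netcdf, or scipy) is installed. The daily encoding is set by hand to mimic what opening a daily file leaves behind, the temporary file name is arbitrary, and `.ffill()` is used as the forward-fill equivalent of `.pad()` because `.pad()` may not be available in newer xarray versions.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# small daily dataset, standing in for the file opened in the report
time = pd.date_range("2000-01-01", freq="1D", periods=3)
ds = xr.Dataset({"foo": ("time", np.arange(3))}, coords={"time": time})

# assumption: mimic the encoding that open_dataset attaches for a daily file
ds["time"].encoding = {"units": "days since 2000-01-01", "dtype": "int64"}

# upsample to hourly and forward-fill the data values
ds_hourly = ds.resample(time="1h").ffill()

# drop the inherited encoding so suitable units are chosen on write
ds_hourly["time"].encoding = {}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "hourly.nc")
    ds_hourly.to_netcdf(path)
    # read the values while the file is still open
    with xr.open_dataset(path) as reloaded:
        loaded_times = reloaded["time"].values[:5]

expected = ds_hourly["time"].values[:5]
print(loaded_times)
```

Resetting the whole encoding dict is the blunter option; deleting only the `"units"` key, as suggested above, keeps any other settings such as the calendar.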

@keewis added the topic-metadata label (Relating to the handling of metadata, i.e. attrs and encoding) on Sep 6, 2020
@stale

stale bot commented Apr 29, 2022

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale bot added the stale label on Apr 29, 2022
@dcherian removed the stale label on Apr 29, 2022
@kmuehlbauer
Contributor

This is fixed in current xarray, which now issues a meaningful warning and writes the hourly values correctly:

UserWarning: Times can't be serialized faithfully to int64 with requested units 'days since 
2000-01-01'. Serializing with units 'hours since 2000-01-01' instead. Set encoding['dtype'] to 
floating point dtype to serialize with units 'days since 2000-01-01'. Set encoding['units'] to 
'hours since 2000-01-01' to silence this warning .
array(['2000-01-01T00:00:00.000000000', '2000-01-01T01:00:00.000000000',
       '2000-01-01T02:00:00.000000000', '2000-01-01T03:00:00.000000000',
       '2000-01-01T04:00:00.000000000'], dtype='datetime64[ns]')
