
Zarr Metadata #59

Merged
merged 12 commits on Jul 25, 2024
16 changes: 16 additions & 0 deletions docs/source/API.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
API
===
.. toctree::
   :maxdepth: 1

   RasGeomHdf
   RasPlanHdf
   RasHdf

:code:`rashdf` provides two primary classes for reading data from
HEC-RAS geometry and plan HDF files: :code:`RasGeomHdf` and :code:`RasPlanHdf`.
Both of these classes inherit from the :code:`RasHdf` base class, which
inherits from the :code:`h5py.File` class.

Note that :code:`RasPlanHdf` inherits from :code:`RasGeomHdf`, so all of the
methods available in :code:`RasGeomHdf` are also available in :code:`RasPlanHdf`.
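
Because of this hierarchy, a single object exposes geometry methods, plan methods, and
low-level :code:`h5py` access. A minimal sketch, assuming a hypothetical local plan file
:code:`BigRiver.p01.hdf` (the attribute path shown is typical of HEC-RAS plan HDF5 files,
but may vary by model):

```python
from rashdf import RasPlanHdf

# RasPlanHdf -> RasGeomHdf -> RasHdf -> h5py.File, so rashdf convenience
# methods and inherited h5py access coexist on the same object.
# "BigRiver.p01.hdf" is a hypothetical local file path.
with RasPlanHdf("BigRiver.p01.hdf") as plan_hdf:
    info = plan_hdf.get_attrs("Plan Data/Plan Information")  # rashdf method
    geometry_group = plan_hdf["Geometry"]                    # h5py group access
```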
91 changes: 91 additions & 0 deletions docs/source/Advanced.rst
@@ -0,0 +1,91 @@
Advanced
========
:code:`rashdf` provides convenience methods for generating
Zarr metadata for HEC-RAS HDF5 files. This is particularly useful
for working with stochastic ensemble simulations, where many
HEC-RAS HDF5 files are generated for different model realizations,
forcing scenarios, or other sources of uncertainty.

To illustrate this, consider a set of HEC-RAS HDF5 files stored
in an S3 bucket, where each file represents a different simulation
of a river model. We can generate Zarr metadata for each simulation
and then combine the metadata into a single Kerchunk metadata file
that includes a new "sim" dimension. This combined metadata file
can then be used to open a single Zarr dataset that includes all
simulations.

The cell timeseries output for a single simulation might look
something like this::

    >>> from rashdf import RasPlanHdf
    >>> plan_hdf = RasPlanHdf.open_uri("s3://bucket/simulations/1/BigRiver.p01.hdf")
    >>> plan_hdf.mesh_cells_timeseries_output("BigRiverMesh1")
    <xarray.Dataset> Size: 66MB
    Dimensions:                              (time: 577, cell_id: 14188)
    Coordinates:
      * time                                 (time) datetime64[ns] 5kB 1996-01-14...
      * cell_id                              (cell_id) int64 114kB 0 1 ... 14187
    Data variables:
        Water Surface                        (time, cell_id) float32 33MB dask.array<chunksize=(3, 14188), meta=np.ndarray>
        Cell Cumulative Precipitation Depth  (time, cell_id) float32 33MB dask.array<chunksize=(3, 14188), meta=np.ndarray>
    Attributes:
        mesh_name:  BigRiverMesh1

Note that the example below requires installation of the optional
libraries :code:`kerchunk`, :code:`zarr`, :code:`fsspec`, and :code:`s3fs`::

    from rashdf import RasPlanHdf
    from kerchunk.combine import MultiZarrToZarr
    import json

    # Example S3 URL pattern for HEC-RAS plan HDF5 files
    s3_url_pattern = "s3://bucket/simulations/{sim}/BigRiver.p01.hdf"

    zmeta_files = []
    sims = list(range(1, 11))

    # Generate Zarr metadata for each simulation
    for sim in sims:
        s3_url = s3_url_pattern.format(sim=sim)
        plan_hdf = RasPlanHdf.open_uri(s3_url)
        zmeta = plan_hdf.zmeta_mesh_cells_timeseries_output("BigRiverMesh1")
        json_file = f"BigRiver.{sim}.p01.hdf.json"
        with open(json_file, "w") as f:
            json.dump(zmeta, f)
        zmeta_files.append(json_file)

    # Combine the Zarr metadata files into a single Kerchunk metadata file
    # with a new "sim" dimension
    mzz = MultiZarrToZarr(zmeta_files, concat_dims=["sim"], coo_map={"sim": sims})
    mzz_dict = mzz.translate()

    with open("BigRiver.combined.p01.json", "w") as f:
        json.dump(mzz_dict, f)

Now, we can open the combined dataset with :code:`xarray`::

    import xarray as xr

    ds = xr.open_dataset(
        "reference://",
        engine="zarr",
        backend_kwargs={
            "consolidated": False,
            "storage_options": {"fo": "BigRiver.combined.p01.json"},
        },
        chunks="auto",
    )

The resulting combined dataset includes a new :code:`sim` dimension::

    <xarray.Dataset> Size: 674MB
    Dimensions:                              (sim: 10, time: 577, cell_id: 14606)
    Coordinates:
      * cell_id                              (cell_id) int64 117kB 0 1 ... 14605
      * sim                                  (sim) int64 80B 1 2 3 4 5 6 7 8 9 10
      * time                                 (time) datetime64[ns] 5kB 1996-01-14...
    Data variables:
        Cell Cumulative Precipitation Depth  (sim, time, cell_id) float32 337MB dask.array<chunksize=(10, 228, 14606), meta=np.ndarray>
        Water Surface                        (sim, time, cell_id) float32 337MB dask.array<chunksize=(10, 228, 14606), meta=np.ndarray>
    Attributes:
        mesh_name:  BigRiverMesh1
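
Once combined, :code:`sim` behaves like any other xarray dimension, so individual
realizations can be selected and ensemble statistics computed. A self-contained sketch
using a miniature random stand-in for the combined dataset (the dimension names match
the output above; the sizes and values are hypothetical):

```python
import numpy as np
import xarray as xr

# Miniature stand-in for the combined ensemble dataset: 3 simulations,
# 4 timesteps, 5 cells (a real combined dataset is much larger).
ds = xr.Dataset(
    {"Water Surface": (("sim", "time", "cell_id"), np.random.rand(3, 4, 5))},
    coords={"sim": [1, 2, 3], "time": list(range(4)), "cell_id": list(range(5))},
)

# Select a single realization by its "sim" label
single = ds.sel(sim=2)

# Or reduce across the ensemble dimension
ensemble_mean = ds["Water Surface"].mean("sim")
```

On the real combined dataset, the same :code:`sel` and :code:`mean` calls operate
lazily on the dask-backed arrays.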
3 changes: 2 additions & 1 deletion docs/source/RasGeomHdf.rst
@@ -1,5 +1,6 @@
RasGeomHdf
==========

.. currentmodule:: rashdf
.. autoclass:: RasGeomHdf
   :show-inheritance:
@@ -21,6 +22,6 @@ RasGeomHdf
get_geom_structures_attrs,
get_geom_2d_flow_area_attrs,
cross_sections_elevations,
cross_sections
cross_sections,
river_reaches

10 changes: 9 additions & 1 deletion docs/source/RasPlanHdf.rst
@@ -13,6 +13,10 @@ RasPlanHdf
mesh_max_ws_err,
mesh_max_iter,
mesh_last_iter,
mesh_cells_summary_output,
mesh_faces_summary_output,
mesh_cells_timeseries_output,
mesh_faces_timeseries_output,
reference_lines,
reference_lines_names,
reference_points,
@@ -31,4 +35,8 @@ RasPlanHdf
cross_sections_flow,
cross_sections_wsel,
steady_flow_names,
steady_profile_xs_output
steady_profile_xs_output,
zmeta_mesh_cells_timeseries_output,
zmeta_mesh_faces_timeseries_output,
zmeta_reference_lines_timeseries_output,
zmeta_reference_points_timeseries_output
2 changes: 2 additions & 0 deletions docs/source/conf.py
@@ -30,6 +30,8 @@
templates_path = ["_templates"]
exclude_patterns = []

master_doc = "index"


# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
24 changes: 6 additions & 18 deletions docs/source/index.rst
@@ -11,6 +11,12 @@ HDF5 files. It is a wrapper around the :code:`h5py` library, and provides an int
convenience functions for reading key HEC-RAS geometry data, output data,
and metadata.

.. toctree::
   :maxdepth: 2

   API
   Advanced

Installation
============
With :code:`pip`::
@@ -82,21 +88,3 @@ credentials)::
     'Simulation Start Time': datetime.datetime(1996, 1, 14, 12, 0),
     'Time Window': [datetime.datetime(1996, 1, 14, 12, 0),
                     datetime.datetime(1996, 2, 7, 12, 0)]}


API
===
.. toctree::
   :maxdepth: 1

   RasGeomHdf
   RasPlanHdf
   RasHdf

:code:`rashdf` provides two primary classes for reading data from
HEC-RAS geometry and plan HDF files: :code:`RasGeomHdf` and :code:`RasPlanHdf`.
Both of these classes inherit from the :code:`RasHdf` base class, which
inherits from the :code:`h5py.File` class.

Note that :code:`RasPlanHdf` inherits from :code:`RasGeomHdf`, so all of the
methods available in :code:`RasGeomHdf` are also available in :code:`RasPlanHdf`.
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -12,11 +12,11 @@ classifiers = [
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
]
version = "0.5.0"
version = "0.6.0"
dependencies = ["h5py", "geopandas>=1.0,<2.0", "pyarrow", "xarray"]

[project.optional-dependencies]
dev = ["pre-commit", "ruff", "pytest", "pytest-cov", "fiona"]
dev = ["pre-commit", "ruff", "pytest", "pytest-cov", "fiona", "kerchunk", "zarr", "dask", "fsspec", "s3fs"]
docs = ["sphinx", "numpydoc", "sphinx_rtd_theme"]

[project.urls]
5 changes: 4 additions & 1 deletion src/rashdf/base.py
@@ -19,6 +19,7 @@ def __init__(self, name: str, **kwargs):
Additional keyword arguments to pass to h5py.File
"""
        super().__init__(name, mode="r", **kwargs)
        self._loc = name

    @classmethod
    def open_uri(
@@ -49,7 +50,9 @@ def open_uri(
        import fsspec

        remote_file = fsspec.open(uri, mode="rb", **fsspec_kwargs)
        result = cls(remote_file.open(), **h5py_kwargs)
        result._loc = uri
        return result

    def get_attrs(self, attr_path: str) -> Dict:
        """Convert attributes from a HEC-RAS HDF file into a Python dictionary for a given attribute path.