RFC: Dataset storage information in HDF5/JSON #75
Comments
Why does 0_1 have a url key but 0_0 does not? How about using the same storage methods (e.g. CHUNK_REF_INDIRECT, etc.) that HSDS uses? Another idea would be an option to store the chunk data directly in the file (hex encoded), so offset would be a byte offset in the file itself. This is what ASDF does.
I think |
That just illustrates both possible options. In a real case, only one would be used.
Does this require an additional anonymous dataset?
H5D_CHUNKREF and H5D_CHUNKREF_INDIRECT are documented here: https://github.com/HDFGroup/hsds/blob/master/docs/design/single_object/SingleObject.md.
Here's the updated example. It includes The block index schema is defined with a URI and its index separator is included as well for convenience. There are also
{
"datasets": {
"7f335a2e-7ab1-11e4-87a5-3c15c2da029e": {
"attributes": [],
"dcpl": {
"fillValue": 0,
"layout": {
"class": "H5D_CHUNKED_REF",
"dims": [8]
}
},
"shape": {
"class": "H5S_SIMPLE",
"dims": [10, 10],
"maxdims": [10, 10]
},
"type": {
"base": "H5T_STD_I32BE",
"class": "H5T_INTEGER"
},
"url": "s3://mybucket/path/to/object/where/all/blocks/are",
"byteBlocks": {
"index_schema": {
"uri": "https://schema.hdfgroup.org/hdf5-json/block/index/regular",
"separator": "_"
},
"0_0": {
"offset": 1234,
"size": 2567,
"url": "s3://mybucket/path/to/block/object1"
},
"0_1": {
"offset": 56789,
"size": 1967,
"url": "s3://mybucket/path/to/block/object2"
}
}
}
}
}
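To make the intended reading of the example concrete, here is a minimal Python sketch of how a consumer might walk `byteBlocks`. It is not part of the proposal: the function name `iter_blocks` is hypothetical, the fallback from a block-level `url` to the dataset-level `url` is an assumption, and so is defaulting the separator to `_` when `index_schema` is missing. The data is a trimmed variant of the example above in which `0_1` has no `url`, so the fallback is exercised.

```python
import json

# Trimmed variant of the example above; "0_1" deliberately has no url here
# so that the dataset-level fallback (an assumption) is exercised.
DATASET = json.loads("""
{
  "shape": {"class": "H5S_SIMPLE", "dims": [10, 10]},
  "url": "s3://mybucket/path/to/object/where/all/blocks/are",
  "byteBlocks": {
    "index_schema": {
      "uri": "https://schema.hdfgroup.org/hdf5-json/block/index/regular",
      "separator": "_"
    },
    "0_0": {"offset": 1234, "size": 2567,
            "url": "s3://mybucket/path/to/block/object1"},
    "0_1": {"offset": 56789, "size": 1967}
  }
}
""")


def iter_blocks(dataset):
    """Yield (index, url, first_byte, last_byte) for each block entry."""
    blocks = dataset["byteBlocks"]
    # Assumption: use "_" when no index_schema is present.
    sep = blocks.get("index_schema", {}).get("separator", "_")
    for key, entry in blocks.items():
        if key == "index_schema":
            continue  # metadata, not a block
        index = tuple(int(part) for part in key.split(sep))
        # Assumption: a block-level url wins, else the dataset-level url applies.
        url = entry.get("url", dataset.get("url"))
        first = entry["offset"]
        last = first + entry["size"] - 1  # inclusive end of the byte range
        yield index, url, first, last


for index, url, first, last in iter_blocks(DATASET):
    print(index, url, f"bytes={first}-{last}")
```

The inclusive `first`/`last` pair maps directly onto an HTTP `Range: bytes=first-last` request header, which is one way a reader could fetch a single block from a remote object.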
This is a proposal to add dataset storage information to HDF5/JSON. The JSON key for this is named `byteBlocks`. The word "block" is hopefully still technically accurate while not too similar to "chunk".

Below is an example for one dataset. Block JSON keys are in the same format as in the HSDS schema, e.g. `0_0` and `0_1`. Two blocks are in the example; the `0_1` block has an optional `url` key in case of remote blocks. The `url` key can also apply to the entire dataset, in which case it cannot appear in the blocks.

cc: @derobins @jreadey @gheber
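The rule that a dataset-level `url` excludes block-level `url` keys could be checked mechanically. The sketch below is only an illustration of that rule as worded above; the function name `check_url_placement` is hypothetical, and skipping the `index_schema` key is an assumption based on the updated example in the comments.

```python
def check_url_placement(dataset):
    """Return a list of problems with where url keys appear in a dataset."""
    problems = []
    # Assumption: "index_schema" is metadata and not a block entry.
    blocks = {
        key: entry
        for key, entry in dataset.get("byteBlocks", {}).items()
        if key != "index_schema"
    }
    with_url = sorted(key for key, entry in blocks.items() if "url" in entry)
    # A dataset-level url and block-level urls are mutually exclusive.
    if "url" in dataset and with_url:
        problems.append(
            "dataset-level url set, but blocks also carry url: "
            + ", ".join(with_url)
        )
    return problems


mixed = {
    "url": "s3://mybucket/path/to/object/where/all/blocks/are",
    "byteBlocks": {
        "0_0": {"offset": 1234, "size": 2567},
        "0_1": {"offset": 56789, "size": 1967,
                "url": "s3://mybucket/path/to/block/object2"},
    },
}
print(check_url_placement(mixed))
```

An example that sets `url` at both levels purely for illustration would be flagged by this check; a real file would use only one of the two placements.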