Reader for Hugging Face's SafeTensor format #367

TomNicholas · 2025-01-02T18:09:06Z

We should be able to write a new VirtualiZarr reader for the Hugging Face SafeTensors format.

Hugging Face safetensors is an interesting example: it's uncompressed and has no internal chunking, so reading it is basically just like reading netCDF3. But it also puts all the metadata at the start of the file, which makes it a bit like cloud-optimized HDF5. See also huggingface/safetensors#527 (comment)

Originally posted by @TomNicholas in #218

cc the Pangeo ML people: @weiji14 @negin513 @maxrjones

TomNicholas · 2025-01-02T18:15:20Z

The format specification seems very straightforward.

If it really is that simple then I think this reader could potentially be implemented without even using the safetensors library, instead just using fsspec and parsing the bytes ourselves. Our unit tests should still probably check correctness against the safetensors library itself though.
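To make that concrete, here is a rough sketch of what parsing the bytes ourselves might look like, using only the standard library. It assumes the layout described in the safetensors docs: an 8-byte little-endian unsigned integer giving the header size, then a UTF-8 JSON header mapping each tensor name to its `dtype`, `shape`, and `data_offsets` (relative to the end of the header), then the raw tensor bytes. The function names (`read_safetensors_header`, `tensor_byte_ranges`, `make_example_file`) are hypothetical and not part of VirtualiZarr or safetensors. The file object here is an in-memory `io.BytesIO`, but a binary file opened via `fsspec.open(url, "rb")` should presumably work the same way.

```python
import io
import json
import struct


def read_safetensors_header(f):
    """Return (header_dict, data_start) for a binary file-like object."""
    # First 8 bytes: little-endian unsigned 64-bit size of the JSON header.
    (header_len,) = struct.unpack("<Q", f.read(8))
    header = json.loads(f.read(header_len).decode("utf-8"))
    # Tensor bytes begin immediately after the header.
    return header, 8 + header_len


def tensor_byte_ranges(f):
    """Map each tensor name to dtype, shape, and absolute (offset, length)."""
    header, data_start = read_safetensors_header(f)
    ranges = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional string metadata, not a tensor
            continue
        begin, end = info["data_offsets"]  # relative to start of data section
        ranges[name] = {
            "dtype": info["dtype"],
            "shape": info["shape"],
            "offset": data_start + begin,  # absolute offset within the file
            "length": end - begin,
        }
    return ranges


def make_example_file():
    """Build a tiny in-memory safetensors-style file with one 2x2 F32 tensor."""
    data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
    header = {
        "weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, len(data)]}
    }
    header_bytes = json.dumps(header).encode("utf-8")
    return io.BytesIO(struct.pack("<Q", len(header_bytes)) + header_bytes + data)


example = make_example_file()
ranges = tensor_byte_ranges(example)

# Read the tensor back using only the computed byte range.
example.seek(ranges["weight"]["offset"])
values = struct.unpack("<4f", example.read(ranges["weight"]["length"]))
print(ranges["weight"], values)
```

Each `(offset, length)` pair is the kind of byte-range reference a virtual zarr chunk manifest would need, with every tensor treated as a single uncompressed chunk. A real reader would still need to map safetensors dtype strings to numpy/zarr dtypes and validate the header, which this sketch skips.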

TomNicholas · 2025-01-02T18:17:30Z

There is an interesting issue on safetensors about "multi-part uploads". Apparently it's not officially supported but nevertheless widespread. This suggests a desire for the model weights to be chunked and/or version-controlled, which the virtual zarr approach could obviously help with.

TomNicholas added the enhancement (New feature or request), references generation (Reading byte ranges from archival files), and readers labels Jan 2, 2025

TomNicholas mentioned this issue Jan 3, 2025

Listing every format that could be represented as virtual zarr #218

