Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue with Subfiling VFD and multiple opens of same file #4194

Merged

Conversation

jhendersonHDF
Copy link
Collaborator

No description provided.

@jhendersonHDF jhendersonHDF added Merge - To 1.14 Priority - 1. High 🔼 These are important issues that should be resolved in the next release Component - C Library Core C library issues (usually in the src directory) Component - Parallel Parallel HDF5 (NOT thread-safety) Component - Testing Code in test or testpar directories, GitHub workflows Type - Bug / Bugfix Please report security issues to [email protected] instead of creating an issue on GitHub labels Mar 20, 2024
@jhendersonHDF
Copy link
Collaborator Author

This also fixes a few VFD test failures with subfiling related to testing multiple file opens.

* Since this cache currently just keeps all entries until
* application exit, context entry indices should just be
* consecutive
*/
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After #4159, this cache invariant is no longer true in the case of multiple file opens and/or files closed out of order.

@@ -310,13 +313,21 @@ H5_free_subfiling_object(int64_t object_id)
H5_SUBFILING_GOTO_ERROR(H5E_VFL, H5E_CANTGET, FAIL,
"couldn't get subfiling context for subfiling object ID");

if (H5_free_subfiling_object_int(sf_context) < 0)
H5_SUBFILING_GOTO_ERROR(H5E_VFL, H5E_CANTFREE, FAIL, "couldn't free subfiling context object");
if (sf_context->file_ref == 0 || --sf_context->file_ref == 0) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check for a ref. of 0 here since we can end up failing during context object creation before it's been associated with a file.

* Prior to calling the internal open function, we initialize
* a new subfiling context that contains topology info and
* new MPI communicators that facilitate messaging between
* HDF5 clients and the IOCs.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this comment hasn't been relevant in a long time. Updated a bit to reflect the current behavior of H5_open_subfiles

* start the service threads on the identified
* ranks as part of the subfile opening.
*/
if (open_subfile_with_context(sf_context, file_acc_flags) < 0)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function's no longer needed since this PR hoists the call to record_fid_to_subfile up here so that we increment the file ref. count on the context object when it gets reused by an extra open of a file. The only other thing the old function did was call ioc_open_files which was also hoisted up to here.

@@ -1799,6 +1839,8 @@ init_subfiling_context(subfiling_context_t *sf_context, const char *base_filenam
assert(MPI_COMM_NULL != file_comm);

sf_context->h5_file_id = file_id;
sf_context->threads_inited = false;
sf_context->file_ref = 0;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Start with a ref. of 0 since the context won't have an associated file until a bit later.

*
*-------------------------------------------------------------------------
*/
static void
static herr_t
clear_fid_map_entry(uint64_t file_id, int64_t sf_context_id)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrote this function a bit for clarity and added a check of the context's file reference count before clearing a mapping entry for a file to make sure we only do that when the last file hold a reference to the context is closed.

const char *failure_mssg = NULL;
static bool pass = true; /* set to false on error */
static bool disp_failure_mssgs = true; /* global force display of failure messages */
static const char *failure_mssg = NULL;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added static here since AddressSanitizer complains about these being defined in test/cache_common.h as well.

@jhendersonHDF jhendersonHDF force-pushed the subfiling_multi_file_open_share branch from 66b1a75 to 15e14be Compare March 20, 2024 21:08
/* Check if this file is already open */
H5E_BEGIN_TRY
{
status = H5_subfile_fid_to_context(file_id, &context_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think that the H5E_BEGIN_TRY/H5E_END_TRY would no longer be necessary, now that only "real" errors are thrown from this routine. Can this be switched to just a normal:

if (func() < 0)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's the very first file open with the VFD, the opened file map won't be allocated yet and cause an error. Not a great design but the allocation of the map resides in the code that writes to it, which made decent sense at the time. Probably worth a slight refactoring.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's the very first file open with the VFD, the opened file map won't be allocated yet and cause an error. Not a great design but the allocation of the map resides in the code that writes to it, which made decent sense at the time. Probably worth a slight refactoring.

Yeah, I see that now. Could you fix it by extracting this block from record_fid_to_subfile() into a separate helper routine, called by that routine and H5_subfile_fid_to_context():

   if (!sf_open_file_map) { 
        if (NULL == (sf_open_file_map = malloc((size_t)DEFAULT_FILE_MAP_ENTRIES *
            H5_SUBFILING_GOTO_ERROR(H5E_RESOURCE, H5E_CANTALLOC, FAIL, "couldn't

        sf_file_map_size = DEFAULT_FILE_MAP_ENTRIES;
        for (int i = 0; i < sf_file_map_size; i++) {
            sf_open_file_map[i].file_id       = UINT64_MAX;
            sf_open_file_map[i].sf_context_id = -1;
        }
    }

Because, if it's not an error, then H5_subfile_fid_to_context() should not return an error. :-)

@jhendersonHDF jhendersonHDF force-pushed the subfiling_multi_file_open_share branch from 26fa5b9 to 374b619 Compare March 21, 2024 17:10
@derobins derobins merged commit c4cc33a into HDFGroup:develop Mar 22, 2024
51 checks passed
lrknox pushed a commit to lrknox/hdf5 that referenced this pull request Mar 25, 2024
…p#4194)

* Fix issue with Subfiling VFD and multiple opens of same file

* Update H5_subfile_fid_to_context to return error value instead of ID

* Add helper routine to initialize open file mapping
lrknox added a commit that referenced this pull request Mar 25, 2024
* Call memset before stat calls (#4202)

The buffers passed to stat-like calls are only partially filled in by
the call, leaving ununitialized memory areas when the stat buffers are
created on the stack.

This change memsets the buffers to 0 before the stat calls, quieting
the -fsanitze=memory complaints.

* Remove unused CMake configuration checks (#4199)

* Update link to Chunking in HDF5 page (#4203)

* Fix H5Pset_efile_prefix documentation error (#4206)

Fixes GH issue #1759

* Suggested header footer for NEWSLETTER (#4204)

* Suggested header footer for NEWSLETTER

* Updates

* Add NEWSLETTER.txt to h5vers script

* Fix grammar in README.md release template (#4201)

* Add back snapshot names (#4198)

* Use tar.gz extension for ABI reports (#4205)

* Fix issue with Subfiling VFD and multiple opens of same file (#4194)

* Fix issue with Subfiling VFD and multiple opens of same file

* Update H5_subfile_fid_to_context to return error value instead of ID

* Add helper routine to initialize open file mapping

* Reverts AC_SYS_LARGEFILE change (#4213)

We previously replaced local macros with AC_SYS_LARGEFILE, which is
unfortunately buggy on some systems and does not correctly set the
necessary defines, despite successfully detecting them.

This restores the previous macro hacks to acsite.m4

* Propagate group creation properties to intermediate groups (#4139)

* Add clarification for current behavior of H5Get_objtype_by_idx() (#4120)

* Addressed Fortran  issues with promoted integers and reals via compilation flags (#4209)

* addressed issue wit promoted integers and reals

* added option to use mpi_f08

* Summarize the library version table (#4212)

Fixes GH-3773

* Fix URLs (#4210)

Also removed Copyright.html context because it's no longer valid.

* Fix 'make check-vfd' target for Autotools (#4211)

Changes Autotools testing to use HDF5_TEST_DRIVER environment
variable to avoid running tests that don't work well with several
VFDs

Restores old h5_get_vfd_fapl() testing function to setup a FAPL
with a particular VFD

Adds a macro for the default VFD name

* Revert "Addressed Fortran  issues with promoted integers and reals via compil…" (#4220)

This reverts commit 06c42ff.

* Backup and clear CMAKE_C_FLAGS before performing _Float16 configure checks (#4217)

* Fix broken links (#4224)

* Fix broken URLs in documentation (#4214)

Fixes GH-3881 partially.  There are pages that need to be recreated.

* Avoid file size checks in C++ testhdf5 for certain VFDs (#4226)

* Fix an issue with type size assumptions in h5dumpgentest (#4222)

* Fix issue with -Werror cleanup sed command in configure.ac (#4223)

* Fix Java JNI warnings (#4229)

* Rework snapshots/release workflows for consistent args (#4227)

* Fixed a cache assert with too-large metadata objects (#4231)

If the library tries to load a metadata object that is above the
library's hard-coded limits, the size will trip an assert in debug
builds. In HDF5 1.14.4, this can happen if you create a very large
number of links in an old-style group that uses local heaps.

The library will now emit a normal error when it tries to load a
metadata object that is too large.

Partially addresses GitHub #3762

* Set DXPL in API context for native VOL attribute I/O calls (#4228)

* Initialize a variable in C++ testhdf5's tattr.cpp (#4232)

* Addressed Fortran issues with promoted integers and reals via compilation flags, part 2 (#4221)

* addressed issue wit promoted integers and reals

* fixed h5fcreate_f

* added option to use mpi_f08

* change the kind of logical in the parallel tests

* addressed missing return value from callback

* Use cp -rp in test_plugin.sh (#4233)

When building with debug symbols on MacOS, the cp -p commands in
test_plugin.sh will attempt to copy the .dSYM directories with
debugging info, which will fail since -r is missing.

Using cp -rp is harmless and allows the test to run

Fixes HDFFV-10542

* Clean up types in h5test.c (#4235)

Reduces warnings on 32-bit and LLP64 systems

* Fix example links (#4237)

* Fix links md files (#4239)

* Add markdown link checker action (#4219)

* Match minimum CMake version to 3.18 (#4215)
@jhendersonHDF jhendersonHDF deleted the subfiling_multi_file_open_share branch March 27, 2024 03:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component - C Library Core C library issues (usually in the src directory) Component - Parallel Parallel HDF5 (NOT thread-safety) Component - Testing Code in test or testpar directories, GitHub workflows Priority - 1. High 🔼 These are important issues that should be resolved in the next release Type - Bug / Bugfix Please report security issues to [email protected] instead of creating an issue on GitHub
Projects
Status: Needs Merged
Development

Successfully merging this pull request may close these issues.

4 participants