-
-
Notifications
You must be signed in to change notification settings - Fork 269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue with Subfiling VFD and multiple opens of same file #4194
Fix issue with Subfiling VFD and multiple opens of same file #4194
Conversation
This also fixes a few VFD test failures with subfiling related to testing multiple file opens. |
* Since this cache currently just keeps all entries until | ||
* application exit, context entry indices should just be | ||
* consecutive | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After #4159, this cache invariant is no longer true in the case of multiple file opens and/or files closed out of order.
@@ -310,13 +313,21 @@ H5_free_subfiling_object(int64_t object_id) | |||
H5_SUBFILING_GOTO_ERROR(H5E_VFL, H5E_CANTGET, FAIL, | |||
"couldn't get subfiling context for subfiling object ID"); | |||
|
|||
if (H5_free_subfiling_object_int(sf_context) < 0) | |||
H5_SUBFILING_GOTO_ERROR(H5E_VFL, H5E_CANTFREE, FAIL, "couldn't free subfiling context object"); | |||
if (sf_context->file_ref == 0 || --sf_context->file_ref == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check for a ref. of 0 here since we can end up failing during context object creation before it's been associated with a file.
* Prior to calling the internal open function, we initialize | ||
* a new subfiling context that contains topology info and | ||
* new MPI communicators that facilitate messaging between | ||
* HDF5 clients and the IOCs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of this comment hasn't been relevant in a long time. Updated a bit to reflect the current behavior of H5_open_subfiles
* start the service threads on the identified | ||
* ranks as part of the subfile opening. | ||
*/ | ||
if (open_subfile_with_context(sf_context, file_acc_flags) < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function's no longer needed since this PR hoists the call to record_fid_to_subfile
up here so that we increment the file ref. count on the context object when it gets reused by an extra open of a file. The only other thing the old function did was call ioc_open_files
which was also hoisted up to here.
@@ -1799,6 +1839,8 @@ init_subfiling_context(subfiling_context_t *sf_context, const char *base_filenam | |||
assert(MPI_COMM_NULL != file_comm); | |||
|
|||
sf_context->h5_file_id = file_id; | |||
sf_context->threads_inited = false; | |||
sf_context->file_ref = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Start with a ref. of 0 since the context won't have an associated file until a bit later.
* | ||
*------------------------------------------------------------------------- | ||
*/ | ||
static void | ||
static herr_t | ||
clear_fid_map_entry(uint64_t file_id, int64_t sf_context_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrote this function a bit for clarity and added a check of the context's file reference count before clearing a mapping entry for a file to make sure we only do that when the last file hold a reference to the context is closed.
const char *failure_mssg = NULL; | ||
static bool pass = true; /* set to false on error */ | ||
static bool disp_failure_mssgs = true; /* global force display of failure messages */ | ||
static const char *failure_mssg = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added static here since AddressSanitizer complains about these being defined in test/cache_common.h
as well.
66b1a75
to
15e14be
Compare
/* Check if this file is already open */ | ||
H5E_BEGIN_TRY | ||
{ | ||
status = H5_subfile_fid_to_context(file_id, &context_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think that the H5E_BEGIN_TRY/H5E_END_TRY would no longer be necessary, now that only "real" errors are thrown from this routine. Can this be switched to just a normal:
if (func() < 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's the very first file open with the VFD, the opened file map won't be allocated yet and cause an error. Not a great design but the allocation of the map resides in the code that writes to it, which made decent sense at the time. Probably worth a slight refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's the very first file open with the VFD, the opened file map won't be allocated yet and cause an error. Not a great design but the allocation of the map resides in the code that writes to it, which made decent sense at the time. Probably worth a slight refactoring.
Yeah, I see that now. Could you fix it by extracting this block from record_fid_to_subfile() into a separate helper routine, called by that routine and H5_subfile_fid_to_context():
if (!sf_open_file_map) {
if (NULL == (sf_open_file_map = malloc((size_t)DEFAULT_FILE_MAP_ENTRIES *
H5_SUBFILING_GOTO_ERROR(H5E_RESOURCE, H5E_CANTALLOC, FAIL, "couldn't
sf_file_map_size = DEFAULT_FILE_MAP_ENTRIES;
for (int i = 0; i < sf_file_map_size; i++) {
sf_open_file_map[i].file_id = UINT64_MAX;
sf_open_file_map[i].sf_context_id = -1;
}
}
Because, if it's not an error, then H5_subfile_fid_to_context() should not return an error. :-)
26fa5b9
to
374b619
Compare
…p#4194) * Fix issue with Subfiling VFD and multiple opens of same file * Update H5_subfile_fid_to_context to return error value instead of ID * Add helper routine to initialize open file mapping
* Call memset before stat calls (#4202) The buffers passed to stat-like calls are only partially filled in by the call, leaving ununitialized memory areas when the stat buffers are created on the stack. This change memsets the buffers to 0 before the stat calls, quieting the -fsanitze=memory complaints. * Remove unused CMake configuration checks (#4199) * Update link to Chunking in HDF5 page (#4203) * Fix H5Pset_efile_prefix documentation error (#4206) Fixes GH issue #1759 * Suggested header footer for NEWSLETTER (#4204) * Suggested header footer for NEWSLETTER * Updates * Add NEWSLETTER.txt to h5vers script * Fix grammar in README.md release template (#4201) * Add back snapshot names (#4198) * Use tar.gz extension for ABI reports (#4205) * Fix issue with Subfiling VFD and multiple opens of same file (#4194) * Fix issue with Subfiling VFD and multiple opens of same file * Update H5_subfile_fid_to_context to return error value instead of ID * Add helper routine to initialize open file mapping * Reverts AC_SYS_LARGEFILE change (#4213) We previously replaced local macros with AC_SYS_LARGEFILE, which is unfortunately buggy on some systems and does not correctly set the necessary defines, despite successfully detecting them. This restores the previous macro hacks to acsite.m4 * Propagate group creation properties to intermediate groups (#4139) * Add clarification for current behavior of H5Get_objtype_by_idx() (#4120) * Addressed Fortran issues with promoted integers and reals via compilation flags (#4209) * addressed issue wit promoted integers and reals * added option to use mpi_f08 * Summarize the library version table (#4212) Fixes GH-3773 * Fix URLs (#4210) Also removed Copyright.html context because it's no longer valid. * Fix 'make check-vfd' target for Autotools (#4211) Changes Autotools testing to use HDF5_TEST_DRIVER environment variable to avoid running tests that don't work well with several VFDs Restores old h5_get_vfd_fapl() testing function to setup a FAPL with a particular VFD Adds a macro for the default VFD name * Revert "Addressed Fortran issues with promoted integers and reals via compil…" (#4220) This reverts commit 06c42ff. * Backup and clear CMAKE_C_FLAGS before performing _Float16 configure checks (#4217) * Fix broken links (#4224) * Fix broken URLs in documentation (#4214) Fixes GH-3881 partially. There are pages that need to be recreated. * Avoid file size checks in C++ testhdf5 for certain VFDs (#4226) * Fix an issue with type size assumptions in h5dumpgentest (#4222) * Fix issue with -Werror cleanup sed command in configure.ac (#4223) * Fix Java JNI warnings (#4229) * Rework snapshots/release workflows for consistent args (#4227) * Fixed a cache assert with too-large metadata objects (#4231) If the library tries to load a metadata object that is above the library's hard-coded limits, the size will trip an assert in debug builds. In HDF5 1.14.4, this can happen if you create a very large number of links in an old-style group that uses local heaps. The library will now emit a normal error when it tries to load a metadata object that is too large. Partially addresses GitHub #3762 * Set DXPL in API context for native VOL attribute I/O calls (#4228) * Initialize a variable in C++ testhdf5's tattr.cpp (#4232) * Addressed Fortran issues with promoted integers and reals via compilation flags, part 2 (#4221) * addressed issue wit promoted integers and reals * fixed h5fcreate_f * added option to use mpi_f08 * change the kind of logical in the parallel tests * addressed missing return value from callback * Use cp -rp in test_plugin.sh (#4233) When building with debug symbols on MacOS, the cp -p commands in test_plugin.sh will attempt to copy the .dSYM directories with debugging info, which will fail since -r is missing. Using cp -rp is harmless and allows the test to run Fixes HDFFV-10542 * Clean up types in h5test.c (#4235) Reduces warnings on 32-bit and LLP64 systems * Fix example links (#4237) * Fix links md files (#4239) * Add markdown link checker action (#4219) * Match minimum CMake version to 3.18 (#4215)
No description provided.