-
Notifications
You must be signed in to change notification settings - Fork 458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Platform Latest Actions -- Python tests seg-fault on Windows #2040
Comments
I believe this is the same issue that was affecting the Wheel workflow, the issue disappear when one set Through some iterative tests, I have determined that one need to explicitly exclude these two paths from being added to the DLL directories for the workflow succeed:
Dumpbin reports the following dependencies on PyOpenColorIO:
When PyOpenColorIO is loaded, the following DLL are loaded (as seen from dlltracer), obviously this is on a build that's working, otherwise the segfault prevent from having any log from dlltracer (even with python -u for un-buffered outputs, which seem to indicate the crash is quite early in the import process):
Looking at those two folders mentioned above, I see two interesting things: I wonder if there is some kind of incompatibility going on here. I have to admit, I find the |
Regarding OCIO_PYTHON_LOAD_DLLS_FROM_PATH, that was added in this PR. Windows users of the bindings were having trouble and so Anders Langlands had proposed that we follow what OpenImageIO had done to work around the issue. So there is at least one other ASWF project doing something similar. But I agree with you Remi that it is fragile and, if it is causing other problems, I would not object to removing it and requiring Windows users to manage their path on their own. |
I would probably recommend people doing so but maybe document that it needs to be done somewhere. Thanks for digging this @remia, I was almost there on my end and those 2 paths are in the short list I had bisected. |
Maybe we could also change the default behaviour to have OCIO_PYTHON_LOAD_DLLS_FROM_PATH disabled, then, if no feedback, remove it in a future version. I also wonder if there is a clever solution where PyOpenColorIO could know some of the DLL paths needed from the compilation (in case someone doesn't have all the dependencies statically linked into OpenColorIO mostly), but this would still be fragile. The exact cause of the crash is still unclear here, maybe something like https://github.com/mxschmitt/action-tmate could help to dig further. |
Those dynamic DLLs might themselves have dependencies that also need to be added. As we build everything from scratch we can track and bring all the DLLs together where required and I would recommend that people do that or statically compile although I always had problems on Windows compiling the VFX Reference Platform statically. |
Yeah agree it can get rather complex. The |
Are you guys familiar with delvewheel? |
I think that the issue here might be that some incompatible DLLs having the same name than some OCIO depends/it dependencies depend on are added to PATH and cause it/them to break rather than missing them. I think that the Platform build here works if there is only the OCIO path for the built lib added. |
E.g., build with only the OCIO built lib added: https://github.com/colour-science/OpenColorIO/actions/runs/10977085917, this should left us wondering what is being tried to be loaded when the two offending paths are added? Maybe some incompatible Windows API SDK libs? |
Is there benefits using explicit repairwheel step over the default auditwheel / delocate_wheel steps done by cibuildwheel? As for the Windows delvewheel, it's a good suggestion, but not sure I understand correctly, is it going to pull every dependents DLL into the wheel? What about the size increase? Edit: Ah no, I think I understand, it only pulls non-system libraries. This would not fix the issue here as some systems library are added to the DLL directories and take precedence over the one present in standard locations. We wouldn't need it for our own wheel either I think as we build everything statically (except for the OpenColorIO.dll which gets added to DLL search path manually in our init.py). But could be useful for people building their own PyOpenColorIO bindings under Windows. |
I do know that on MacOS and Linux, auditwheel and delocate also patch the module rpath with the relative paths to the dependencies. Obviously, that doesn't happen with Windows, but... I'm not sure what does happen with Windows, to be honest.
In OIIO's case, somehow, either auditwheel or delocate_wheel actually managed to mangle otherwise perfectly working wheels. At a point, I'd actually provided an empty string for the custom repair-wheel command, just to prevent cibuildwheel's default repair-wheels command from breaking the wheels! In the end, what I'm really doing is using the repair-wheel-command as an opportunity to slim down + modify the wheels. It's a bit heavy-handed, but the path of least resistance right now to getting rid of everything that can be gotten rid of (like duplicate "versioned" dynamic libraries that are normally symlinked to each other). I'm blasting everything except the module and the CLI binary executables; and then I'm pointing repairwheel back to the build directory to scoop up only and exactly what the cpython modules and binaries need. |
Yeah I remember I had some issues as well with the various repair steps, can't remember what was the issue now (though I think building the OCIO library itself as shared instead of static was somehow helping)! As for the size, what I remember doing in OCIO is to strip the binaries / libraries produced and disable SO versioning in CMake to get rid of the symlinks. To come back to the main discussion, I guess there is no strong objection to disabling / removing OCIO_PYTHON_LOAD_DLLS_FROM_PATH then. |
This issue is to solve the issue that is causing the GitHub Actions Platform Latest CI runs to fail due to a seg-fault in the Python 3.11 unit tests on Windows.
The text was updated successfully, but these errors were encountered: