Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test.support.os_helper.rmdir(): PermissionError: [WinError 32] The process cannot access the file because it is being used by another process #98219

Closed
vstinner opened this issue Oct 12, 2022 · 4 comments
Labels
OS-windows tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@vstinner
Copy link
Member

vstinner commented Oct 12, 2022

On Windows, it's common that a test fails at exit when trying to remove a temporary directory created by a test. Usually, it's because the test spawns a child process in this directory and the test tries to remove the temporary directory before the child process completes.

For example, temp_dir() of test.support.os_helper creates a temporary directory and then deletes it with rmtree(path) of test.support.os_helper: this function is different than shutil.rmtree(), it tries again on error with a timeout of 1.0 second.

But sometimes, the function fails with the error "PermissionError: [WinError 32] The process cannot access the file because it is being used by another process" on the os.rmdir() step.

If os.rmdir() fails with PermissionError, it would be nice to try again later.

Notes:

  • I'm not sure why there is a fixed timeout of 1.0 second. I would prefer to use SHORT_TIMEOUT, or even LONG_TIMEOUT, of test.support.
  • _force_run() should use test.support.sleeping_retry()
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\support\os_helper.py", line 483, in temp_dir
    yield path
  File "D:\a\cpython\cpython\Lib\test\support\os_helper.py", line 536, in temp_cwd
    yield cwd_dir
  File "D:\a\cpython\cpython\Lib\test\libregrtest\main.py", line 701, in main
    self._main(tests, kwargs)
  File "D:\a\cpython\cpython\Lib\test\libregrtest\main.py", line 722, in _main
    run_tests_worker(self.ns, self.worker_test_name)
  File "D:\a\cpython\cpython\Lib\test\libregrtest\runtest_mp.py", line 101, in run_tests_worker
    sys.exit(0)
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\support\__init__.py", line 203, in _force_run
    return func(*args)
           ^^^^^^^^^^^
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\a\\cpython\\cpython\\build\\test_python_4372�\\test_python_worker_1096�'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\a\cpython\cpython\Lib\runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "D:\a\cpython\cpython\Lib\test\regrtest.py", line 47, in <module>
    _main()
  File "D:\a\cpython\cpython\Lib\test\regrtest.py", line 43, in _main
    main()
  File "D:\a\cpython\cpython\Lib\test\libregrtest\main.py", line 763, in main
    Regrtest().main(tests=tests, **kwargs)
  File "D:\a\cpython\cpython\Lib\test\libregrtest\main.py", line 695, in main
    with os_helper.temp_cwd(test_cwd, quiet=True):
  File "D:\a\cpython\cpython\Lib\contextlib.py", line 155, in __exit__
    self.gen.throw(value)
  File "D:\a\cpython\cpython\Lib\test\support\os_helper.py", line 534, in temp_cwd
    with temp_dir(path=name, quiet=quiet) as temp_path:
  File "D:\a\cpython\cpython\Lib\contextlib.py", line 155, in __exit__
    self.gen.throw(value)
  File "D:\a\cpython\cpython\Lib\test\support\os_helper.py", line 488, in temp_dir
    rmtree(path)
  File "D:\a\cpython\cpython\Lib\test\support\os_helper.py", line 445, in rmtree
    _rmtree(path)
  File "D:\a\cpython\cpython\Lib\test\support\os_helper.py", line 389, in _rmtree
    _waitfor(lambda p: _force_run(p, os.rmdir, p), path)
  File "D:\a\cpython\cpython\Lib\test\support\os_helper.py", line 333, in _waitfor
    func(pathname)
  File "D:\a\cpython\cpython\Lib\test\support\os_helper.py", line 389, in <lambda>
    _waitfor(lambda p: _force_run(p, os.rmdir, p), path)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\a\cpython\cpython\Lib\test\support\__init__.py", line 214, in _force_run
    return func(*args)
           ^^^^^^^^^^^
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\a\\cpython\\cpython\\build\\test_python_4372�

Linked PRs

@vstinner vstinner added the type-bug An unexpected behavior, bug, or error label Oct 12, 2022
@vstinner
Copy link
Member Author

See also issue #95027 and #97983. Maybe also issue #97641 which is not about test.support.

@eryksun eryksun added tests Tests in the Lib/test dir OS-windows labels Oct 12, 2022
@eryksun
Copy link
Contributor

eryksun commented Oct 12, 2022

The current wait works around an asynchronous unlink of one or more child files or directories. In this case, deleting a file or directory succeeds synchronously, but it might remain linked in the parent directory until all handles to it have been closed1.

A permission error would be because a file or directory is opened without sharing delete access, or because some file or directory in the tree is open. Ideally a worker process and its child processes should already be terminated by the time we try to delete the temporary directory. However, libregrtest.runtest_mp.TestWorkerProcess is pretty basic on Windows. It has no support for console process groups or kernel job objects. It only supports terminating the immediate worker process via TerminateProcess().

Ideally, each worker should be created in a new process group and added to a job object. Begin by calling CreateProcessW(executable, commandLine, ...) with the creation flags CREATE_NEW_PROCESS_GROUP and CREATE_SUSPENDED. Create a job via CreateJobObjectW(), and configure it to allow breakaway (but not silent breakaway) and to kill on close via QueryInformationJobObject(hJob, ...) and SetInformationJobObject(hJob, ...). Add the process to the job via AssignProcessToJobObject(hJob, hProcess). Finally, start the process via ResumeThread(hThread). To stop the worker, if the process ID is in our console session, as determined via GetConsoleProcessList(), send its process group a break event via GenerateConsoleCtrlEvent(CTRL_BREAK_EVENT, processId), and wait a little while to give the processes in the group a chance to exit on their own2. Then call TerminateJobObject(hJob, exitStatus), which forcefully terminates any remaining processes in the job.

This functionality could be implemented in libregrtest via ctypes or by wrapping the required API functions in the _winapi module. We'd maybe need one small change to the subprocess module, to make it retain the all-access thread handle that CreateProcessW() returns. This would be to support creating the process suspended, which avoids a potential race in which the child spawns another process before it gets assigned to the job3. That's not likely to matter in our case, so using CREATE_SUSPENDED and ResumeThread() could be omitted.

Footnotes

  1. On Windows 10+, the NTFS filesystem supports a POSIX-like delete that unlinks a deleted file as soon as the handle that was used to delete the file is closed. Only NTFS implements this new capability. In the API, DeleteFileW() was updated to try a POSIX delete, but RemoveDirectoryW() still only tries a classic delete.

  2. The default handler for the break event calls ExitProcess(), but a Python process can set a handler for the C SIGBREAK signal, or set a lower-level console control handler via WinAPI SetConsoleCtrlHandler().

  3. On Windows 10+, there's also a PROC_THREAD_ATTRIBUTE_JOB_LIST creation attribute that can assign a process to one or more jobs in race-free way without having to create the process suspended.

@arhadthedev
Copy link
Member

@eryksun os.kill(pid, _signal.CTRL_BREAK_EVENT) would call GenerateConsoleCtrlEvent() for us:

cpython/Modules/posixmodule.c

Lines 7985 to 7992 in 05e4886

if (sig == CTRL_C_EVENT || sig == CTRL_BREAK_EVENT) {
if (GenerateConsoleCtrlEvent(sig, (DWORD)pid) == 0) {
err = GetLastError();
PyErr_SetFromWindowsErr(err);
}
else
Py_RETURN_NONE;
}

pid, tid, _, _ = _winapi.CreateProcess(creation_flags=_winapi.CREATE_NEW_PROCESS_GROUP); _winapi.CloseHandle(tid) should do its work too.

However, I don't know job objects. Why are they necessary if we already can terminate the whole process group subtree?

@eryksun
Copy link
Contributor

eryksun commented Oct 30, 2022

os.kill(pid, _signal.CTRL_BREAK_EVENT) would call GenerateConsoleCtrlEvent()

I don't want to touch that mess, or perpetuate any further dependence on it. On Windows, os.kill() misuses GenerateConsoleCtrlEvent(), which is only for a process group ID (i.e. POSIX killpg(), or kill() with a negative pid value), and it also has bugs with exception handling that can raise SystemError. The conceptual mistakes there have in turn led to subprocess.Popen mistakenly using os.kill() on Windows to implement send_signal(), which is not actually supposed to target a process group.

What makes matters worse is that WinAPI GenerateConsoleCtrlEvent() has a serious bug if it's not used in the documented and supported way.

However, I don't know job objects. Why are they necessary if we already can terminate the whole process group subtree?

Any process in the group can choose to handle CTRL_BREAK_EVENT by ignoring it, e.g. with a SIGBREAK handler in Python that does nothing. Also, some processes in the tree might be spawned in another process group (i.e. CREATE_NEW_PROCESS_GROUP), another console session (i.e. CREATE_NEW_CONSOLE or CREATE_NO_WINDOW), or without a console session (i.e. DETACHED_PROCESS or a GUI app). In these cases we can't send the process a break event.

Sending the break event is an attempt to be polite for processes that support it. But after the timeout, we really need to ensure that all remaining processes in the job itself are forcefully terminated via TerminateJobObject(). This is closest that we can reasonably get to the behavior of POSIX killpg(pgrp, SIGTERM), followed by killpg(pgrp, SIGKILL) after a timeout.

The only processes that escape this net will be those spawned with the flags CREATE_NEW_PROCESS_GROUP (or a new console session or no console session) and CREATE_BREAKAWAY_FROM_JOB.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 15, 2022
miss-islington added a commit that referenced this issue Nov 15, 2022
(cherry picked from commit 619cadc)

Co-authored-by: Kumar Aditya <[email protected]>
CuriousLearner added a commit to CuriousLearner/cpython that referenced this issue Nov 16, 2022
* main: (8272 commits)
  Update Windows readme.txt to clarify Visual Studio required versions (pythonGH-99522)
  pythongh-99460 Emscripten trampolines on optimized METH_O and METH_NOARGS code paths (python#99461)
  pythongh-92647: [Enum] use final status to determine lookup or create (pythonGH-99500)
  pythongh-81057: Move Globals in Core Code to _PyRuntimeState (pythongh-99496)
  Post 3.12.0a2
  pythongh-99300: Use Py_NewRef() in Python/Python-ast.c (python#99499)
  pythongh-93649: Split pytime and datetime tests from _testcapimodule.c (python#99494)
  pythongh-99370: fix test_zippath_from_non_installed_posix (pythonGH-99483)
  pythonGH-99205: remove `_static` field from `PyThreadState` and `PyInterpreterState` (pythonGH-99385)
  pythongh-81057: Move the Remaining Import State Globals to _PyRuntimeState (pythongh-99488)
  pythongh-87604: Avoid publishing list of active per-interpreter audit hooks via the gc module (pythonGH-99373)
  pythongh-93649: Split getargs tests from _testcapimodule.c (python#99346)
  pythongh-81057: Move Global Variables Holding Objects to _PyRuntimeState. (pythongh-99487)
  pythonGH-98219: reduce sleep time in `asyncio` subprocess test (python#99464)
  pythonGH-99388: add `loop_factory` parameter to `asyncio.run` (python#99462)
  pythongh-99300: Use Py_NewRef() in PC/ directory (python#99479)
  pythongh-99300: Use Py_NewRef() in Doc/ directory  (python#99480)
  pythongh-99300: Use Py_NewRef() in Modules/ directory (python#99473)
  pythongh-99300: Use Py_NewRef() in Modules/ directory (python#99469)
  pythongh-99370: Calculate zip path from prefix when in a venv (pythonGH-99371)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OS-windows tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants