Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP client: total/read timeout may not be respected #9547

Open
1 task done
coells opened this issue Oct 27, 2024 · 6 comments
Open
1 task done

HTTP client: total/read timeout may not be respected #9547

coells opened this issue Oct 27, 2024 · 6 comments
Labels

Comments

@coells
Copy link

coells commented Oct 27, 2024

Describe the bug

I've been using aiohttp with FastAPI server without any major problems, however, several times a week it happens that the client fails to respect request timeout.

This snippet is 100% accurate to what I have in the production code.

from time import perf_counter
from aiohttp import ClientSession, ClientTimeout


async def requets():
    timer = perf_counter()
    try:
        session = ClientSession(timeout=ClientTimeout(total=60, connect=10, sock_connect=10, sock_read=30))
        response = await session.request('POST', url='/', json={})
        response.raise_for_status()
        return await response.json()
    finally:
        print(f'took {perf_counter() - timer:.3f}s')

Most of the time, the communication works perfectly. Having client and server machines in the same rack, req/response takes milliseconds. However, there seems to be a (nightly) period of (possible?) maintenance. In that case I can see in logs that response took well over 2 hours.

took 0.853s
took 419.666s
took 5052.691s
took 7462.243s

There are dozens of requests from a single process directed to the server at the same time. The requests do succeed, however, third of them finishes under second, while the rest takes minutes or hours to complete. From the server logs and actions I know the request arrived to FastAPI server immediately, while the response takes the whole time.

The question is, why the client doesn't respect 60s total timeout that I set?

I've tested timeouts on a mock server and client and it indeed did work. Is it possible that under some circumstances the client receives data slowly and fails to respect the timeout?

Under which case is the timeout not respected? I.e. is there a way to set total timeout that hits no matter what? (I mean except for anyio cancelling groups).

To Reproduce

I wasn't able to reproduce the problem in controlled environment, however, this is a repeating problem in a period of many weeks and like caused some maintenance on the machine.

Expected behavior

To be able to set timeout that always hits no matter what.

Logs/tracebacks

please, check the explanation

Python Version

$ python --version

Python 3.12.7

aiohttp Version

$ python -m pip show aiohttp

Name: aiohttp
Version: 3.10.10
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: 
Author-email: 
License: Apache 2
Location: /apps/qminers/conda/envs/shared/lib/python3.12/site-packages
Requires: aiohappyeyeballs, aiosignal, attrs, frozenlist, multidict, yarl
Required-by:

multidict Version

$ python -m pip show multidict

Name: multidict
Version: 6.1.0
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /apps/qminers/conda/envs/shared/lib/python3.12/site-packages
Requires: 
Required-by: aiohttp, yarl

propcache Version

$ python -m pip show propcache

Name: propcache
Version: 0.2.0
Summary: Accelerated property cache
Home-page: https://github.com/aio-libs/propcache
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache-2.0
Location: /apps/qminers/conda/envs/shared/lib/python3.12/site-packages
Requires: 
Required-by: yarl

yarl Version

$ python -m pip show yarl

Name: yarl
Version: 1.14.0
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache-2.0
Location: /apps/qminers/conda/envs/shared/lib/python3.12/site-packages
Requires: idna, multidict, propcache
Required-by: aiohttp

OS

Linux, RHEL 7, x86_64

Related component

Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
@coells coells added the bug label Oct 27, 2024
@Dreamsorcerer
Copy link
Member

I'm not sure if it's a deliberate decision, but I don't think the timeout applies to .read(). So, the server may be responding to the request quickly and then hanging during the rest of data transfer.

@Dreamsorcerer
Copy link
Member

Documentation does say: The maximal number of seconds for the whole operation including connection establishment, request sending and response reading.

Which would suggest that the timeout should be carried into the .read() method. I think at present it only applies to reading the start (maybe 2KB) of response which is completed in the .request() call.

@coells
Copy link
Author

coells commented Oct 28, 2024

Hey Sam!

Not only documentation, the source code of request method claims the same.

https://github.com/aio-libs/aiohttp/blob/master/aiohttp/client.py#L514

        # timeout is cumulative for all request operations
        # (request, redirects, responses, data consuming)
        tm = TimeoutHandle(
            self._loop, real_timeout.total, ceil_threshold=real_timeout.ceil_threshold
        )

And I think the timer works well ... usually.

No matter what, I'm not able to reproduce the problem in mock client/server. When I stream data from server to client, the client raises TimeoutError after total time. Yet, in production, I can clearly see this is not the case.

Would it be possible under some circumstances that the client blocks asyncio event loop?

@Dreamsorcerer
Copy link
Member

I may have missed something, but it seems like the timer is created within .request() and not passed anywhere relevant. I didn't see any timer in the .read() method, or the underlying StreamReader. Should be possible to create a test simulating this, if you want to give it a go, you could probably take a test from (e.g. from test_client_functional.py or test_web_functional.py) and create a handler which yields data with an asyncio.sleep() call in between to force a delay. Assert that the client code completes the .request() call and then assert a timeout is raised on the .read() call.

Would it be possible under some circumstances that the client blocks asyncio event loop?

Probably more likely something else in your application could block the event loop. But, seems highly unlikely you'd have something blocking for 2 hours and not realise... The json.loads() call is obviously blocking, but to take 2 hours to decode would surely require a response that would likely be bigger than the machine's RAM...

@coells
Copy link
Author

coells commented Oct 29, 2024

Thanks, Sam!

Streams have timer enabled, too. https://github.com/aio-libs/aiohttp/blob/master/aiohttp/streams.py#L345

I've spent many hours analyzing aiohttp code and couldn't find any issue. Hence I have implemented full TraceConfig a will wait for another incident.

@github-staff github-staff deleted a comment Oct 29, 2024
@Dreamsorcerer
Copy link
Member

Streams have timer enabled, too. https://github.com/aio-libs/aiohttp/blob/master/aiohttp/streams.py#L345

Right, I missed that. I think that is working correctly then. I can only imagine something is blocking your application then...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants