Redesign the cache API #240
I'm going to make big changes to the interface between CacheControl and a cache implementation. This makes all existing cache implementations suspect. I'll have to reimplement them later.
A cache has to be able to distinguish between two different scenarios:

1. CacheControl has successfully written all the data it intended to write and wants to save it.
2. The download got canceled partway through (e.g. due to SIGINT) and CacheControl wants to release all resources associated with it.

In the second scenario the cache must not return incomplete data in a subsequent open_read.
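One way to picture the two scenarios is the minimal sketch below. It is not the API proposed in this PR: only `open_read` is named above, while `SketchFileCache`, `open_write`, `commit` and `close` are hypothetical names chosen for illustration, and keys are assumed to be safe to use as file names. Data is written to a temporary file inside the cache directory and only renamed into place on commit, so an aborted download never becomes visible to readers.

```python
import os
import tempfile


class SketchFileCache:
    """Hypothetical file cache that never exposes partially written entries."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Assumption: keys are already safe to use as file names.
        return os.path.join(self.directory, key)

    def open_write(self, key):
        return _Writer(self, key)

    def open_read(self, key):
        # Only committed entries exist under their final name.
        try:
            return open(self._path(key), "rb")
        except FileNotFoundError:
            return None


class _Writer:
    def __init__(self, cache, key):
        self._final = cache._path(key)
        # The temporary file lives in the cache directory, so the final
        # rename stays on one filesystem.
        fd, self._tmp = tempfile.mkstemp(dir=cache.directory, suffix=".partial")
        self._file = os.fdopen(fd, "wb")
        self._done = False

    def write(self, data):
        self._file.write(data)

    def commit(self):
        # Scenario 1: all data was written, publish the entry.
        self._file.close()
        os.replace(self._tmp, self._final)
        self._done = True

    def close(self):
        # Scenario 2: closed without commit, discard the partial data.
        if not self._done:
            self._file.close()
            os.unlink(self._tmp)
            self._done = True
```

With this shape, CacheControl would call `commit()` and then `close()` after a complete download, and only `close()` after a canceled one; in the latter case `open_read` keeps returning `None` for that key.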
(force-pushed from 4e55904 to 3d30237)
Hey! Thanks for looking at this. I know this limitation has been considered before, but the discrepancies between different operating systems and potential use cases made it challenging to find a valid fix. As a result, we did the simplest thing!

One idea I had when looking through your commentary was that you might be able to extend the heuristic functionality to allow someone to implement a large response handler. For example, if the

I haven't looked at the code for a while, but I think heuristics are only for adjusting the response object before it is sent to the actual caching logic. You'd likely have to pass some extra info to trigger the buffering or potentially swap the response's body with your custom buffer. There are probably other colors to paint the bikeshed too!

I mention it because I suspect changing the core API might be challenging at this point, whereas extending the heuristics might provide a more reasonable upgrade path over time.
What kind of discrepancies?
What kind of use cases?
Exactly how would this work? Where would the body be stored while it's being downloaded?
(force-pushed from 664bead to e8c2311)
This should make it easier to change the cache API; see psf#240.
The current cache API is:
This makes fixing #145, #180 and #238 difficult because you have to store the entire response body somewhere before you can save it with `Cache.set()`.

Currently CacheControl buffers the response body in memory, which is just wrong: thou shalt not assume that every file fits in RAM. This is the cause of #145 and #180. Additionally, this makes CacheControl completely unusable with response bodies larger than `(2^32)-1 bytes`, no matter how much RAM you have, due to a design limitation of the msgpack format (#238).

I have considered a less invasive approach of buffering the response body in a temporary file, but there are at least two problems with this:
- `/tmp` is often a `tmpfs`, a filesystem backed by virtual memory. It can swap out, but `swap` is not always enabled. `tmpfs` has a configurable maximum size limit. `swap` also often has a size limit. Basically, this would defeat the point: we would not be able to reliably cache a 100GB response even if there is enough storage for the cache.