Redis backend #7

Open
sidharthv96 opened this issue Jun 20, 2021 · 11 comments

@sidharthv96

sidharthv96 commented Jun 20, 2021

We've been using aioredis in our project for async support.

I could try to come up with a PR once we get a basic design going 😄

@johtso
Owner

johtso commented Jun 21, 2021

Sounds great! Would be happy for the repo to see a bit more love.

I'm not really up to speed on what the async redis client landscape looks like, but aioredis seems like a reasonable choice.

I think at the time I was hoping there might be a client out there that gives both an async and a sync interface, so we can easily support both with the magic unasync build step. Not sure if that exists though?

Will try and take a look at the last few v0.18 tweaks soon.

@johtso
Owner

johtso commented Jun 23, 2021

Might also be worth considering using https://github.com/aio-libs/aiocache to get multiple backends out of the box. Sync actions could then just use asyncio.run? Not sure how good an idea that is performance-wise.

Another thing is that currently we're always caching full responses, even if only the cached headers need to be updated. For large payloads this isn't ideal.

@JWCook

JWCook commented Sep 9, 2021

Hi @johtso, I'm the current maintainer of requests-cache and aiohttp-client-cache, so I'm also interested in this topic. I came across this and your comments in psf/cachecontrol#248 because I was wondering if a similar library existed yet for httpx, so I'm glad to see someone is already working on this.

So far I've continued in requests-cache's direction of making cache features specifically tailored to requests/synchronous usage and aiohttp/async usage, respectively. But I'm interested in the possibility of an http client-agnostic, async-agnostic caching library, or at least some means of making it easier to port these features for different http clients.

aiocache would be worth checking out. I haven't used it myself yet, but I would potentially be interested in refactoring some of the backends from aiohttp-client-cache that aiocache doesn't have (async SQLite, MongoDB, and DynamoDB) to extend aiocache. That might make it easier to reuse those in other libraries like this one. I don't have any plans for that yet, but just throwing that out there as a possibility.

@JWCook

JWCook commented Sep 9, 2021

For Redis specifically:

I ran some quick tests with wrapping individual aioredis operations in asyncio.run() for synchronous usage, and unfortunately it looks to be about 4.5x slower than plain async usage of aioredis or synchronous usage of redis-py.
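A rough way to reproduce this kind of comparison without a Redis server is sketched below. `fake_get` is a made-up stand-in for an async client call (it is not aioredis), so the numbers it prints only illustrate the per-call event loop overhead and will not match the figure above.

```python
import asyncio
import time


async def fake_get(store, key):
    # Hypothetical stand-in for an async client call such as a Redis GET.
    await asyncio.sleep(0)
    return store.get(key)


def sync_get(store, key):
    # Every call creates, runs, and tears down a fresh event loop.
    return asyncio.run(fake_get(store, key))


async def async_get_many(store, keys):
    # All calls share one event loop.
    return [await fake_get(store, k) for k in keys]


def time_it(fn):
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


store = {f"k{i}": i for i in range(200)}
keys = list(store)

wrapped, t_wrapped = time_it(lambda: [sync_get(store, k) for k in keys])
native, t_native = time_it(lambda: asyncio.run(async_get_many(store, keys)))
print(f"asyncio.run per call: {t_wrapped:.4f}s, single loop: {t_native:.4f}s")
```

Both paths return the same values; only the cost of setting up a loop per operation differs.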

There could very well be a better way to "unasync" those calls, but if you end up going with separate backend classes for a sync and async version, the good news is that aioredis makes it easy, since it's intentionally consistent with the redis-py API.

@johtso
Owner

johtso commented Mar 18, 2022

@JWCook hey, sorry for the mega slow response.. haven't had my notifications set up very well.

Super sweet libraries!

I guess those two projects of yours are a great illustration of why a single well tested sans-io implementation of a caching policy would be so awesome! All this fragmentation that could be avoided!

I'm quite pleased with the general thrust of what I came up with. The caching policy code looks like declarative logic.. but it's an IO-free generator that yields "actions" to the http-client specific implementation, waits to be fed the resulting Response, then possibly yields some more actions and returns a Response at the end. There's a couple of things about the way I wrote it that I'd probably change.. like the async_callback_generator thing seemed a good idea at the time, to wrap the whole policy in a simpler interface.. but looking at it now it seems over complicated and cryptic.
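For anyone reading along, here is a minimal sketch of that general pattern. It is not the httpx-caching code: the action classes (`CacheGet`, `CacheSet`, `MakeRequest`), `caching_policy`, and `run_sync` are all hypothetical names, and a plain string stands in for a Response.

```python
from dataclasses import dataclass
from typing import Generator, Optional, Union


@dataclass
class CacheGet:
    key: str


@dataclass
class CacheSet:
    key: str
    value: str


@dataclass
class MakeRequest:
    url: str


Action = Union[CacheGet, CacheSet, MakeRequest]


def caching_policy(url: str) -> Generator[Action, Optional[str], str]:
    # IO-free: the policy only describes what it wants done.
    cached = yield CacheGet(url)
    if cached is not None:
        return cached
    response = yield MakeRequest(url)
    yield CacheSet(url, response)
    return response


def run_sync(policy, cache, fetch):
    # Client-specific driver: performs each action and feeds the result back.
    result = None
    try:
        while True:
            action = policy.send(result)
            if isinstance(action, CacheGet):
                result = cache.get(action.key)
            elif isinstance(action, CacheSet):
                cache[action.key] = action.value
                result = None
            elif isinstance(action, MakeRequest):
                result = fetch(action.url)
    except StopIteration as stop:
        return stop.value


cache = {}
calls = []


def fetch(url):
    calls.append(url)
    return f"body of {url}"


first = run_sync(caching_policy("https://example.org"), cache, fetch)
second = run_sync(caching_policy("https://example.org"), cache, fetch)
```

The second call is served from the dict without calling `fetch`. An async driver would be the same loop with `await` on the cache and fetch calls, while the policy generator stays unchanged.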

The approach I took when I put this together was to try and take the whole test suite and caching logic wholesale from Cache-Control to avoid having to reinvent the wheel (I also don't really have any strong opinions on how that kind of logic should work, my needs were quite basic). I did strip out some functionality in the name of simplifying things.

The vast majority of the tests should be rewritten as clean data driven tests of the caching policy.. no IO or mocking or anything required. And then a few integrationy tests.

I'm guessing you started from a clean slate when writing your libraries?

requests-cache looks awesome.. it would be so great to have something that polished that could tick the boxes for the various clients and async/sync..

@JWCook

JWCook commented Mar 28, 2022

> I'm guessing you started from a clean slate when writing your libraries?

Kind of. I started with requests-cache 0.5, which a different author had started and maintained up to that point. It was originally intended just for aggressive caching without cache header support, for things like web scrapers. I picked up development on that library after it had been dormant for a while, forked it to start aiohttp-client-cache, and ended up rewriting most of the code for both libraries.

For cache header behavior, I did pretty much start with a clean slate and incrementally added features as they were needed. By this point, requests-cache has more complete header support than cache-control/httplib2 (see issues), and more thorough test coverage (about 1900 tests total).

> The caching policy code looks like declarative logic.. but it's an IO-free generator that yields "actions" to the http-client specific implementation

Yeah, I think you went with the right approach there. That looks really clean! I went in a somewhat similar direction, but the separation between policy and HTTP client is still a work in progress.

@JWCook

JWCook commented Mar 28, 2022

> it would be so great to have something that polished that could tick the boxes for the various clients and async/sync..

I agree! Having cache backends that work with multiple HTTP clients would be feasible. But something that works well for both sync and async is less likely. After working for a while on tools for both requests and aiohttp, I've been much happier with having a separate async implementation rather than trying to make something work for both.

The options are basically:

  • Start with a synchronous implementation and throw it in a separate thread for non-blocking async usage
  • Start with an async implementation and throw it in an asyncio.run() for synchronous usage

Both of which have some serious drawbacks. The libraries required for each backend are another part of it. redis-py and aioredis, for example, just do a much better job for their respective use cases than trying to use one of them for both sync and async usage.
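The two options above look roughly like this in code. This is only a sketch: `blocking_get` and `native_async_get` are hypothetical stand-ins for a sync and an async backend call, and `asyncio.to_thread` assumes Python 3.9 or later.

```python
import asyncio
import time


def blocking_get(store, key):
    # Hypothetical synchronous backend call (think redis-py).
    time.sleep(0.01)
    return store.get(key)


async def get_via_thread(store, key):
    # Option 1: run the sync implementation in a worker thread
    # so the event loop is not blocked.
    return await asyncio.to_thread(blocking_get, store, key)


async def native_async_get(store, key):
    # Hypothetical asynchronous backend call (think aioredis).
    await asyncio.sleep(0.01)
    return store.get(key)


def get_via_run(store, key):
    # Option 2: wrap the async implementation in asyncio.run().
    # This starts a new event loop per call and raises RuntimeError
    # if called from code that is already running inside a loop.
    return asyncio.run(native_async_get(store, key))


store = {"a": 1}
from_thread = asyncio.run(get_via_thread(store, "a"))
from_run = get_via_run(store, "a")
```

Option 1 pays for thread hand-off on every call, and option 2 pays for loop setup and cannot be nested inside async code, which is where the drawbacks come from.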

That doesn't mean it's not worth researching, though. There could be a better way to go about it that I just haven't thought of yet.

@JWCook

JWCook commented Mar 28, 2022

In the short term, I'm working toward a 1.0 release for requests-cache and catching up on features in aiohttp-client-cache. I'm guessing that will keep me fairly busy at least through this summer. Longer term, I'm definitely interested in collaborating on some sans-io libraries to share across different HTTP clients.

...Also I realize this isn't the original topic of this issue! Want to continue this in a different issue?

@EchoShoot

Urgently need redis-cache! Distributed crawlers need an object that can be accessed in common.

@johtso
Owner

johtso commented May 24, 2022

> Urgently need redis-cache! Distributed crawlers need an object that can be accessed in common.

You should be able to write a redis cache pretty easily, you can see the expected interface here: https://github.com/johtso/httpx-caching/blob/master/httpx_caching/_async/_cache.py

And then pass your custom cache to the transport using the cache argument.
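As a starting point, a Redis-backed cache could look something like the sketch below. The method names (`get`, `set`, `delete`) and the JSON serialization are assumptions for illustration only; the interface the transport actually expects is the one in the linked `_cache.py`, so check it before adapting this. `FakeAsyncRedis` is a made-up in-memory stand-in so the example runs without a server; a real async Redis client exposing awaitable `get`/`set`/`delete` would take its place.

```python
import asyncio
import json
from typing import Any, Optional


class RedisCache:
    """Hypothetical adapter; method names are assumptions, not the real interface."""

    def __init__(self, client, prefix: str = "httpx-cache:"):
        self.client = client
        self.prefix = prefix

    async def get(self, key: str) -> Optional[Any]:
        raw = await self.client.get(self.prefix + key)
        return None if raw is None else json.loads(raw)

    async def set(self, key: str, value: Any) -> None:
        # Real responses would need a proper serializer; JSON keeps the sketch simple.
        await self.client.set(self.prefix + key, json.dumps(value).encode())

    async def delete(self, key: str) -> None:
        await self.client.delete(self.prefix + key)


class FakeAsyncRedis:
    """In-memory stand-in for an async Redis client (illustration only)."""

    def __init__(self):
        self.data = {}

    async def get(self, key):
        return self.data.get(key)

    async def set(self, key, value):
        self.data[key] = value

    async def delete(self, key):
        self.data.pop(key, None)


async def demo():
    cache = RedisCache(FakeAsyncRedis())
    key = "GET:https://example.org"
    await cache.set(key, {"status": 200, "body": "hi"})
    hit = await cache.get(key)
    await cache.delete(key)
    miss = await cache.get(key)
    return hit, miss


hit, miss = asyncio.run(demo())
```

Because every worker talks to the same Redis instance, entries written by one crawler process are visible to the others, which is the shared access asked about above.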

@kovan

kovan commented Oct 8, 2022

I have written an adapter for using Redis as a backend. It seems to work well; it is in PR #17.
