The all_pks method in the redis-om library has no option to set the count parameter of the underlying SCAN command. It therefore always uses the server default (COUNT 10), which causes performance problems when Redis holds a large number of keys.
#! Current Behavior
@classmethod
async def all_pks(cls):  # type: ignore
    key_prefix = cls.make_key(cls._meta.primary_key_pattern.format(pk=""))
    # TODO: We need to decide how we want to handle the lack of
    #  decode_responses=True...
    return (
        remove_prefix(key, key_prefix)
        if isinstance(key, str)
        else remove_prefix(key.decode(cls.Meta.encoding), key_prefix)
        async for key in cls.db().scan_iter(f"{key_prefix}*", _type="HASH")
    )
Desired Behavior
I would like to propose adding an optional count parameter to the all_pks method and forwarding it to SCAN. This would let users tune how many keys Redis examines per SCAN call (count is a hint, not an exact batch size), giving better control over performance when working with Redis.
#! Expected Behavior
@classmethod
# we can make count an optional parameter
# to make the change non-breaking
async def all_pks(cls, count: Optional[int] = None):  # type: ignore
    key_prefix = cls.make_key(cls._meta.primary_key_pattern.format(pk=""))
    # TODO: We need to decide how we want to handle the lack of
    #  decode_responses=True...
    return (
        remove_prefix(key, key_prefix)
        if isinstance(key, str)
        else remove_prefix(key.decode(cls.Meta.encoding), key_prefix)
        async for key in cls.db().scan_iter(f"{key_prefix}*", _type="HASH", count=count)
    )
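To illustrate why forwarding count matters, here is a minimal sketch that runs without a Redis server. FakeRedis, its round_trips counter, the free-standing all_pks function, and round_trips_for are all hypothetical stand-ins I made up for illustration; they are not part of redis-om or redis-py. The fake scan_iter treats count as an exact batch size, whereas real Redis only treats it as a hint, so the numbers are indicative only.

```python
import asyncio
from typing import AsyncIterator, List, Optional, Tuple


class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client (illustration only)."""

    def __init__(self, keys: List[str]) -> None:
        self.keys = keys
        self.round_trips = 0  # number of simulated SCAN calls

    async def scan_iter(
        self, match: str, count: Optional[int] = None
    ) -> AsyncIterator[str]:
        prefix = match.rstrip("*")
        # Assumption: mimic the server default of COUNT 10 when count is None.
        batch = count or 10
        for start in range(0, len(self.keys), batch):
            self.round_trips += 1  # one simulated SCAN call per batch
            for key in self.keys[start:start + batch]:
                if key.startswith(prefix):
                    yield key


async def all_pks(
    db: FakeRedis, key_prefix: str, count: Optional[int] = None
) -> List[str]:
    """Simplified analogue of the proposed method: forwards count to scan_iter."""
    return [
        key[len(key_prefix):]
        async for key in db.scan_iter(f"{key_prefix}*", count=count)
    ]


async def round_trips_for(count: Optional[int]) -> Tuple[int, int]:
    """Return (number of primary keys found, simulated SCAN calls used)."""
    db = FakeRedis([f"model:{i}" for i in range(5000)])
    pks = await all_pks(db, "model:", count=count)
    return len(pks), db.round_trips


if __name__ == "__main__":
    for count in (None, 1000):
        found, trips = asyncio.run(round_trips_for(count))
        print(f"count={count}: {found} keys in {trips} simulated SCAN calls")
```

Because count defaults to None, existing callers would behave exactly as before, which is what keeps the proposed change non-breaking.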
Steps to Reproduce (if applicable)
N/A
Impact
Adding this feature would improve the performance and scalability of the library when working with Redis instances containing a large number of keys. It would provide users with better control over the data retrieval process and enable them to fine-tune the performance according to their needs.
Additional context:
I have gathered some examples that show the performance gain from changing the value of the count parameter. They indicate how much performance could improve if users were able to customize the count value in the all_pks method. (I used pyinstrument to profile the code.)
Note:
The ideal count value depends on the specific Redis instance and the number of items it contains. The count value used in my examples is based on the number of items in my Redis instance and may not be universally applicable.
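As a rough way to reason about a suitable count before profiling, the number of SCAN round trips can be estimated from the size of the keyspace. The sketch below is back-of-the-envelope arithmetic only: the function names and the round-trip latency argument are assumptions of mine, and because Redis treats COUNT as a hint, real numbers will differ. Note that SCAN with a pattern or type filter still walks the whole keyspace, so total_keys should be the total number of keys in the database, not only the matching ones.

```python
import math

# Redis uses COUNT 10 when the client does not send a COUNT option.
DEFAULT_SCAN_COUNT = 10


def estimated_round_trips(total_keys: int, count: int = DEFAULT_SCAN_COUNT) -> int:
    """Approximate number of SCAN calls needed to walk total_keys keys."""
    # At least one call is always needed, even for an empty database.
    return max(1, math.ceil(total_keys / count))


def estimated_network_seconds(
    total_keys: int, count: int, round_trip_seconds: float
) -> float:
    """Approximate time spent on network round trips alone (hypothetical latency)."""
    return estimated_round_trips(total_keys, count) * round_trip_seconds


if __name__ == "__main__":
    total = 1_000_000  # hypothetical keyspace size
    for count in (DEFAULT_SCAN_COUNT, 100, 1000, 10_000):
        trips = estimated_round_trips(total, count)
        print(f"count={count}: about {trips} SCAN calls")
```

Larger count values reduce round trips but make each SCAN call occupy the server for longer, so the estimate is a starting point for profiling rather than a substitute for it.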
Test Code
from typing import Optional

from aredis_om import HashModel, get_redis_connection


class DummyModel(HashModel):
    class Meta:
        global_key_prefix = "logomatcher_orchestrator"
        model_key_prefix = "SimilarityScoreModel"
        database = get_redis_connection(
            host="redishost",
            port=6379,
            db=0,
        )


async def print_keys_with_prefix_matching(prefix: str, count: Optional[int] = None):
    # Iterate over all HASH keys that start with the prefix, passing count through to SCAN.
    async for key in DummyModel.db().scan_iter(f"{prefix}*", _type="HASH", count=count):
        print(key)
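For anyone who wants to repeat the comparison without a profiler, a small wall-clock harness is enough to compare two count values. This is a sketch under my own assumptions: timed and fake_scan are hypothetical helpers, and fake_scan only stands in for a real call such as print_keys_with_prefix_matching so that the snippet runs without a Redis server.

```python
import asyncio
import time
from typing import Awaitable, Callable, Tuple, TypeVar

T = TypeVar("T")


async def timed(make_coro: Callable[[], Awaitable[T]]) -> Tuple[T, float]:
    """Await the coroutine produced by make_coro and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = await make_coro()
    return result, time.perf_counter() - start


async def fake_scan(count: int) -> int:
    """Hypothetical stand-in for a real scan; replace with an actual Redis call."""
    await asyncio.sleep(0)
    return count


async def compare(counts: Tuple[int, ...]) -> None:
    for count in counts:
        _, elapsed = await timed(lambda: fake_scan(count))
        print(f"count={count}: {elapsed:.6f} s")


if __name__ == "__main__":
    asyncio.run(compare((10, 1000)))
```

With a real Redis instance, the lambda would instead wrap something like print_keys_with_prefix_matching(prefix, count=1000) from the test code above.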
Issue Description
Current Behavior
https://github.com/redis/redis-om-python/blob/2b450b5feec5602ac823d14cbb696ec73cf02347/aredis_om/model/model.py#LL1359C1-L1368C1
[Figure: pyinstrument profile, time taken with the default count value]
[Figure: pyinstrument profile, time taken with the count value 1000]
Environment
Library version: 0.0.27
Python version: 3.9.16