Allow arbitrary contents managers #24
Codecov Report
Base: 78.74% // Head: 79.39% // Increases project coverage by
Additional details and impacted files
@@ Coverage Diff @@
## main #24 +/- ##
==========================================
+ Coverage 78.74% 79.39% +0.64%
==========================================
Files 5 5
Lines 414 495 +81
Branches 62 68 +6
==========================================
+ Hits 326 393 +67
- Misses 66 80 +14
Partials 22 22
df2b200
to
d21d05f
Compare
jupyter_server_fileid/manager.py
Outdated
class ArbitraryFileIdManager(AbstractFileIdManager):
    def __init__(self, *args, **kwargs):
        pass

    def get_id(self, path: str) -> str:
        return path

    def get_path(self, id: str) -> str:
        return id
To be implemented.
@davidbrochart Thanks for tackling this! In the future, however, could you please assign yourself to an issue before beginning work on a PR that addresses it? I want to make sure we're avoiding duplicate effort as much as possible.
I'd be happy to review this once it's ready and out of draft status.
@davidbrochart Cut a PR to your branch fleshing out the
@davidbrochart Can I go ahead and rebase to fix merge conflicts?
Please do, thanks!
* set default only when config doesn't specify file_id_manager_class
* flesh out abstract and arbitrary file ID managers
  - make root_dir non-configurable
  - add get_handlers_by_action() method
* edit docstring
  Co-authored-by: David Brochart <[email protected]>
* update help string
  Co-authored-by: David Brochart <[email protected]>
* update log
  Co-authored-by: David Brochart <[email protected]>
* update log
  Co-authored-by: David Brochart <[email protected]>
* Rename AbstractFileIdManager to BaseFileIdManager
  Co-authored-by: David Brochart <[email protected]>
37a6b1f
to
44e6d0e
Compare
@davidbrochart Done. I had one comment I wanted to make in this PR for visibility. The I think this is fine, given that custom ContentsManagers really should provide their own FileIdManager implementation if they want to use this extension meaningfully anyways.
@davidbrochart Feel free to merge when ready. I'll cut a release for
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def index(self, path: str) -> Union[int, str, None]:
I don't think we can have dual forms of IDs as suggested here. Users should not be expected to change from int to UUID, rebuild all indices, and, worst of all, update all external references that may be squirreled away when they find out that int as ID is insufficient.
I see this was merged as I must have been editing this comment. This topic needs to be revisited but we can let #3 be that forum.
    def get_id(self, path: str) -> Union[int, str, None]:
        raise NotImplementedError("must be implemented by subclass")

    def get_path(self, id: Union[int, str]) -> Union[int, str, None]:
-    def get_path(self, id: Union[int, str]) -> Union[int, str, None]:
+    def get_path(self, id: Union[int, str]) -> Optional[str]:
@kevin-bates Thanks for reminding me about this issue again. Let's move this discussion to #3.
I'm sorry this review is late. I didn't expect the PR to be merged so quickly with the kinds of changes that seemed to be made at the last minute. These comments should be considered part of my review of #30.
@@ -37,35 +37,174 @@ def wrapped(self, *args, **kwargs):
     return decorator


-class FileIdManager(LoggingConfigurable):
+class BaseFileIdManager(LoggingConfigurable):
This should derive from ABC along with a metaclass definition in order for @abstractmethod decorators to be effective. A good example of this can be found here. By not deriving BaseFileIdManager from ABC, @abstractmethod decorators do not work (and I see those too have been removed). As a result, a subclass's violation for not implementing various methods will not be discovered until that method is called, rather than when the class instance is instantiated. Proper decoration also prevents BaseFileIdManager from being instantiated (i.e., a true abstract base class).
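A minimal sketch of the metaclass approach described above, using only the standard library. `ConfigurableMeta` and `Configurable` are hypothetical stand-ins for a configurable base class that has its own metaclass (as `LoggingConfigurable` is said to in this thread); `CombinedMeta`, `IncompleteManager`, and `IdentityManager` are likewise illustrative names, not code from this PR.

```python
from abc import ABC, ABCMeta, abstractmethod
from typing import Optional


class ConfigurableMeta(type):
    """Hypothetical stand-in for the metaclass of a configurable base class."""


class Configurable(metaclass=ConfigurableMeta):
    """Hypothetical stand-in for LoggingConfigurable."""

    def __init__(self, **kwargs):
        self.options = kwargs


def naive_mixing_fails() -> bool:
    """Return True if inheriting from ABC and Configurable directly is rejected."""
    try:
        class Broken(ABC, Configurable):  # two unrelated metaclasses
            pass
    except TypeError:
        return True
    return False


class CombinedMeta(ABCMeta, ConfigurableMeta):
    """Derives from both metaclasses, which resolves the conflict."""


class BaseFileIdManager(Configurable, metaclass=CombinedMeta):
    @abstractmethod
    def get_id(self, path: str) -> Optional[str]:
        """Return the ID for `path`, or None if it is not indexed."""

    @abstractmethod
    def get_path(self, id: str) -> Optional[str]:
        """Return the path for `id`, or None if the ID is unknown."""


class IncompleteManager(BaseFileIdManager):
    # get_path() is missing, so instantiation fails rather than a later call.
    def get_id(self, path: str) -> Optional[str]:
        return path


class IdentityManager(BaseFileIdManager):
    def get_id(self, path: str) -> Optional[str]:
        return path

    def get_path(self, id: str) -> Optional[str]:
        return id
```

With this arrangement, `BaseFileIdManager()` and `IncompleteManager()` both raise `TypeError` at instantiation, while `IdentityManager()` constructs normally.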
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def index(self, path: str) -> Union[int, str, None]:
To the best of my knowledge, this is not a public method and should not exist on the ABC.
    def get_id(self, path: str) -> Union[int, str, None]:
        raise NotImplementedError("must be implemented by subclass")
With a proper ABC definition @abstractmethod is sufficient and this method can use pass. It should also include the basic help-string text from which all subclasses will derive.
Same goes for get_path() and get_handlers_by_action below.
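One way to read the help-string point, sketched with the standard library only: the abstract method carries the docstring and a `pass` body, and a subclass that overrides it without a docstring still exposes the base text through `inspect.getdoc`. `BaseManager` and `PathAsIdManager` are hypothetical names used for illustration.

```python
import inspect
from abc import ABC, abstractmethod
from typing import Optional


class BaseManager(ABC):
    @abstractmethod
    def get_id(self, path: str) -> Optional[str]:
        """Return the file ID for the given path, or None if it is not indexed."""
        pass


class PathAsIdManager(BaseManager):
    def get_id(self, path: str) -> Optional[str]:
        # No docstring here: inspect.getdoc() falls back to the base class text.
        return path


manager = PathAsIdManager()
inherited_help = inspect.getdoc(manager.get_id)
```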
    def move(self, old_path: str, new_path: str) -> Union[int, str, None]:
        raise NotImplementedError("must be implemented by subclass")

    def copy(self, from_path: str, to_path: str) -> Union[int, str, None]:
        raise NotImplementedError("must be implemented by subclass")

    def delete(self, path: str) -> None:
        raise NotImplementedError("must be implemented by subclass")

    def save(self, path: str) -> None:
        raise NotImplementedError("must be implemented by subclass")
To the best of my knowledge, these are not public methods and should not exist on the ABC.
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS Files("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "path TEXT NOT NULL UNIQUE"
I think the ArbitraryFileIdManager is going to want to store the root_dir value as it can change between invocations and may be necessary. We should also store any information returned from the Contents events.
"root_dir TEXT NOT NULL"
        self.con.execute("CREATE INDEX IF NOT EXISTS ix_Files_path ON Files (path)")
        self.con.commit()

    def index(self, path: str) -> int:
Not public, should be prefixed with _ and should probably take whatever other columns are necessary.
    def move(self, old_path: str, new_path: str) -> None:
        row = self.con.execute("SELECT id FROM Files WHERE path = ?", (old_path,)).fetchone()
        id = row and row[0]

        if id:
            self.con.execute("UPDATE Files SET path = ? WHERE path = ?", (new_path, old_path))
        else:
            cursor = self.con.execute("INSERT INTO Files (path) VALUES (?)", (new_path,))
            id = cursor.lastrowid

        self.con.commit()
        return id

    def copy(self, from_path: str, to_path: str) -> Optional[int]:
        cursor = self.con.execute("INSERT INTO Files (path) VALUES (?)", (to_path,))
        self.con.commit()
        return cursor.lastrowid

    def delete(self, path: str) -> None:
        self.con.execute("DELETE FROM Files WHERE path = ?", (path,))
        self.con.commit()
Not public, should be prefixed with _.
    def save(self, path: str) -> None:
        return
This should be implemented to call _index() if it's not already indexed. Otherwise, no entries will ever be inserted into the FILES table.
Also, _get() should be implemented so that it is called on Get Contents events, for the same reason that Save events must be handled.
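A rough sketch of what this comment asks for, assuming the simple two-column Files table shown in the diff and an in-memory SQLite database. `SketchFileIdManager` is a hypothetical class written for illustration; the `_index()`, `_save()`, and `_get()` names follow the reviewer's suggestion rather than the merged code.

```python
import sqlite3
from typing import Optional


class SketchFileIdManager:
    """Illustrative only: indexes a path the first time it is saved or fetched."""

    def __init__(self) -> None:
        self.con = sqlite3.connect(":memory:")
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS Files("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "path TEXT NOT NULL UNIQUE"
            ")"
        )
        self.con.commit()

    def get_id(self, path: str) -> Optional[int]:
        row = self.con.execute("SELECT id FROM Files WHERE path = ?", (path,)).fetchone()
        return row[0] if row else None

    def _index(self, path: str) -> int:
        # Reuse the existing record so repeated events never create duplicates.
        existing = self.get_id(path)
        if existing is not None:
            return existing
        cursor = self.con.execute("INSERT INTO Files (path) VALUES (?)", (path,))
        self.con.commit()
        return cursor.lastrowid

    def _save(self, path: str) -> None:
        # Called on Save events: make sure the saved path has a record.
        self._index(path)

    def _get(self, path: str) -> None:
        # Called on Get events: same reasoning as _save().
        self._index(path)
```

Without something like `_save()` calling `_index()`, `get_id()` in this sketch would return `None` for every path, which is the failure mode the comment describes.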
"get": None, | ||
"save": None, |
Per previous comment, these should call _get() and _save(), respectively.
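To illustrate the suggestion, here is a sketch of an action-to-handler mapping in which "get" and "save" are wired to private methods instead of `None`. The event payload shape (`{"action": ..., "path": ...}`), the `dispatch()` helper, and the `SketchManager` class are assumptions made for this example; only the `get_handlers_by_action()` name comes from the PR.

```python
from typing import Any, Callable, Dict, List, Optional

# A handler receives the event payload; None means the action is ignored.
Handler = Optional[Callable[[Dict[str, Any]], None]]


class SketchManager:
    """Illustrative only: records which paths have been indexed."""

    def __init__(self) -> None:
        self.indexed: List[str] = []

    def _index(self, path: str) -> None:
        if path not in self.indexed:
            self.indexed.append(path)

    def _get(self, path: str) -> None:
        self._index(path)

    def _save(self, path: str) -> None:
        self._index(path)

    def get_handlers_by_action(self) -> Dict[str, Handler]:
        # The payload is destructured here so the private methods take plain paths.
        return {
            "get": lambda data: self._get(data["path"]),
            "save": lambda data: self._save(data["path"]),
            "delete": None,  # deliberately unhandled in this sketch
        }


def dispatch(manager: SketchManager, data: Dict[str, Any]) -> bool:
    """Route an event to its handler; return False if the action is unhandled."""
    handler = manager.get_handlers_by_action().get(data["action"])
    if handler is None:
        return False
    handler(data)
    return True
```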
@kevin-bates Thank you for the review! Sorry we're merging things so quickly, this is being used by RTC, which is also moving very fast right now. Let me address your concerns separately:
I removed inheritance from ABC because I wasn't sure if we can use it with Traitlets. The developers I work with on Jupyter scheduler had run into an issue where they couldn't inherit from both ABC and LoggingConfigurable. There are certain traits and validation on those traits we want all file ID managers to have, specifically
Furthermore, if these methods are public, then it doesn't make sense for them to receive the full event payload as their exclusive argument.
ArbitraryFileIdManager can't make any assumptions about the filesystem. It doesn't know about absolute or relative paths, as that would assume a hierarchical filesystem. The consensus is that it's OK for the ArbitraryFileIdManager to not be fully fleshed out and handle cases like moving server roots or moving directories. Anything more requires assumptions about your filesystem, and that requires either LocalFileIdManager or some other custom implementation.
I had run into the same thing. The key is to specify a
You're right about
This isn't for you to decide. It's perfectly reasonable for an S3 Contents Manager implementation to use Regarding the I'm not sure what the actual intention of the
This is just plain wrong. While a filesystem-based FileIdManager can copy, move, delete files, those that are associated with non-filesystem-based ContentsManagers cannot. This is yet another reason the FileIdService should be part of CM! So when those applications switch their ContentsManagers, yet think they can manage resources using the FileIdService instead, what are these methods supposed to do!?
@kevin-bates It's very clear you're more knowledgeable on this than I am 😁. However, the RTC team is under some tight time constraints at the moment, so I will have to be making a lot of changes very rapidly. How about I ping you when the
Brian relayed the discussion you all had in the Jupyter server meeting this morning. It sounds like we can safely assume that the "API paths" (paths returned from the contents manager) are always hierarchical, relative to the contents manager root, and delimited by forward slashes. Given this set of constraints, it's perfectly reasonable to join the
Destructuring of the event payload can be handled in
I think there's some misunderstanding on what these methods do. They don't actually move the file on the filesystem, they simply update the associated path in the Files table. So, Maybe a better name should have been chosen earlier to convey this better, but I chose the briefest name possible out of personal preference. Perhaps
I'm sorry. The phrase perform in-band edits without always having to go through the contents manager implied to me that folks could use these methods to copy, move and delete files. Might you have meant to say "out-of-band edits"? Is there a case where "in-band" edits need to move a file (outside of the ContentsManager) and, in such cases, the user application must "know" to call the corresponding method on the FID manager? Just curious - thanks.
Yes - great idea! And those methods should take the complete
Closes #19.