Request: PEP to describe current Warehouse JSON API #367
Comments
@brainwane I haven't worked w/ that API myself and haven't written any PEPs ever. So I don't feel confident about starting this. P.S. Your mention of |
Thanks also for the suggestion. I've used the API a little bit but am in a similar situation, and also don't think I have the time to fully commit to it. |
@webknjaz link fixed, thanks. @hugovk and @webknjaz - thanks for clearly saying no so we can move on and ask other people. :-) @dholth and @cooperlees -- are either of you interested in taking this on, maybe part of it (see the checklist in the initial post for things that need doing)? |
I want this, so I'll commit to this. I'll even help with the new API design and possible implementation. My only warning here is that my English skills are bad and this will be my first PEP. I'll put aside some time Sunday night to try to get the sketch outline started, and possibly we can chat on IRC more on Monday, @brainwane. |
To clarify for my benefit - is the intention here to define a standard that all indexes must implement (in the same sense that PEP 503 covers the simple API) or to define and document how Warehouse (PyPI) operates? The former would mean that we intend to allow tools to assume the existence of such an API and would mandate that all index implementations (devpi, pypiserver, Artifactory) should implement it.¹ And IMO it would mean that we should be collecting input from the developers of those implementations as well as Warehouse. If it's intended simply to document the Warehouse API as "the reference implementation of a JSON API" then it's not so much of an interoperability standard and we can avoid those complexities (although conversely, it would be of more limited use for general tools like pip). ¹ Yes, it could be defined as an optional API, in which case we'd need a means of querying "do you support this?" |
@pfmoore I'd say all indexes, which is why one of the items on the checklist is
|
But also I'll defer to folks like @dstufft @techalchemy @Julian @mplanchard @fschulze on Paul's question.
@cooperlees in saying this, you are a model of an open source citizen. Thank you. :-)
Everybody's got a first time. :-) And I know other folks will help with refining the prose.
Sounds great! I think we'll also need a PEP sponsor. @jaraco @cjerdonek @zooba @gpshead @merwok are any of you open to sponsoring this? |
Cool - sorry, I missed that item. I presume that @dstufft would be BDFL-Delegate for this. |
I would say I don't think all repositories have to implement it, but rather the goal would be to standardize it so that tooling can say "I depend on a repository that implements this API", repositories are of course free to say they don't support that API, but those tools won't work with them then. We should def try to get feedback from them though, if the answer from all of them is they can't or won't implement it, then maybe we need to think harder about the path forward for it. One key thing I think I'd want to see in a PEP for this, is trying to explicitly document what use cases we're trying to make the current JSON API good for. Are we looking to just standardize it to function as a general purpose "pull data from PyPI" API, or are we looking to allow specialized tooling to use it for some purpose (as an example, do we want Bandersnatch to be able to use this for implementing mirroring? Does the current "shape" of the API allow that? If not what's our smallest change we can make to allow that? etc). |
devpi/devpi#801 is an example of something to look at to figure out why they want this API standardized and to make sure our API actually satisfies their use case. |
Thanks Donald - sorry for misremembering and thanks for the correction.
|
I'd also like the PEP to have a clear way for clients to query servers as to whether they support this API. Just trying a query and checking the response runs the risk of people exposing a "similar, but not the same" API and clients having no way of knowing. |
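One hypothetical way to make support explicit, rather than inferred from the shape of a response, is for the server to declare an API version that the client checks before trusting anything else. The sketch below assumes a top-level `api_version` key and a `SUPPORTED_MAJOR` constant; both are invented for illustration and are not part of any existing Warehouse response or of the draft discussed here.

```python
# Hypothetical: the major version of the JSON API this client understands.
SUPPORTED_MAJOR = 1


def supports_api(payload: dict) -> bool:
    """Return True only if the server explicitly declares a compatible version.

    The "api_version" key is an assumption made for this sketch.
    """
    declared = payload.get("api_version")
    if not isinstance(declared, str):
        # No explicit declaration: treat as unsupported instead of guessing
        # from keys that merely look similar.
        return False
    major, _, _ = declared.partition(".")
    return major.isdigit() and int(major) == SUPPORTED_MAJOR


declared_ok = supports_api({"api_version": "1.2", "info": {}})
lookalike = supports_api({"info": {}, "releases": {}})
too_new = supports_api({"api_version": "2.0"})
```

With a check like this, a "similar, but not the same" server that omits the declaration is rejected up front instead of being discovered through a parsing failure later.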
@havocp and @tiegz and @katzj -- if you use Warehouse's JSON API in https://github.com/librariesio/bibliothecary/ or in the Tidelift CLI tool then check out this question:
|
From the devpi side a big requirement for an API are relative links from a common root, because we support multiple indexes. Besides that I don't have much input at this point. |
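As a rough illustration of why relative links help a server that hosts several indexes, the sketch below resolves the same relative link against two different index roots using the standard library's `urljoin`. The host name, index paths, and the `demo-project/json` link are placeholders chosen for this example, not URLs proposed in the thread.

```python
from urllib.parse import urljoin


def resolve(index_root: str, relative_link: str) -> str:
    """Resolve a relative link against an index root.

    urljoin replaces the last path segment unless the base ends with "/",
    so the root is normalised first.
    """
    if not index_root.endswith("/"):
        index_root += "/"
    return urljoin(index_root, relative_link)


# Hypothetical roots for two indexes served by the same host.
roots = [
    "https://devpi.example.org/alice/dev/",
    "https://devpi.example.org/bob/stable",
]
relative_link = "demo-project/json"  # placeholder relative link

resolved = [resolve(root, relative_link) for root in roots]
```

The server can emit one relative link and each client ends up at the right index, whereas an absolute URL would have to be rewritten per index.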
I have an extreme draft up on my PEP fork here: https://github.com/cooperlees/peps/blob/warehouse_json_api/pep-9999.rst What's the best way to have everyone be able to comment + add to? Should we use a Google doc and I transfer back to the
Do you mean for the releases and urls section "url" ? Wouldn't you just put an absolute URL using your domain? Can you maybe give me an example on how you'd use a relative URL and I'll maybe understand your use case better. |
@cooperlees the current json API is at |
Ahh got it. Here I'd love to propose (in my PEP) that we keep the legacy URLs on PyPI (for legacy reasons) but in the standard move something like (and implement on PyPI - I will happily do that):
Totally open to better ideas, but something like this will allow you to get your per Index JSON API :) |
Ok - I finally sat down and described all the JSON fields I could decipher what they are for. Returned JSON fields I need help with:
Branch is here: https://github.com/cooperlees/peps/tree/warehouse_json_api What's left to do before I put up a pull request for review by more PEP-savvy people? How do I get a PEP number, etc.? I still expect this needs a lot of refinement, but I'm getting to the limits of my knowledge of the API from just using it. I think the best way forward is possibly having PyPI maintainers all take a pass at cleaning it up. I think I've done the grunt work of the boring manual reading of JSON files and trying to work out which fields we should make required, etc. Thanks! Looking forward to closing this one out. |
https://github.com/cooperlees/peps/blob/warehouse_json_api/pep-9999.rst For anyone else trying to get to the PEP quickly. :P |
Brett recently answered a few questions related to this over on discuss.python.org. In terms of the process, I think you'll also want to file a PR to packaging.python.org -- adding a page to https://github.com/pypa/packaging.python.org/tree/master/source/specifications detailing the final design that folks use/implement. From https://discuss.python.org/t/how-to-propose-new-specs/4721/7?u=pradyunsg:
|
I still think we should clarify the location of the API to not make it pypi.org centric. I would propose that the base for PyPI be defined as |
That's the main intent and why I added the /json URLs on the PEP. Please feel free to suggest wording changes to make it clearer. I am a terrible writer. Just doing this cause I want the functionality, not cause I like writing. I actually dislike it a lot, so would appreciate ALL help I can get. I think tools is scope creep for this PEP. This PEP is to just make a standard designed API so we can all implement it the same. Once we have that we should request tools to support it - i.e. different base Index URLs ... like |
@kpfleming I think, based on https://discuss.python.org/t/pep-for-the-python-package-index-json-api/5717/16 , that you might want to check in here and give @cooperlees some feedback on the current draft. |
I totally missed the ping on this back in June, but happened to see a notification about it yesterday. Thanks for thinking of us! The proposed PEP seems straightforward enough to implement, and it doesn't conflict with anything pypiserver is currently providing. I have some minor questions (let me know if you'd prefer we had this conversation over on discuss.python.org -- I don't have an account there currently so figured I'd ask here):
I also have some questions that are definitively outside the scope of the PEP, like how pip will handle backwards compatibility with the old simple API and whether the intent is for pip to eventually drop support for it, the answers to which will inform the degree of urgency in updating pypiserver to support the new API. Definitely glad to see this effort. It'll be great to have a clear schema that we can implement again. |
For
I think you should just start off pulling the size from the file and use upload time etc. to fill in as much metadata as you can, and see how happy that makes your users. Otherwise, have a formal upload where all the metadata is calculated, and for your
I would expect once pypi.org supports this PEP, we would go make
|
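A minimal sketch of the "pull what you can from the file" suggestion above, assuming an index that only has the distribution file on disk. The dictionary keys are illustrative names, and using the file's modification time as a stand-in for upload time is an assumption of this sketch, not something the thread or the draft specifies.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def file_metadata(path: Path) -> dict:
    """Derive per-file metadata from the file itself (illustrative key names)."""
    stat = path.stat()
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    # Assumption: modification time approximates upload time when no
    # upload record exists.
    uploaded = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "filename": path.name,
        "size": stat.st_size,
        "digests": {"sha256": digest},
        "upload_time": uploaded.isoformat(),
    }


# Demonstration with a throwaway file standing in for a real distribution.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "demo_pkg-1.0.tar.gz"
    sample.write_bytes(b"not a real sdist")
    metadata = file_metadata(sample)
```

Reading the whole file into memory keeps the sketch short; a real server would likely hash in chunks and cache the result.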
@brainwane Thanks for the shoutout to bring me here :-) @cooperlees I'd be happy to collaborate on this PEP with you, acting as the copy-editor/reviewer to help ensure that the content is readable and understandable. I have both a desire for this PEP to be published (so that my company's tooling can benefit from it) and plenty of experience in document review and editing, so hopefully that will be a good combination. |
Well, I feel it's ready (and has been for quite some time) to just get polished up and have any technical issues debated out. I'll try to rebase the commit and remind myself where we all are. I feel we just need approval from @ambv and @dstufft really. @kpfleming - Happy for you to fork and PR or just go comment on the latest commit with suggestions + fixes etc. I'd love to land it and go and implement the endpoints for Warehouse ASAP.
|
I assume it still needs to be published for review & discussion prior to approval (as far as I've seen it's not been posted to Discourse yet)? I'm very interested in this PEP but haven't paid much attention while it was in pre-PEP stage. |
OK, I'll put together a PR this weekend to try to get the pre-draft into a submittable state. A question though: "go and implement the endpoints for Warehouse ASAP" implies that this PEP will result in work in Warehouse, but this PEP is supposed to document the existing API. Which way is this going to go? |
It is documenting it, but the way it is today it cannot support self-contained mirrors for third-party indexes, particularly if they serve multiple indexes. There are also lots of little endpoints and tweaks needed to make it more complete. @pfmoore and others have made requests in regard to this. This "JSON API" was never fully designed to be an authoritative API, thus the need for this PEP. For example:
|
Yeah there's definitely metadata we can collect. I guess my concern was more about those fields in particular being specified as |
Hello! I work in Bloomberg's Python Infrastructure team and have now been given time to be involved in this on a regular basis and hopefully help move this forward. After speaking with @di @cooperlees @pradyunsg and reading through all of the context, I've found that there seems to be a tendency toward scope creep when discussing the PEP, and I'd like to first define the scope of the proposed PEP more clearly. If we can agree on the scope below, I will make adjustments to @cooperlees' draft PEP.
Scope
Future work
The following improvements should eventually be done, but arguably deserve their own PEPs
|
@nchepanov Thank you for helping to bring this together! At this moment in time, the only concern I have with your proposed course of action is regarding authentication and establishing a URL structure. While the current JSON endpoints are highly cacheable, if that changes in the newly proposed API endpoints we would need to require authentication (or consider how we would manage a highly cacheable unauth'd endpoint and require auth for other cases). As the stated course of this PEP would propose an API URL structure, I think we would want to at least start a draft for the "foundational" components of a new PyPI API to reference (and discuss those concerns separately). These include URL structure, versioning, and likely authentication. also pinging @di for input. |
I'll +1 @ewdurbin's comments: I think the only endpoints that would absolutely need auth are those that create/update/delete. The current JSON API does none of that, but I don't think we'd want to rule it out for a hypothetical future API. |
@ewdurbin @di thank you for taking the time to review the outline! Let me make sure I understand your concerns correctly: It is important to you that any API URL structure this PEP may introduce allows for the ability to add authentication for create/update/delete operations or operations that are not designed to be highly cacheable. If this understanding is correct, then we are all set: One of the intentions of the new API is to make no changes to the shape of the API, consequently, the new API end-points have the same exact properties as the existing JSON API. In other words, the new API is both read-only and highly cacheable.
This is absolutely correct. To be more specific, the PEP suggests:
They can be extended in many ways to accommodate discoverability capabilities, auth, pagination and more.
This is definitely important, however, it's arguably outside of the scope of this PEP. If we want to make progress, carrying this PEP all the way would be the first step. Once this is done, the next PEP can address the "foundational components" of the new PyPI API. It looks like we can take #367 (comment) as written and proceed to update the PEP draft, and start working on the PR that implements it. Cooper and I are waiting on your LGTM to move forward. |
Just to clarify, am I right that the intention here is to write a PEP, which defines the API that any index which claims to support the "Package index JSON API v1" will provide? The API will be functionally identical to the existing Warehouse API, but with more precisely defined semantics. As a Package Index Interface PEP, this would be down to @dstufft to pronounce on, I assume, and it would go through the normal PEP process, with a round of discussion in the Packaging category on Discourse before approval? You'll need a PEP sponsor to take this through the process as well. I'd advise getting the draft PEP published first, so that discussion can get under way while you're working on the PR. You should also reach out to interested parties like the devpi and Artifactory developers, who may want to add the JSON API to their index software, so they can add their feedback. Personally, I'm very much in favour of this, not least because it will provide a good foundation for migrating the XMLRPC API to JSON in the future (but I completely agree that should be out of scope for v1). So for what it's worth, you have a +1 from me. But I do think there are some rough edges in the existing API which will need sorting out - for example, the project-level |
Looks like the outline in #367 (comment) didn't cause any major opposition. Here's what I'm going to do starting next week:
I'm new to this community / process, let me know if there's anything I'm missing or can do better. |
You should find a PEP sponsor before creating the PR. The process would be roughly:
|
One thing that I just realized that I never stated publicly: I don't like having the I'd like for us to remove that key on the release-specific endpoints, as part of this change/restructuring. I know I'm asking for something that Nikita explicitly wants to keep out-of-scope ("no changes to the output"). I think removal of existing keys is less problematic for this whole effort than addition/renaming/restructuring, especially since this is information that's duplicated and unnecessarily increases response sizes. To be abundantly clear, this doesn't directly affect any of the plans/next steps here. I feel that the weirdness of this key would become clearer once it is described in a design document (i.e. PEP). |
See pypi/warehouse#9536 for a use case that's not covered by the existing JSON API (mirroring the PyPI metadata). I understand that the scope here is just to reorganise the existing API, not to add new functionality or deprecate the XMLRPC API, but I think that while we are reorganising things, we should keep known use cases in mind so that they don't get "lost in the shuffle". |
@pfmoore the need for some form of subscription API on "what changed since I last looked" is the very motivation for our involvement in this project. It appears that standardization of the JSON API as-is is the first step to getting anywhere; without it the maintainers are not comfortable evolving the API in any direction. But ultimately, we (Bloomberg Engineering) want some form of "what changed" API that we can subscribe to. |
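To make the "what changed since I last looked" use case concrete, here is a sketch of the polling approach a consumer is left with when no subscription endpoint exists: keep a snapshot and diff it against the next one. It assumes each project carries a monotonically increasing serial number; the snapshot structure and project names are hypothetical.

```python
from typing import Dict, List


def changed_projects(
    previous: Dict[str, int], current: Dict[str, int]
) -> Dict[str, List[str]]:
    """Compare two {project: serial} snapshots and report the differences."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    updated = sorted(
        name
        for name in current.keys() & previous.keys()
        if current[name] > previous[name]  # serial moved forward
    )
    return {"added": added, "removed": removed, "updated": updated}


# Hypothetical snapshots taken at two points in time.
last_seen = {"alpha": 10, "beta": 20, "gamma": 30}
now = {"alpha": 10, "beta": 25, "delta": 1}

diff = changed_projects(last_seen, now)
```

Building each snapshot means touching every project, which is the cost a server-side "what changed" endpoint would remove.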
The PEP Draft is ready for comments: https://discuss.python.org/t/pep-rfc-python-package-index-warehouse-json-api-v1/9205 Once we get enough feedback, I will open a PR into |
Should we close this? It seems the consensus is to move to extending the Simple API moving forward like PEP 692 through PEP and requests for comments on said PEPs. This has gone round and round and a lot of people said we should freeze the "JSON API" and move elsewhere. Thoughts? |
Desired: a PEP to describe the current Warehouse JSON API, to:
Does anyone want to volunteer to do this? It might take 15-20 hours total and would help a lot of folks out.
Stuff to cover
Includes:
Also, the current JSON API has some flaws, and so as we document it, that's an opportunity to find out what people expect, how the designers expected it to be used, what people need and want, etc. But do not think your job is to fix those things. Your job is to log those things and document the existing state.
Background
Today in IRC @dstufft, @techalchemy, and I discussed:
@dstufft said that interoperability depends on standardizing the Warehouse API:
But it may be several months if not more before volunteers can design and implement a next-generation Warehouse JSON API. So how can we help consumers and peers of the current Warehouse API work with what currently exists? We agreed that it would be a good interim step to document the current API in a PEP. We estimate "documenting what exists today is probably a less than 10 hours of total work" for the initial draft.
I would like to see this written and accepted within a few months. But as @techalchemy notes: "Even if it's not accepted, as long as it makes it to draft status it can generally start being useful. A draft PEP means pip can move toward supporting something which will force adoption ...will force other index server implementations to start considering building support."
Checklist
First, someone needs to volunteer to be the lead author on this PEP.
Warehouse developers fix "Fix how Warehouse stores metadata (per-file, not per-release)" (pypi/warehouse#8090).
Author talks in this thread with @brainwane about a reasonable deadline and a schedule for interim checkins to write the first draft, and gets the thread on Discourse started.
Author makes first PEP draft as a sketched-in outline
Author submits as a Work-In-Progress PR to the python/peps repo, and circulates on distutils-sig/discuss.python.org for comment, and to maintainers of Warehouse clients and other indexes, and revises in response to their comments
Author finalizes PEP and gets PEP accepted by BDFL-Delegate
We all celebrate! Now we have a standard that all the clients can feel guaranteed about using and that other indexes can implement!
After this
After this, we can get some dedicated volunteer time committed, from a Warehouse expert, to write the next-gen JSON API PEP (this will be a substantial task, and I'm pretty wary of getting a grant for it because it's way more research than development), and then get that discussed and accepted, and then apply for and get money to implement it.