Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-102153: fix CVE-2023-24329 #102470

Closed
wants to merge 9 commits into from
Closed

Conversation

xiaoge1001
Copy link

@xiaoge1001 xiaoge1001 commented Mar 6, 2023

@cpython-cla-bot
Copy link

cpython-cla-bot bot commented Mar 6, 2023

All commit authors signed the Contributor License Agreement.
CLA signed

@xiaoge1001
Copy link
Author

xiaoge1001 commented Mar 7, 2023

Before applying #102470:

>>> from urllib.parse import urlparse, urlsplit
>>> urlparse(" https://example.com")
ParseResult(scheme='', netloc='', path=' https://example.com', params='', query='', fragment='')

After applying #102470:

>>> from urllib.parse import urlparse, urlsplit
>>> urlparse(" https://example.com")
ParseResult(scheme='https', netloc='example.com', path='', params='', query='', fragment='')

@CharlieZhao95
Copy link
Contributor

BTW, this solution is mentioned at the end of the issue finder's article, see: https://pointernull.com/security/python-url-parse-problem.html for more details.

@xiaoge1001
Copy link
Author

BTW, this solution is mentioned at the end of the issue finder's article, see: https://pointernull.com/security/python-url-parse-problem.html for more details.

Yes, I read this article, too. About the leading blank problem, I can solve it using lstrip() function even without reading this article.

@xiaoge1001
Copy link
Author

xiaoge1001 commented Mar 7, 2023

#102153 (comment)

As described by @illia-v, I think it is necessary to remove the leading blanks from the URL in urllib.parse.

@arhadthedev arhadthedev added type-security A security issue stdlib Python modules in the Lib dir labels Mar 7, 2023
@illia-v
Copy link
Contributor

illia-v commented Mar 7, 2023

#102153 (comment)

As described by @illia-v, I think it is necessary to remove the leading blanks from the URL in urllib.parse.

Right, but a few characters not covered by simple .lstrip() have to be stripped too, I created an alternative – #102508.

Thanks CharlieZhao95
@xiaoge1001 xiaoge1001 requested a review from CharlieZhao95 March 8, 2023 03:52
@zhuofeng6
Copy link

i think that the PR of @xiaoge1001 is a good idea, could you review the PR, my brother @gpshead

@gpshead gpshead self-assigned this Mar 8, 2023
@xiaoge1001
Copy link
Author

Hello, is there anyone concerned about this problem?

@hugovk
Copy link
Member

hugovk commented Mar 9, 2023

Thank you for the PRs, someone will review them and let's be patient in the meantime.

See also #102153 (comment)

@gpshead
Copy link
Member

gpshead commented Mar 9, 2023

#102508 is more along the lines of what we'll want to consider. I'm closing this one in favor of that.

@gpshead gpshead closed this Mar 9, 2023
@xiaoge1001
Copy link
Author

xiaoge1001 commented Mar 9, 2023

@gpshead I'm sorry to bother you, but I don't know urllib very well. Can you answer my question?

If I use this solution (https://github.com/python/cpython/pull/102470) to fix CVE when I don't think about compatibility, can the CVE be completely fixed? Will other problems be introduced?

@gpshead
Copy link
Member

gpshead commented Mar 9, 2023

We aren't currently confident that we can safely backport a fix to older Python versions as something likely depends on some of the behavior around spaces. That said, this solution likely does work for a lot of code.

It is a reasonable recommendation (as I think the blog post suggested?) for people to call .lstrip() themselves on their url before passing it to urlparse from their own code rather than assuming they're running on top of a Python without the issue.

The other PR I closed this in favor of will have potentially even more side effects that could impact other uses of urlparse as it removes a wider/different set of characters than lstrip() considers from anywhere in the entire url string.

If you do go with this or any of the PRs in your own patched Python interpreter, please let us know how that went. Details like the relative size of your overall Python code base using your patched interpreter was and how many, if any, issues around the behavior change in the API came up after some period of time testing and deploying things with it. Those are all good data to have to inform decisions. (I'll be trying some testing of that nature myself)

@illia-v
Copy link
Contributor

illia-v commented Mar 10, 2023

The other PR I closed this in favor of will have potentially even more side effects that could impact other uses of urlparse as it removes a wider/different set of characters than lstrip() considers from anywhere in the entire url string.

@gpshead yes, it may have more side effects. But it removes the characters only at the beginning and the end of a URL string (and a scheme if one is passed separately.)

@xiaoge1001
Copy link
Author

We aren't currently confident that we can safely backport a fix to older Python versions as something likely depends on some of the behavior around spaces. That said, this solution likely does work for a lot of code.

It is a reasonable recommendation (as I think the blog post suggested?) for people to call .lstrip() themselves on their url before passing it to urlparse from their own code rather than assuming they're running on top of a Python without the issue.

The other PR I closed this in favor of will have potentially even more side effects that could impact other uses of urlparse as it removes a wider/different set of characters than lstrip() considers from anywhere in the entire url string.

If you do go with this or any of the PRs in your own patched Python interpreter, please let us know how that went. Details like the relative size of your overall Python code base using your patched interpreter was and how many, if any, issues around the behavior change in the API came up after some period of time testing and deploying things with it. Those are all good data to have to inform decisions. (I'll be trying some testing of that nature myself)

I plan to add constraints to fix CVE: "The first character of the URL-scheme string must be one ASCII alpha." Then I initiated a check on users for this constraint for two weeks, and did not receive any feedback on the impact. I think the impact of fixing the CVE problem should be small.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review DO-NOT-MERGE stdlib Python modules in the Lib dir type-security A security issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants