-
Notifications
You must be signed in to change notification settings - Fork 500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix IRI "normalization" #128
Conversation
Yeah of course all those silly tests assuming |
You know, I've always hated that it does this, so huge 👍 (and sorry for taking so long to get back to this). I know @gsnedders wrote this code originally for SimplePie, so I'd be curious to know why it works this way. As far as I know, this also doesn't obey the RFC anyway. RFC 3987 states:
Let's rock this. |
\m/ |
If you can tag that release it would be neat, I want to use this for YOURLS 1.7.1 |
I believe I originally always went with the shortest form for the normalization, FWIW. I don't think there was any other logic. Per RFC 2616 (now obsolete) "/" and "" are equivalent paths. Note that RFC 7230 states:
So, yeah, normalizing to "/" is probably the right thing to do. Note also Anne's URL spec, which is probably more useful in the real world than RFC 3987 in many ways. Maybe not in ways relevant to Requests, though? |
Ping @rmccue! Merging? |
@ozh Thanks a bunch! |
Transport curl fails on http://example.com/#hash (fragment attached to domain directly)
Why it fails
When requesting
http://example.com/#hash
, the actual URL requested becomeshttp://example.com#hash
(no path set, the trailing slash has gone missing)fsockopen doesn't mind, but curl dies, just as a manual curl request on
http://example.com/#hash
works fine but fails onhttp://example.com#hash
, with errorCould not resolve host: example.com#hash; Host not found
Where it fails
URL
http://example.com/#hash
becomeshttp://example.com#hash
during the following flow of events :Request::request( $url, ... )
callsRequest::set_default($url, ...)
if ($options['idn'] !== false)
invokesnew Requests_IRI($url)
Requests_IRI::scheme_normalization()
empties the path portion, as dictates theprotected $normalization
array hereWhat the PR fixes
The normalization removes unwanted bits, eg
http://example.com:80/bleh
becomeshttp://example.com/bleh
. Alrighty. But it goes too far.The PR prevents the normalization to think
/
is an unneeded pathThe PR also makes sure that if we have a host but no path, the default
/
will be added.This would probably cause problems on non http(s) requests (not sure what the requirements are for
dict:
requests, for instance) but for Requests here that's not going to be a problem.As a result:
http://example.com
becomeshttp://example.com/
http://example.com:80#bleh
becomeshttp://example.com/#bleh
Executive summary