Skip to content

Commit

Permalink
http: Reject paths containing non-ASCII characters
Browse files Browse the repository at this point in the history
http would previously accept paths with non-ASCII characters. This
proved problematic, because multi-byte characters were encoded as
'binary', that is, the first byte was taken and the remaining bytes were
dropped for that character.

There is no sensible way to fix this without breaking backwards
compatibility for paths containing U+0080 to U+00FF characters.

We already reject paths with unescaped spaces with an exception. This
commit does the same for paths with non-ASCII characters too.

The alternative would have been to encode paths in UTF-8, but this would
cause the behaviour to silently change for paths with single-byte
non-ASCII characters (eg: the copyright character U+00A9 ©). I find it
preferable to to add to the existing prohibition of bad paths with
spaces.

Bug report: nodejs#2114
  • Loading branch information
Flimm committed Sep 25, 2015
1 parent b50e89e commit 5cea15e
Show file tree
Hide file tree
Showing 2 changed files with 15 additions and 6 deletions.
10 changes: 6 additions & 4 deletions lib/_http_client.js
Original file line number Diff line number Diff line change
Expand Up @@ -41,13 +41,15 @@ function ClientRequest(options, cb) {
if (self.agent && self.agent.protocol)
expectedProtocol = self.agent.protocol;

if (options.path && / /.test(options.path)) {
if (options.path && ! /^[\u0000-\u001f\u0021-\u007f]*$/.test(options.path)) {
// The actual regex is more like /[^A-Za-z0-9\-._~!$&'()*+,;=/:@]/
// with an additional rule for ignoring percentage-escaped characters
// but that's a) hard to capture in a regular expression that performs
// well, and b) possibly too restrictive for real-world usage. That's
// why it only scans for spaces because those are guaranteed to create
// an invalid request.
// well, and b) possibly too restrictive for real-world usage.
//
// This regex rejects paths with a space U+0020 in them, (as those are
// guaranteed to create an invalid request), as well as paths with
// non-ASCII characters, (to deprecate previous buggy behaviour).
throw new TypeError('Request path contains unescaped characters.');
} else if (protocol !== expectedProtocol) {
throw new Error('Protocol "' + protocol + '" not supported. ' +
Expand Down
11 changes: 9 additions & 2 deletions test/parallel/test-http-client-unescaped-path.js
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,13 @@ var assert = require('assert');
var http = require('http');

assert.throws(function() {
// Path with spaces in it should throw.
http.get({ path: 'bad path' }, assert.fail);
}, /contains unescaped characters/);
}, /contains unescaped characters/, 'Paths with spaces in it should throw');

for (var path of ['caf\u00e9', '\u0649\u0644\u0649', '💩' /* U+1F4A9 */ ]) {
assert.throws(
function() { http.get({ path: 'caf\u00e9' }, assert.fail); },
/contains unescaped characters/,
'Path ' + path + ' with non-ASCII characters should throw'
);
}

0 comments on commit 5cea15e

Please sign in to comment.