gh-121650: Encode newlines in headers, and verify headers are sound #122233

Merged Jul 30, 2024 (12 commits)

Conversation

encukou
Member

@encukou encukou commented Jul 24, 2024

Re: #121812

Hello @basbloemsaat,

I've spent the day reading through the email module, and RFCs, and I believe I found a better place to fix the issue.
This involved lots of experimentation, so I'm sending an alternative PR rather than a review on yours.

  • The generator (writer) verifies that the representation of each header is sound (a parser won't treat it as multiple headers, start-of-body, or part of another header). That should cover custom fold() implementations or Header subclasses.

    • However, someone out there is probably relying on such header injection in working code, so I added a policy attribute to turn the check off.
  • Newlines are encoded in fold(), just like undecodable bytes and other special characters.

Overall, this means that we treat newlines as valid content of headers, but “escape” them when such a header is serialized to text.
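
A minimal sketch of the generator check and its opt-out, assuming a Python release that includes this change. The names `verify_generated_headers` and `email.errors.HeaderWriteError` are not spelled out in the description above, so the sketch looks them up defensively. `LiteralHeader` and the example.com address are hypothetical, made up for illustration.

```python
import email.errors
import email.policy
from email.message import EmailMessage

# Assumption: releases with this change expose HeaderWriteError; on older
# ones, fall back to the long-standing MessageError class.
WriteError = getattr(email.errors, "HeaderWriteError", email.errors.MessageError)

# Assumption: the policy attribute is named verify_generated_headers.
HAS_CHECK = hasattr(email.policy.default, "verify_generated_headers")


class LiteralHeader(str):
    """Hypothetical header object whose fold() returns its text verbatim,
    standing in for a custom fold() that is careless about newlines."""

    name = "Subject"

    def fold(self, **kwargs):
        return self


def serialize(policy):
    """Return the serialized message, or the exception raised while writing."""
    msg = EmailMessage(policy=policy)
    msg["Subject"] = LiteralHeader("Subject: hi\nBcc: injected@example.com\n")
    try:
        return msg.as_string()
    except WriteError as exc:
        return exc


strict = serialize(email.policy.default)
if HAS_CHECK:
    # Opt out of the check, which restores the old (injectable) behaviour.
    lenient = serialize(email.policy.default.clone(verify_generated_headers=False))
else:
    lenient = strict  # nothing to turn off on this Python

print(type(strict).__name__)
print(repr(lenient))
```

With the check active, serialization should fail instead of emitting the extra `Bcc` line. With the check turned off, the injected line passes through unchanged.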

This PR is a proof of concept. It needs tests and documentation, but I'm out of time for today, and I wanted to share what I have.

Does this look reasonable to you?

encukou and others added 2 commits July 24, 2024 15:30
This should fail for custom fold() implementations that aren't careful
about newlines.
Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.

---

Credit for an earlier attempt:

Co-Authored-By: Bas Bloemsaat <[email protected]>
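
The point made in the commit message above, that an encoded word can carry a newline without a raw line break appearing in the serialized header, can be illustrated with the standard library's RFC 2047 helpers. This is only a sketch of the encoding itself, not the code path the PR changes. The sample text and the example.com address are made up. RFC 2047 calls the construct an "encoded-word".

```python
from email import quoprimime
from email.header import decode_header

# Hypothetical header text containing a raw newline (made up for illustration).
text = "foo <bar>\nBcc: injected@example.com"

# RFC 2047 "Q" encoding: bytes outside a small safe set become =XX escapes,
# so the newline is written as =0A and the result stays on one line.
encoded = quoprimime.header_encode(text.encode("utf-8"), charset="utf-8")

# decode_header() returns (bytes, charset) pairs; decoding restores the newline.
(raw, charset), = decode_header(encoded)
decoded = raw.decode(charset)

print(encoded)
print(decoded == text)
```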
@basbloemsaat
Contributor

That sounds entirely reasonable, and conforms to the RFCs.

Two points:

  1. what would be the use of keeping this check in email.policy.header_store_parse ?
  2. I found one case that is not covered by this (contrived, I admit):
from email import message_from_string

email_in = """
Subject: foo <bar>\nBCC: [email protected]
To: [email protected]
From: External Sender <[email protected]>

message body
"""

msg = message_from_string(email_in)
print(msg)

@encukou
Member Author

encukou commented Jul 27, 2024

what would be the use of keeping this check in email.policy.header_store_parse ?

This is in the branch that handles strings (rather than a custom Header object). I'm not clear on what format that string is supposed to be in.
Keeping the check means that if someone relied on this, they'll get the same error as before. Also, the indication that something is wrong will come earlier.
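
A small sketch of the existing check being discussed, assuming the default `EmailMessage` policy (`email.policy.default`). The example.com address is made up for illustration.

```python
from email.message import EmailMessage

msg = EmailMessage()  # uses email.policy.default

try:
    # A plain string with an embedded line break is refused when it is stored,
    # long before the message is serialized.
    msg["Subject"] = "foo\nBcc: injected@example.com"
    rejected = False
except ValueError as exc:
    rejected = True
    print(exc)

# A value without line breaks is stored normally.
msg["Subject"] = "foo"
```

The error surfaces at assignment time, which is the "earlier indication" mentioned above.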

I found one case that is not covered by this (contrived, I admit)

That \n there is Python syntax; by the time it gets to message_from_string, it's the same as a “real” newline.
If you use a raw string r""", or read from a file, the header remains on a single line.
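
To make the difference concrete, here is a sketch comparing a regular string with a raw string. The address is hypothetical, and the messages start directly with the header line so the parser sees it as a header.

```python
from email import message_from_string

# Regular string literal: Python turns \n into a real newline before the
# parser ever sees it, so this is simply a message with two header lines.
cooked = "Subject: foo <bar>\nBCC: injected@example.com\n\nmessage body\n"

# Raw string literal: the backslash and the "n" stay as two ordinary
# characters inside the Subject value.
raw = r"Subject: foo <bar>\nBCC: injected@example.com" + "\n\nmessage body\n"

msg_cooked = message_from_string(cooked)
msg_raw = message_from_string(raw)

print(msg_cooked.keys())
print(msg_raw.keys())
print(msg_raw["Subject"])
```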

@encukou encukou added the "needs backport to 3.12" (bug and security fixes) and "needs backport to 3.13" (bugs and security fixes) labels Jul 27, 2024
encukou added 3 commits July 27, 2024 16:10
I'm not touching other instances in this file, since this PR might
be backported to very old versions.
@encukou encukou marked this pull request as ready for review July 29, 2024 13:18
@encukou encukou requested a review from a team as a code owner July 29, 2024 13:18
@encukou
Member Author

encukou commented Jul 29, 2024

@serhiy-storchaka, would you like to review this?
@warsaw, @bitdancer, @maxking, as email experts, do you have any comments?

@encukou encukou added the "needs backport to 3.8", "needs backport to 3.9" (only security fixes), "needs backport to 3.10" (only security fixes), "needs backport to 3.11" (only security fixes), and "🔨 test-with-buildbots" (Test PR w/ buildbots; report in status section) labels Jul 29, 2024
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @encukou for commit af41733 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 29, 2024
Member

@serhiy-storchaka serhiy-storchaka left a comment

LGTM.

Lib/email/_header_value_parser.py (outdated; resolved)
Lib/test/test_email/test_policy.py (outdated; resolved)
Co-authored-by: Serhiy Storchaka <[email protected]>
@bedevere-app

bedevere-app bot commented Aug 2, 2024

GH-122599 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 bug and security fixes label Aug 2, 2024
ambv pushed a commit to ambv/cpython that referenced this pull request Aug 2, 2024
…s are sound (pythonGH-122233)

- Encode header parts that contain newlines

- Verify that email headers are well-formed

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
@bedevere-app

bedevere-app bot commented Aug 2, 2024

GH-122608 is a backport of this pull request to the 3.11 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.11 only security fixes label Aug 2, 2024
ambv pushed a commit to ambv/cpython that referenced this pull request Aug 2, 2024
…s are sound (pythonGH-122233)

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
@bedevere-app

bedevere-app bot commented Aug 2, 2024

GH-122609 is a backport of this pull request to the 3.10 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.10 only security fixes label Aug 2, 2024
ambv pushed a commit to ambv/cpython that referenced this pull request Aug 2, 2024
… are sound (pythonGH-122233)

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
@bedevere-app

bedevere-app bot commented Aug 2, 2024

GH-122610 is a backport of this pull request to the 3.9 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.9 only security fixes label Aug 2, 2024
ambv pushed a commit to ambv/cpython that referenced this pull request Aug 2, 2024
… are sound (pythonGH-122233)

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
@bedevere-app

bedevere-app bot commented Aug 2, 2024

GH-122611 is a backport of this pull request to the 3.8 branch.

Yhg1s pushed a commit that referenced this pull request Aug 6, 2024
…sound (GH-122233) (#122484)

gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233)

- Encode header parts that contain newlines

- Verify that email headers are well-formed

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
Yhg1s pushed a commit that referenced this pull request Aug 6, 2024
…sound (GH-122233) (#122599)

* gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233)

- Encode header parts that contain newlines

- Verify that email headers are well-formed

Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
(cherry picked from commit 0976339)

* Document changes as made in 3.12.5
hroncok pushed a commit to fedora-python/cpython that referenced this pull request Aug 6, 2024
…s are sound

pythongh-121650: Encode newlines in headers, and verify headers are sound (pythonGH-122233)

Encode header parts that contain newlines

Verify that email headers are well-formed

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
frenzymadness pushed a commit to frenzymadness/cpython that referenced this pull request Aug 13, 2024
…s are sound (pythonGH-122233)

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
frenzymadness pushed a commit to fedora-python/cpython that referenced this pull request Aug 15, 2024
…s are sound (pythonGH-122233)

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
stratakis pushed a commit to stratakis/cpython that referenced this pull request Aug 15, 2024
…s are sound (pythonGH-122233)

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
hrnciar added a commit to hrnciar/cpython that referenced this pull request Aug 16, 2024
 headers are sound (pythonGH-122233)

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>

This patch also contains modified commit cherry picked from
c5bba85.

This commit was backported to simplify the backport of the other commit
fixing CVE. The only modification is a removal of one test case which
tests multiple changes in Python 3.7 and it wasn't working properly
with Python 3.6 where we backported only one change.

Co-authored-by: bsiem <[email protected]>
hrnciar added a commit to fedora-python/cpython that referenced this pull request Aug 16, 2024
 headers are sound (pythonGH-122233)

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
Co-authored-by: bsiem <[email protected]>
hrnciar added a commit to fedora-python/cpython that referenced this pull request Aug 20, 2024
 headers are sound (pythonGH-122233)

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
Co-authored-by: bsiem <[email protected]>
blhsing pushed a commit to blhsing/cpython that referenced this pull request Aug 22, 2024
…ound (pythonGH-122233)

## Encode header parts that contain newlines


## Verify that email headers are well-formed


Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
ambv added a commit that referenced this pull request Sep 4, 2024
…ound (GH-122233) (#122611)

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
ambv added a commit that referenced this pull request Sep 4, 2024
…sound (GH-122233) (#122608)

Verify that email headers are well-formed.

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
ambv added a commit that referenced this pull request Sep 4, 2024
…sound (GH-122233) (#122609)

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
ambv added a commit that referenced this pull request Sep 4, 2024
…ound (GH-122233) (#122610)

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Bas Bloemsaat <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
brainhoard-github pushed a commit to distro-core-curated-mirrors/poky-contrib that referenced this pull request Sep 16, 2024
Changelog: https://docs.python.org/release/3.12.5/whatsnew/changelog.html

Include security fix
CVE-2024-6923

Reference:
https://nvd.nist.gov/vuln/detail/CVE-2024-6923
python/cpython#122233

(From OE-Core rev: 777cad793a5b07d392b1d9875530fb5480e75863)

Signed-off-by: Vijay Anusuri <[email protected]>
Signed-off-by: Steve Sakoman <[email protected]>