Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking change in 2.28.0 when using string enums as Headers (working in v2.27.1) #6159

Closed
bonastreyair opened this issue Jun 9, 2022 · 18 comments · Fixed by #6356
Closed
Milestone

Comments

@bonastreyair
Copy link

When using string enums as key values for the headers dict parameter with requests.get() function in v2.28.0 it raises an InvalidHeader error but using previous version v2.27.1 works well without any error. It seems there was an unexpected breaking change with that release.

Expected Result

Dict keys for the headers parameter when using requests.get() are still valid if a string enum is used.

Actual Result

Instead it raises an InvalidHeader error stating that:

Header must be of type str or bytes, not enum.

Traceback (most recent call last):
  File "/home/yair/PycharmProjects/tests/requests_error.py", line 9, in <module>
    requests.get("http://URL", headers={CustomEnum.TRACE_ID: "90e85293-afd1-4b48-adf0-fa6daf02359e"})
  File "/home/yair/.local/lib/python3.8/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/home/yair/.local/lib/python3.8/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/yair/.local/lib/python3.8/site-packages/requests/sessions.py", line 573, in request
    prep = self.prepare_request(req)
  File "/home/yair/.local/lib/python3.8/site-packages/requests/sessions.py", line 484, in prepare_request
    p.prepare(
  File "/home/yair/.local/lib/python3.8/site-packages/requests/models.py", line 369, in prepare
    self.prepare_headers(headers)
  File "/home/yair/.local/lib/python3.8/site-packages/requests/models.py", line 491, in prepare_headers
    check_header_validity(header)
  File "/home/yair/.local/lib/python3.8/site-packages/requests/utils.py", line 1037, in check_header_validity
    raise InvalidHeader(
requests.exceptions.InvalidHeader: Header part (<CustomEnum.TRACE_ID: 'X-B3-TraceId'>) from {<CustomEnum.TRACE_ID: 'X-B3-TraceId'>: '90e85293-afd1-4b48-adf0-fa6daf02359e'} must be of type str or bytes, not <enum 'CustomEnum'>

Reproduction Steps

from enum import Enum
import requests


class CustomEnum(str, Enum):
    TRACE_ID = "X-B3-TraceId"


requests.get("http://URL", headers={CustomEnum.TRACE_ID: "90e85293-afd1-4b48-adf0-fa6daf02359e"})

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "charset_normalizer": {
    "version": "2.0.4"
  },
  "cryptography": {
    "version": "2.8"
  },
  "idna": {
    "version": "3.2"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.8.10"
  },
  "platform": {
    "release": "5.13.0-48-generic",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "1010106f",
    "version": "19.0.0"
  },
  "requests": {
    "version": "2.28.0"
  },
  "system_ssl": {
    "version": "1010106f"
  },
  "urllib3": {
    "version": "1.26.6"
  },
  "using_charset_normalizer": false,
  "using_pyopenssl": true
}
@bonastreyair
Copy link
Author

I know it is easily solvable by making use of the CustomEnum.TRACE_ID.value to use the stored string instead of using the enum directly with CustomEnum.TRACE_ID... but it worked before with v2.27.1, so it feels like an unexpected breaking change... 👀

@nateprewitt
Copy link
Member

Thanks for the report, @bonastreyair! So I think this is on the edge of support. Headers have always been defined to be a str or bytes and we've been programmatically enforcing that for header values for 6 years. While this worked in 2.27.1, it wasn't an intentional function of the headers param. I'd like to see how widespread this issue is before we make a decision on cases like this.

In the meantime, Requests has no special logic for Enums, so casting it to a string prior to calling Requests will not affect behavior. That would be the recommendation while we gauge impact. Otherwise, 2.27.1 will continue to work as it did originally.

@paulclarkaranz
Copy link

Here's another instance fwiw: taverntesting/tavern#788
In this case it's just a subclass of str being passed in as header values: https://github.com/taverntesting/tavern/blob/ede36ca062b546cfaf3e6b0940fbac30733586a4/tavern/util/formatted_str.py

@FelixWohlfrom
Copy link

We also have similar issues in company internal tools.
Isn't the explicit check for str and byte contradicting the duck typing approach of python? At least previously the code was working fine, even though we passed e.g. xml elements that were then implicitly converted to strings.

@sigmavirus24
Copy link
Contributor

We also have similar issues in company internal tools.
Isn't the explicit check for str and byte contradicting the duck typing approach of python? At least previously the code was working fine, even though we passed e.g. xml elements that were then implicitly converted to strings.

No. It's not contradicting that because any time we magically convert things for users, it tends to be wrong about 50% of the time. Being explicit in our documentation hasn't been effective and these validity checks are designed to help users not send junk that will almost certainly not do what they want. Much of what has been explicitly described above either for narrow cases but not for everyone. Enums for example are not required to have a string values. How does one consistently string-ify it then? Support for people subclassing built-in types notorious for breaking those subclasses is also not a logical choice because again it will surprise people who try it by breaking in a weird way it doing the opposite of what they expect

@jaypatrickhoward
Copy link

We encountered this same issue when passing header values whose type is a subclass of str. Might it make sense to allow subclasses to pass validation? In this case the requests library would not be doing any conversion for the user.

@nateprewitt
Copy link
Member

I wrote up a patch for this on Friday. I've been waiting to see how widely this approach is used, because we've only had 4 reports with ~75% of downloads (50 million/week) now using 2.28.0. I think this falls into a very niche usage, but I agree bytes/str subclasses did work for both name and value prior to this release.

We're going to monitor for any other issues in 2.28.0 to determine if we need a patch release. I'd be willing to merge the fix above into that with a heavy caveat that this isn't strictly supported. Any breakages due to subclasses deviating in behavior from the base bytes/str type likely won't be fixed.

Long term, I think we're going to continue taking a much firmer stance on inputs because people do crazy things when the values aren't bytes/str and expect them to just work. This becomes much more problematic when the requests are being sent to systems corrupted due to incorrect "stringification". We can't reasonably handle every case, so we need to draw a line somewhere on what the APIs are intended to support to provide stability.

@jaypatrickhoward
Copy link

jaypatrickhoward commented Jun 13, 2022

In our case we're interacting with Salesforce, whose API consists of XML messages. We parse a session ID out of the response from the authentication endpoint using lxml. This value is of type lxml.etree._ElementUnicodeResult which is a subclass of str. We pass it to the constructor of a class from salesforce-bulk. This class uses requests under the covers to communicate with Salesforce. It naively assumes that the session ID we supplied can be passed to requests as a header value without issue.

Our fix was to cast the session ID to an actual str before giving it to salesforce-bulk.

@kevinr-electric
Copy link

Just want to add support for this issue by saying we are also having this issue. Our use is basically:

class ContentType(str, Enum):
    JSON = "application/json"
    XML = "text/xml"

...
def execute_request(...):
...
    headers = {"Content-Type": ContentType.JSON}
    response = requests.request(
                url=url,
                method=method,
                params=params,
                headers=headers,
                data=data
            )

@nateprewitt nateprewitt mentioned this issue Jun 29, 2022
@nickswebsite
Copy link

Using enums for header names is a common (and generally regarded as a good) practice. +1.

In the ContentType enum given above, you have what looks like a string, acts like a string, and quacks like a string. It should be treated as a string. requests shouldn't be responsible for being fed misbehaving subclasses of builtin types IMO.

Anyway, just trying to add one to the 'user engagement' since that was mentioned as a consideration in the 2.28.1 release.

@nateprewitt
Copy link
Member

Using enums for header names is a common (and generally regarded as a good) practice. +1.

I don't believe anything in this change precludes the use of enums in your code. Requests is being explicit of what we accept. enum.name becomes enum.name.value and everything works fine.

@briantist
Copy link

Also seeing this in an Ansible collection, since the test containers updated their requirements to allow a newer version of requests.

@CarliJoy
Copy link

CarliJoy commented Sep 6, 2022

As explained in #6232 -> we use a string subclass to prevent secrets leaking in certain circumstances, i.e. in Stack Traces.

Just converting them to str would contradict this intention. So in opposite to using Enums, there is not really a workaround for us.

@sigmavirus24
Copy link
Contributor

Just converting them to str would contradict this intention

To be clear, to convert it to a str before passing to requests would be a problem, but requests converting it to a string isn't a problem in your eyes? That makes absolutely zero sense to me. Requests isn't doing anything special to handle your secret.

Also, you seem to be relying on this class to avoid leaking secrets in stack traces and assuming you/your colleagues will always remember to use it rather than looking for secrets in stack traces before allowing them to be printed to logs by using a formatter to ensure no secrets are printed ever (just to be safe). Defense in depth is always better than expecting a library to do something for you at one or two layers below you.

@CarliJoy
Copy link

CarliJoy commented Sep 7, 2022

Actually the Secret class is of course one of multiple layers.
We ensure on multiple places (i.e. using our auth libs), that secrets given are wrapped in this special class.
We simply add this test to all places in our code that accepts secrets.
This helps us also to track secrets within our code base.
We not solely rely on it.

And of course we use loggers/formatters that prevent leaking secrets but enforcing that is much harder then wrapping secrets.
Better do both.

Also other libraries follow a similar approach. For instance httpx for URLs:
https://github.com/encode/httpx/blob/master/httpx/_urls.py#L512

And actually requests does not (as far I found) convert the special class back to a normal string.

    def test_header_validation(self, httpbin):
        """Ensure prepare_headers regex isn't flagging valid header contents."""
        class StringSubClass(str):
            def __repr__(self):
                return "Secret"

        valid_headers = {
            "foo": "bar baz qux",
            "bar": b"fbbq",
            "baz": "",
            "qux": "1",
            "sub": StringSubClass("VerySecret")
        }
        r = requests.get(httpbin("get"), headers=valid_headers)
        for key in valid_headers.keys():
            valid_headers[key] == r.request.headers[key]
            # Ensure type is not modified
            assert type(valid_headers[key]) == type(r.request.headers[key])

This modified test works as expected.

Even the function to_native_string called in prepare_header (and even used for keys only) in _intnal_utils.py does not convert a value back to a naive string. Of course some send logic could/will do convert the value i.e. to a byte. But to leak the secret a exception needs to be raised in function that is called with this converted value without being handled by the library...

So requests does not have to handle the secret any special.
It would only leak the secret in a stack trace, if the header is converted to a normal str class before passing it to a function that is raising the exception which is printed.
Yes of course that is not 100% protection against a possible leakage but good enough for us as very unlikely.

Consider the opposite: We tell our devs to use the Secret everywhere, but then also need to tell them to wrap calls to requests to make them visible.
It like: Hey on all terminals enter your password in the input showing **** for your input but in this special terminal just enter it in plain text.
Yes maybe also the terminal showing ***** is saving the password in the background as plain text but that is a different story. The main point here how to teach people on how to handle a secret.

To make the case more general again:

Subclasses of strings are common for a number of reason, i.e. a json/xml/yaml libary that uses it to track formatting of a string, a a string enum, mark stuff a secret etc...
Enforcing stuff to be naive string class will break this things. If it is a subclass of a string, it should be good enough.

I do not see a case in which both the isinstance check is okay and the regex matches a valid input and but the string representation is still wrong.
In this case also the output of print(value) will show the wrong value.

I know that users of libs do crazy stuff but could you elaborate what kind of wrong "stringification" could happen if both checks are valid?

@CarliJoy
Copy link

@nateprewitt any news here? I really would love to see your patch applied.

@teruokun
Copy link

Quick ping here, looking over the proposed patch I don't see a good reason not to apply it, especially given that this kind of behavior actually violates the Python principle of 'duck typing'. It's unclear why Requests needs to be deeply coupled beyond isinstance. What's the use-case where limiting to the precise coordinate of str or bytes is truly necessary? And again, we aren't talking about string conversion, but simply if it says it's an instance of str or bytes, letting users be responsible for what happens if it isn't true seems reasonable, for the same reason single-underscore private methods aren't truly private in python, but a convention.

@nateprewitt nateprewitt added this to the 2.29.0 milestone Nov 16, 2022
@rajeshkanugu
Copy link

you can resolve this issue by converting to a string before parsing it as a key to the headers directory
`
from enum import Enum
class CustomEnum(Enum):
TRACE_ID = "X-B3-TraceId"

headers = {CustomEnum.TRACE_ID: "90e85293-afd1-4b48-adf0-fa6daf02359e"}
requests.get("http://url/", headers=headers)

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.