Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ndb.PickleProperty UnicodeDecodeError: 'ascii' codec can't decode byte with NDB_USE_CROSS_COMPATIBLE_PICKLE_PROTOCOL #111

Open
kaansoral opened this issue Jan 28, 2024 · 5 comments · May be fixed by #112

Comments

@kaansoral
Copy link

Expected Behavior

No decode issues

Actual Behavior

Decode issues

Steps to Reproduce the Problem

  1. set NDB_USE_CROSS_COMPATIBLE_PICKLE_PROTOCOL:True in app.yaml
  2. Have a ndb.PickleProperty in python27
  3. Access the ndb.PickleProperty in python312
  4. File "/layers/google.python.pip/pip/lib/python3.12/site-packages/google/appengine/ext/ndb/model.py", line 1913, in _from_base_type UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 1: ordinal not in range(128)

Specifications

  • Version: latest
  • Platform: python312
@pnico
Copy link

pnico commented Jan 29, 2024

Cross-compatible pickle protocol is for a different problem from the one you are talking about. You probably need to wrap the property access in a getter function which will catch the error, then try to dig out the underlying stored value, and unpickle it 'manually' using a different protocol like 'latin1' - it depends on what you're storing in the property, which you didn't specify.

@kaansoral
Copy link
Author

"NDB_USE_CROSS_COMPATIBLE_PICKLE_PROTOCOL" seems pretty self explanatory to me - it should work out of the box. I have no idea what problem it's intended to solve other than the name as it's not documented anywhere.

Still it's a good suggestion, better than waiting for a star to fall, but I could not find an easy way to access the base value at a quick glance without triggering the unpickle - maybe turning ndb.PickleProperty()'s to ndb.BlobProperty()'s and then unpickleing them with 'latin1' and re-pickleing them with Python3 default pickle is an option - yet it's quite a challenging undertaking as I'd need to manually block all access to my app and risk corrupting data

@kaansoral
Copy link
Author

My Python is rusty but maybe dynamically overriding the section here in code:
https://chromium.googlesource.com/external/googleappengine/python/+/3b4e135a03ddf2132df0c0c8d3f7cdc9eaa8c34b/google/appengine/ext/ndb/model.py

class PickleProperty(BlobProperty):
--
class PickleProperty(BlobProperty):
  """A Property whose value is any picklable Python object."""
  def _to_base_type(self, value):
    return pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
  def _from_base_type(self, value):
    return pickle.loads(value)

with

class PickleProperty(BlobProperty):
--
class PickleProperty(BlobProperty):
  """A Property whose value is any picklable Python object."""
  def _to_base_type(self, value):
    return pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
  def _from_base_type(self, value):
    try: return pickle.loads(value)
    except: return pickle.loads(value,encoding='latin1')

is an easy fix

Is it unreasonable to expect App Engine team to do this themselves when NDB_USE_CROSS_COMPATIBLE_PICKLE_PROTOCOL flag is set?

@kaansoral
Copy link
Author

This ended up being the patch that was refused here https://issuetracker.google.com/issues/322622693

def _from_base_type(self, value):
	try:
		return pickle.loads(value)
	except:
		return pickle.loads(value,encoding='latin1')
ndb.model.PickleProperty._from_base_type=_from_base_type

@kaansoral kaansoral linked a pull request Jan 30, 2024 that will close this issue
@pnico
Copy link

pnico commented Feb 1, 2024

ah, yeah we're doing something similar just on a case-by-case basis where we know we have datetime objects in the pickle property.. it seems like the "except" should actually specify UnicodeDecode error there though ^

About what NDB_USE_CROSS_COMPATIBLE_PICKLE_PROTOCOL (and friends) are for, this is all a bit confusing so I could be wrong about this, but my understanding was that it was so you could store a pickled object in a Python 3 instance and still have it be readable by a still-running Python 2 instance, because it will be pickle protocol 2 which Python 2 can still parse (it can't parse the newer, default pickle protocol used by Python 3). Which if I'm correct, this probably matters less now that Google won't let you deploy anything with Python 2 any more.

But a UnicodeDecodeError in Python 3 will still happen with datetime objects pickled in Python 2 in any protocol version without using the 'latin1' encoding, but that's not a bug with this library, it's just a "fact of life" when unpickling Python2-pickled objects in Python 3 (I think :D )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants