Structs lack proper names as dicts and arrays get turned into array of dicts #23
Comments
@jasonqng would you mind debugging this locally to see what the raw returns from bq are for these cases (and showing them here)? In other words, about here: https://github.com/pydata/pandas-gbq/blob/master/pandas_gbq/gbq.py#L607
This is what I get back when running from
For the second case, I get a proper list/array:
I'll tinker around a bit more tomorrow, but it looks like this shouldn't be too hard to fix. 🤞
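For readers following along, here is a minimal sketch of how a raw row could be decoded with the help of the schema, so that struct field names are kept and repeated fields become plain lists. It assumes the BigQuery REST layout in which each row is `{"f": [{"v": ...}, ...]}`, a struct value is itself an `{"f": [...]}` object, and a repeated value is a list of `{"v": ...}` items. The helper names `decode_value` and `decode_row` and the sample schema and row are hypothetical, made up for illustration; they are not the pandas-gbq implementation or output from this issue.

```python
def decode_value(field, value):
    """Decode one raw cell value according to its schema field (hypothetical helper)."""
    if value is None:
        return None
    if field.get("mode") == "REPEATED":
        # A repeated field arrives as a list of {"v": ...} wrappers.
        scalar_field = dict(field, mode="NULLABLE")
        return [decode_value(scalar_field, item["v"]) for item in value]
    if field["type"] in ("RECORD", "STRUCT"):
        # A struct arrives as a nested {"f": [...]} row; reuse the row decoder.
        return decode_row(field["fields"], value)
    if field["type"] in ("INTEGER", "INT64"):
        # Integers are assumed to be serialized as strings in the raw response.
        return int(value)
    return value


def decode_row(fields, row):
    """Pair each schema field with its raw cell and build a named dict."""
    return {
        field["name"]: decode_value(field, cell["v"])
        for field, cell in zip(fields, row["f"])
    }


# Illustrative schema and row (assumed shapes, not taken from this issue).
schema_fields = [
    {
        "name": "col",
        "type": "RECORD",
        "mode": "NULLABLE",
        "fields": [
            {"name": "a", "type": "INTEGER", "mode": "NULLABLE"},
            {"name": "b", "type": "STRING", "mode": "NULLABLE"},
        ],
    },
    {"name": "tags", "type": "STRING", "mode": "REPEATED"},
]
raw_row = {
    "f": [
        {"v": {"f": [{"v": "1"}, {"v": "x"}]}},
        {"v": [{"v": "a"}, {"v": "b"}]},
    ]
}

decoded = decode_row(schema_fields, raw_row)
print(decoded)
```

Because both cases are driven by the same schema walk, a struct that loses its names and an array that turns into a list of dicts would be two symptoms of skipping this step.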
Including the result from the api log as well (using
Full bq command:
So this is not the correct way to do this; you would actually construct the result directly, but it's an idea. You will have to figure out semantically what the result is and how to describe it most usefully while still representing the structure. A MultiIndex on the columns probably makes sense here.
Keep in mind that you want the most natural representation. If it's not obvious, then don't do it.
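As a rough illustration of the MultiIndex idea, the sketch below flattens nested struct dicts into tuple-keyed columns. Tuples of this kind are the form that `pandas.MultiIndex.from_tuples` accepts, but the pandas step is left out here so the example stays dependency-free. The helper name `flatten_struct` and the sample rows are hypothetical, and the sketch assumes every struct has the same nesting depth.

```python
def flatten_struct(record, prefix=()):
    """Flatten a nested dict into {(outer, inner, ...): value} (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Descend into the struct, extending the column path.
            flat.update(flatten_struct(value, path))
        else:
            flat[path] = value
    return flat


# Illustrative rows (assumed data, not output from this issue).
rows = [
    {"col": {"a": 1, "b": "x"}},
    {"col": {"a": 2, "b": "y"}},
]

flat_rows = [flatten_struct(row) for row in rows]

# Collect the column paths; each tuple is one level-by-level column label.
column_paths = sorted({path for row in flat_rows for path in row})
print(column_paths)
print(flat_rows)
```

Whether this is the most natural representation depends on the data: arrays and structs of mixed depth do not map cleanly onto column tuples, which is one reason to keep plain dicts and lists as the default.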
@jreback Not sure what you mean that this is not the correct way to do this. Certainly, the output you produce is one way of outputting the results (in a flattened way) and it can definitely be produced once you have the result in proper dict form. But as seen in the bq command line output, a dict is the expected output of col as opposed to the current behavior where the field name is lost altogether. Poked around and it looks like the field type of a struct in the schema returned from bq is
Similar debugging info for the latter query above where the returned column is an array of two strings:
@jasonqng I mean the idea here is to preserve the structure, but at the same time make it useful. As far as construction goes, you also might be able to directly construct with
@jasonqng want to take a crack at this?
@jreback Sure, have a hack day on Friday, hopefully will have time to give it a shot. Thanks.
great!
Just curious, how wedded is this project to google-api-python-client? I started to code up something for converting the mangled struct into the correct schema, but recalled that the google-cloud-python package doesn't have this issue; it already does a nice cleanup of the field names for structs and arrays. Try for example:
Output:
Same goes for arrays:
Looking at https://developers.google.com/api-client-library/python/apis/bigquery/v2, Google recommends switching over to
@jasonqng I'd love to make the switch. The only blocker is that we would ideally want the deps built on
@jreback I'm not as familiar with dealing with conda and dependency stuff, so would it make sense for me to go ahead and open a PR for switching over read_gbq() and I'll let you handle the other end?
Sure, I can fix that.
This was (likely) fixed by #25, but I'd like to make sure we have a test in the test suite before closing it out.
PR (and bugfix) for tests here: #101
Version 0.1.4
This query returns an improperly named dict:
Compare with the result from BigQuery:
An array of items also sometimes gets turned into an array of dicts. For example:
outputs:
Compare to BigQuery:
These issues may or may not be related?