df.to_gbq(): InvalidSchema when using if_exists = 'append' #317

Closed
jaredbgo opened this issue Mar 30, 2020 · 11 comments

Comments

@jaredbgo

jaredbgo commented Mar 30, 2020

I have a schema for my upload dataframe.

to_gbq() runs successfully if I am replacing the table (if_exists='replace')

to_gbq() fails if I am appending to the BQ table with matching schema (if_exists='append'):

InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table.

My schemas should not be the problem: I was appending to the same table I had just successfully written by calling to_gbq() with if_exists='replace' and the same schema dict.

@jaredbgo jaredbgo changed the title df.to_gbq(): InvalidSchema when using if_exists = 'append' BUG: df.to_gbq(): InvalidSchema when using if_exists = 'append' Mar 30, 2020
@jaredbgo jaredbgo changed the title BUG: df.to_gbq(): InvalidSchema when using if_exists = 'append' df.to_gbq(): InvalidSchema when using if_exists = 'append' Mar 30, 2020
@ShantanuKumar
Contributor

It makes sense that it works with replace, because the table is dropped and then created again with the new data.
In the case of append, are you sure you're passing the same schema as that of the existing table in BigQuery?
You need to make sure the data types in your schema dict are BigQuery data types, not pandas data types. Something like this:

[
    {
        "name": "event_ts",
        "type": "TIMESTAMP",
        "mode": "REQUIRED"
    },
    {
        "name": "event_type",
        "type": "STRING",
        "mode": "NULLABLE"
    }
]
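As a minimal sketch of how a schema like this would be passed on an append, assuming the list is supplied through the `table_schema` argument of `DataFrame.to_gbq` as elsewhere in this thread. The function name `append_events` and its table and project arguments are placeholders, not something from the issue.

```python
# BigQuery-typed schema, matching the example in the comment above.
EVENT_SCHEMA = [
    {"name": "event_ts", "type": "TIMESTAMP", "mode": "REQUIRED"},
    {"name": "event_type", "type": "STRING", "mode": "NULLABLE"},
]


def append_events(df, destination_table, project_id):
    """Append df to an existing table, passing the explicit schema.

    df is expected to be a pandas DataFrame (with pandas-gbq installed);
    destination_table is assumed to be in "dataset.table" form.
    """
    df.to_gbq(
        destination_table=destination_table,
        project_id=project_id,
        if_exists="append",
        table_schema=EVENT_SCHEMA,
    )
```

Keeping the schema in one module-level constant means the 'replace' and 'append' calls cannot drift apart by accident.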

@jaredbgo
Author

jaredbgo commented Mar 30, 2020

The schema dictionary I pass when successfully using 'replace' is the same schema dictionary that fails when calling 'append'. The schema dictionary reflects what is in BigQuery.

The schema dictionary successfully generates the BQ table using 'replace', but fails when appending to that same BQ table when using 'append'.

Also of note is that the dataframe's columns are strings, some of which I want to turn into dates in BQ, which is why I am supplying a schema. Correct column conversion occurs when using 'replace', but perhaps that fails when using 'append'.

Using 'replace' + the schema dictionary, to_gbq() correctly turns some columns into dates and uploads to the table.

Using 'append' + the schema dictionary, the error occurs.
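One way to narrow down a mismatch like this is to compare the schema being passed with the schema of the destination table, field by field. The sketch below is only a debugging aid under stated assumptions: `schema_diff` is a hypothetical helper, it is not the check pandas-gbq performs internally, and how the remote schema is obtained (for example, copied from the BigQuery console) is left out.

```python
def schema_diff(local_schema, remote_schema):
    """Return human-readable differences between two schema lists.

    Each schema is a list of {"name": ..., "type": ...} dicts. Types are
    compared case-insensitively.
    """
    local = {f["name"]: f["type"].upper() for f in local_schema}
    remote = {f["name"]: f["type"].upper() for f in remote_schema}
    problems = []
    for name in local:
        if name not in remote:
            problems.append(f"{name}: missing from destination table")
        elif local[name] != remote[name]:
            problems.append(
                f"{name}: local {local[name]} vs remote {remote[name]}"
            )
    for name in remote:
        if name not in local:
            problems.append(f"{name}: missing from local schema")
    return problems


# Illustrative values only: a string column that BigQuery stores as DATE.
local = [
    {"name": "period", "type": "STRING"},
    {"name": "amount", "type": "float"},
]
remote = [
    {"name": "period", "type": "DATE"},
    {"name": "amount", "type": "FLOAT"},
]
for problem in schema_diff(local, remote):
    print(problem)
```

An empty result would suggest the declared schemas agree, which points the investigation toward how the DataFrame's own dtypes are interpreted on append.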

@ShantanuKumar
Contributor

ShantanuKumar commented Apr 26, 2020

This might be related to #315.
I have opened a pull request, #318.

@tswast
Collaborator

tswast commented Apr 29, 2020

@jaredbgo please provide a code sample with an example DataFrame (and corresponding BQ schema) where this failure occurs. It is possible that #318 will fix this issue.

@jaredbgo
Author

jaredbgo commented May 4, 2020

Looks like #318 should fix it. Thanks for looking into it!

@PythonCFO

I have this exact issue and it does not appear to be resolved.

from datetime import date

import pandas as pd

# Two sample rows: scenario, version, entity, account, period, amount
output = [
    ['Test', 'v1', 'cost_center', '123', date(2020, 1, 31), 30.0],
    ['Test', 'v1', 'cost_center', '345', date(2020, 1, 31), 72.0],
]

headers = ['scenario', 'version', 'entity', 'account', 'period', 'amount']
df_output = pd.DataFrame(output, columns=headers)

dataset_table = 'my_dataset'
project_id = 'my_project_id'
# Schema exactly as posted
table_schema = [
    {'name': 'scenario', 'type': 'string'},
    {'name': 'version', 'type': 'string'},
    {'name': 'entity', 'type': 'string'},
    {'name': 'account', 'type': 'string'},
    {'name': 'period', 'date': 'string'},
    {'name': 'amount', 'float': 'string'},
]
df_output.to_gbq(
    destination_table=dataset_table,
    project_id=project_id,
    if_exists='append',
    table_schema=table_schema,
)

It successfully creates the table in BQ the first time. Then I run the exact same query a second time and I get the
"pandas_gbq.gbq.InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table."

Andrew Good
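Note that the last two entries of `table_schema` in the snippet above use `date` and `float` as keys rather than `type`. Whether that contributes to the error is not established in this thread. As a hedged sketch, a small check like the following (the helper name `find_malformed_fields` is made up for illustration) would flag such entries before the upload is attempted:

```python
def find_malformed_fields(table_schema):
    """Return (index, field) pairs that lack a 'name' or 'type' key."""
    required = {"name", "type"}
    return [
        (index, field)
        for index, field in enumerate(table_schema)
        if not required.issubset(field)
    ]


# Schema copied from the comment above.
table_schema = [
    {"name": "scenario", "type": "string"},
    {"name": "version", "type": "string"},
    {"name": "entity", "type": "string"},
    {"name": "account", "type": "string"},
    {"name": "period", "date": "string"},
    {"name": "amount", "float": "string"},
]

for index, field in find_malformed_fields(table_schema):
    print(f"entry {index} has no 'type' or 'name' key: {field}")
```

If the intent was a DATE and a FLOAT column, the entries would presumably read `{'name': 'period', 'type': 'date'}` and `{'name': 'amount', 'type': 'float'}`; that reading is an assumption about the poster's intent.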

@tswast
Collaborator

tswast commented May 14, 2020

@andrewlgood Are you using the version from GitHub? We haven't released 0.13.2 yet with this fix.

@PythonCFO

PythonCFO commented May 14, 2020 via email

@tswast
Collaborator

tswast commented May 14, 2020

@andrewlgood I just released https://pypi.org/project/pandas-gbq/0.13.2/, hopefully that solves your issue.
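To confirm whether an environment already has the release mentioned above, the installed version can be compared against 0.13.2. This is a rough sketch using only the standard library (Python 3.8+); `parse_version` and `has_fix` are illustrative helpers, and the comparison ignores pre-release suffixes.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '0.13.2' into (0, 13, 2), keeping only leading digits per part."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def has_fix(installed, minimum="0.13.2"):
    """True if the installed version is at least the release with the fix."""
    return parse_version(installed) >= parse_version(minimum)


try:
    installed = version("pandas-gbq")
    print(installed, "is at least 0.13.2:", has_fix(installed))
except PackageNotFoundError:
    print("pandas-gbq is not installed in this environment")
```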

@PythonCFO

PythonCFO commented May 14, 2020 via email

@PythonCFO

PythonCFO commented May 17, 2020 via email

@tswast tswast closed this as completed Nov 6, 2020