Pandas should get the schema from BigQuery if pushing to a table that already exists #315
Comments
pandas-gbq already calls […] to fetch the existing table's schema. Note: you'll need to filter out any columns that aren't in the DataFrame.
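For context, a minimal sketch of that filtering step (the helper name and the {"fields": [...]} dict layout are assumptions, matching the test code later in this thread):

# Hypothetical helper: keep only schema fields that have a matching
# DataFrame column, so the result can be passed as table_schema.
def filter_schema_to_dataframe(table_schema, df):
    return {
        "fields": [
            field
            for field in table_schema["fields"]
            if field["name"] in df.columns
        ]
    }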
@ShantanuKumar Would you mind sharing an example of when the default schema gets it wrong? That will be useful for testing.
Imagine there is a table called […] whose schema marks a column as REQUIRED. Now, when we have some dataframe […] and I push this data to the table with if_exists="append", I get this error: […]. This is happening because of the NULLABLE mode that the generated default schema assigns to every column (see the reconstruction below).
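A hedged reconstruction of that scenario using google-cloud-bigquery (the project, dataset, table, and column names are illustrative, not from the original comment):

from google.cloud import bigquery
import pandas as pd
import pandas_gbq

client = bigquery.Client()
# Existing table with a REQUIRED column (illustrative names).
table = bigquery.Table(
    "my-project.my_dataset.events",
    schema=[bigquery.SchemaField("event_ts", "TIMESTAMP", mode="REQUIRED")],
)
client.create_table(table)

df = pd.DataFrame({"event_ts": pd.to_datetime(["2020-03-03 01:00:00"])})
# Without an explicit table_schema, pandas-gbq generates a default schema
# whose modes are all NULLABLE, which conflicts with the REQUIRED column.
pandas_gbq.to_gbq(
    df, "my_dataset.events", project_id="my-project", if_exists="append"
)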
@tswast What's the reason behind ignoring the mode here: https://github.com/pydata/pandas-gbq/blob/master/pandas_gbq/gbq.py#L646 ? The issue which I have right now is because of mismatching modes. Also, I think this issue should be marked as a BUG.
I made a test corresponding to the example you provided in #315 (comment), but it still fails due to different types. It is expected that the pandas.Timestamp dtype is used for uploading to TIMESTAMP columns. Please open a separate feature request if differing types are a problem for you. Let's use this issue to track the problem of differing modes (REQUIRED vs. NULLABLE).

from pandas import DataFrame

from pandas_gbq import gbq


# gbq_table and gbq_connector are pytest fixtures from the pandas-gbq test suite.
def test_to_gbq_does_not_override_type(gbq_table, gbq_connector):
    table_id = "test_to_gbq_does_not_override_type"
    table_schema = {
        "fields": [
            {
                "name": "event_ts",
                "type": "TIMESTAMP",
            },
            {
                "name": "event_type",
                "type": "STRING",
            },
        ]
    }
    df = DataFrame({
        "event_ts": ["2020-03-03 01:00:00", "2020-03-03 02:00:00"],
        "event_type": ["buy", "sell"],
    })
    gbq_table.create(table_id, table_schema)
    gbq.to_gbq(
        df,
        "{0}.{1}".format(gbq_table.dataset_id, table_id),
        project_id=gbq_connector.project_id,
        if_exists="append",
    )
    actual = gbq_table.schema(table_id)
    assert table_schema["fields"] == actual
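If string timestamps are also a problem on your end, a standard pandas conversion (not part of the test above) yields the pandas.Timestamp dtype that TIMESTAMP columns expect:

import pandas as pd

df = pd.DataFrame({
    "event_ts": ["2020-03-03 01:00:00", "2020-03-03 02:00:00"],
    "event_type": ["buy", "sell"],
})
# datetime64[ns] columns map to BigQuery TIMESTAMP on upload.
df["event_ts"] = pd.to_datetime(df["event_ts"])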
It actually works for […]
…ists="replace" (googleapis#670)

A previous issue (googleapis#315) noted that when using if_exists="append", pandas should retrieve the existing schema from BigQuery and update the user-supplied table_schema to ensure that all column modes (REQUIRED/NULLABLE) match those of the existing table. The commit that fixed this issue applies the policy to *all* write dispositions, but it is only applicable when appending. When replacing, the user should be able to overwrite with new datatypes.
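A self-contained sketch of that append-only policy (all names here are assumptions for illustration, not the actual pandas-gbq internals):

def resolve_schema(if_exists, user_schema, existing_schema):
    # Appending must match the existing table, so copy its
    # REQUIRED/NULLABLE modes onto the user-supplied schema.
    if if_exists == "append":
        modes = {
            f["name"]: f.get("mode", "NULLABLE")
            for f in existing_schema["fields"]
        }
        for field in user_schema["fields"]:
            if field["name"] in modes:
                field["mode"] = modes[field["name"]]
    # With if_exists="replace", the user schema wins unchanged, so new
    # datatypes and modes can overwrite those of the old table.
    return user_schema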
Right now, when pushing new data to an already existing table using to_gbq with the option if_exists="append" but no explicit table_schema, pandas generates a default table schema in which the mode of each column, which takes the value REQUIRED or NULLABLE, is always NULLABLE. It would make sense for pandas to fetch the existing schema and apply it when if_exists="append", instead of always passing NULLABLE modes.
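For completeness, a hedged sketch of fetching the existing schema with google-cloud-bigquery and converting it into the dict form that to_gbq's table_schema argument uses (the client calls are real; the helper itself is illustrative):

from google.cloud import bigquery

def fetch_table_schema(project_id, dataset_id, table_id):
    client = bigquery.Client(project=project_id)
    table = client.get_table(f"{project_id}.{dataset_id}.{table_id}")
    # SchemaField objects expose name, field_type, and mode.
    return {
        "fields": [
            {"name": f.name, "type": f.field_type, "mode": f.mode}
            for f in table.schema
        ]
    }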