BUG: Wrong JSON sent to BigQuery when both integer and float fields in schema #116

peshitz · 2018-02-08T13:18:13Z

When a DataFrame has both a float and an integer column, the integer (int64) values are serialized as floats. BigQuery then rejects the rows, because the field is defined as Integer in the schema.

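import pandas as pd

df = pd.DataFrame([[1, 1.1], [2, 2.2]],
                  index=['row 1', 'row 2'],
                  columns=['intColumn', 'floatColumn'])

# correct output: intColumn values are serialized as integers
df.to_json()

# incorrect output (method used by pandas-gbq): the row is a float64 Series,
# so intColumn is serialized as a float
row = df.iloc[0]
row.to_json()

# correct output: a row taken from the int64 column alone keeps its dtype
df[['intColumn']].iloc[0].to_json()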
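
The difference can be checked without BigQuery by parsing the JSON back into Python values. This is a minimal sketch, assuming only pandas and the standard library; the names `frame_value` and `row_value` are illustrative and do not come from pandas-gbq.

```python
import json

import pandas as pd

df = pd.DataFrame(
    [[1, 1.1], [2, 2.2]],
    index=["row 1", "row 2"],
    columns=["intColumn", "floatColumn"],
)

# Serializing the whole frame keeps each column's dtype.
frame_value = json.loads(df.to_json())["intColumn"]["row 1"]

# A single row is a Series with one dtype, so the integer is upcast to float64
# before it is serialized.
row = df.iloc[0]
row_value = json.loads(row.to_json())["intColumn"]

print(type(frame_value).__name__, type(row_value).__name__)
```

Parsing with `json.loads` distinguishes an integer literal from a float literal, which is the distinction the report says BigQuery's Integer field cares about.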


tswast · 2018-02-10T00:39:35Z

Seems the problem is with using DataFrame.iterrows(). It changes the dtypes to float64.

In [13]: for index, row in df.iterrows():
    ...:     print(row)
    ...:
intColumn      1.0
floatColumn    1.1
Name: row 1, dtype: float64
intColumn      2.0
floatColumn    2.2
Name: row 2, dtype: float64

In [14]: print(df.dtypes)
intColumn        int64
floatColumn    float64
dtype: object

max-sixty · 2018-02-10T01:00:06Z

> DataFrame.iterrows(). It changes the dtypes to float64.

It returns each row as a pd.Series, which can hold only one dtype.

I think the best way to solve this is to do #96
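
To illustrate the point about a Series holding a single dtype, here is a small sketch comparing `iterrows()` with `itertuples()`. The comparison with `itertuples()` is only a contrast for illustration; it is not the change proposed in #96 or the fix adopted in this thread.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1.1], [2, 2.2]],
    index=["row 1", "row 2"],
    columns=["intColumn", "floatColumn"],
)

# iterrows() builds one Series per row, so all values in the row are upcast
# to a common dtype (float64 here).
iterrows_values = [row["intColumn"] for _, row in df.iterrows()]

# itertuples() yields one tuple per row, so each value keeps the type of the
# column it came from.
itertuples_values = [t.intColumn for t in df.itertuples(index=False)]

print([isinstance(v, float) for v in iterrows_values])
print([isinstance(v, float) for v in itertuples_values])
```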

tswast · 2018-02-10T01:41:59Z

Confirmed that I can reproduce this with the following test:

    def test_upload_mixed_float_and_int(self):
        """Test that we can upload a dataframe containing an int64 and float64 column.
        See: https://github.com/pydata/pandas-gbq/issues/116
        """
        test_id = "mixed_float_and_int"
        test_size = 2
        df = DataFrame(
            [[1,1.1],[2,2.2]],
            index=['row 1', 'row 2'],
            columns=['intColumn','floatColumn'])

        gbq.to_gbq(
            df, self.destination_table + test_id,
            _get_project_id(),
            private_key=_get_private_key_path(),
            chunksize=10000)

        result_df = gbq.read_gbq("SELECT * FROM {0}".format(
            self.destination_table + test_id),
            project_id=_get_project_id(),
            private_key=_get_private_key_path())

        assert len(result_df) == test_size

It fails with the error:

google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.


…#117)

* BUG: Fix uploading of dataframes containing int64 and float64 columns. Fixes #116 and #96 by loading data in CSV chunks.
* ENH: allow chunksize=None to disable chunking in to_gbq(). Also, fixes lint errors.
* TST: update min g-c-bq lib to 0.29.0 in CI
* BUG: pass schema to load job for to_gbq
* Generate schema if needed for table creation.
* Restore _generate_bq_schema, as it is used in tests.
* Add fixes to changelog.
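
The idea of encoding the data as CSV chunks instead of per-row JSON can be sketched as follows. This is not the code merged in #117: `encode_csv_chunks` is a hypothetical helper written for illustration, and it stops at producing in-memory buffers without calling any BigQuery API.

```python
import io

import pandas as pd


def encode_csv_chunks(df, chunksize=None):
    """Yield one in-memory CSV buffer per chunk of rows (hypothetical helper)."""
    if chunksize is None:
        # No chunking requested: treat the whole frame as a single chunk.
        chunksize = max(len(df), 1)
    for start in range(0, len(df), chunksize):
        # A DataFrame slice keeps per-column dtypes, so int64 values are
        # written as integers rather than floats.
        chunk = df.iloc[start:start + chunksize]
        buffer = io.StringIO()
        chunk.to_csv(buffer, index=False, header=False)
        buffer.seek(0)
        yield buffer


df = pd.DataFrame(
    [[1, 1.1], [2, 2.2]],
    index=["row 1", "row 2"],
    columns=["intColumn", "floatColumn"],
)

chunks = [buf.getvalue() for buf in encode_csv_chunks(df, chunksize=1)]
print(chunks)
```

Because each chunk is still a DataFrame when it is serialized, no row is ever converted to a single-dtype Series, which is the step that caused the upcast in the JSON path.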

tswast · 2018-02-13T21:40:20Z

The fix for this was released in version 0.3.1.

tswast self-assigned this Feb 10, 2018

tswast added a commit to tswast/python-bigquery-pandas that referenced this issue Feb 10, 2018

BUG: Fix uploading of dataframes containing int64 and float64 columns

616d306

Fixes googleapis#116 and googleapis#96 by loading data in CSV chunks.

tswast mentioned this issue Feb 10, 2018

BUG: Fix uploading of dataframes containing int64 and float64 columns #117

Merged

tswast changed the title ~~Wrong JSON sent to BigQuery when both integer and float fields in schema~~ BUG: Wrong JSON sent to BigQuery when both integer and float fields in schema Feb 12, 2018

tswast added the type: bug label Feb 12, 2018

tswast closed this as completed in #117 Feb 12, 2018

This was referenced Feb 13, 2018

Pin pandas-gbq to latest version 0.3.1 tnir/pandas#48

Closed

Pin pandas_gbq to latest version 0.3.1 tnir/pandas#49

Closed
