BUG: Wrong JSON sent to BigQuery when both integer and float fields in schema #116

Closed
peshitz opened this issue Feb 8, 2018 · 4 comments
Labels: type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

Comments

peshitz commented Feb 8, 2018

When a DataFrame contains both a float column and an integer column, the integer (int64) fields are serialized as floats. BigQuery then rejects the rows, because the field is defined as Integer in the schema.

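import pandas as pd

df = pd.DataFrame([[1, 1.1], [2, 2.2]],
                  index=['row 1', 'row 2'],
                  columns=['intColumn', 'floatColumn'])

# correct output: intColumn values are serialized as integers
df.to_json()

# incorrect output (method used by pandas-gbq): the row is a float64 Series,
# so intColumn is serialized as a float
row = df.iloc[0]
row.to_json()

# correct output: a row holding only the integer column stays int64
df[['intColumn']].iloc[0].to_json()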
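
For comparison, here is a minimal sketch of row-wise JSON that keeps the integer column intact, assuming newline-delimited records are an acceptable format. It uses DataFrame.to_json with orient='records' and lines=True, which serializes from the DataFrame rather than from a per-row Series. This is only an illustration of the difference, not a description of how pandas-gbq builds its payload.

```python
import json

import pandas as pd

df = pd.DataFrame([[1, 1.1], [2, 2.2]],
                  index=['row 1', 'row 2'],
                  columns=['intColumn', 'floatColumn'])

# One JSON object per line; each column keeps its own dtype.
ndjson = df.to_json(orient='records', lines=True)

# Parse the lines back to check how the integer column was written.
records = [json.loads(line) for line in ndjson.splitlines() if line]
for record in records:
    print(record)
```

Note that the index labels ('row 1', 'row 2') are not included in the records with this orientation.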

tswast (Collaborator) commented Feb 10, 2018

The problem seems to come from using DataFrame.iterrows(), which changes the dtype of every value in a row to float64.

In [13]: for index, row in df.iterrows():
    ...:     print(row)
    ...:
intColumn      1.0
floatColumn    1.1
Name: row 1, dtype: float64
intColumn      2.0
floatColumn    2.2
Name: row 2, dtype: float64

In [14]: print(df.dtypes)
intColumn        int64
floatColumn    float64
dtype: object
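
The session above can be contrasted with DataFrame.itertuples(), which yields one namedtuple per row instead of a Series and therefore does not need a common dtype. The sketch below is an illustration under that assumption; it is not the approach the eventual fix took.

```python
import pandas as pd

df = pd.DataFrame([[1, 1.1], [2, 2.2]],
                  index=['row 1', 'row 2'],
                  columns=['intColumn', 'floatColumn'])

# iterrows() builds one Series per row, so a mixed int/float row is upcast.
iterrows_values = [row['intColumn'] for _, row in df.iterrows()]

# itertuples() yields namedtuples, so each field keeps its column's type.
itertuples_values = [row.intColumn for row in df.itertuples(index=False)]

print([type(v).__name__ for v in iterrows_values])
print([type(v).__name__ for v in itertuples_values])
```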

tswast self-assigned this Feb 10, 2018

max-sixty (Contributor) commented

"DataFrame.iterrows(). It changes the dtypes to float64."

It returns a pd.Series for each row, and a Series can have only one dtype, so the int64 values are upcast to float64.

I think the best way to solve this is to do #96.
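
To illustrate the single-dtype point, here is a small sketch that casts the frame to object dtype first, so the row Series can hold an int and a float side by side. Casting to object is an assumption made purely for demonstration: it is generally slower and is not what #96 proposes.

```python
import json

import pandas as pd

df = pd.DataFrame([[1, 1.1], [2, 2.2]],
                  index=['row 1', 'row 2'],
                  columns=['intColumn', 'floatColumn'])

# With object dtype, the row Series stores the original values unchanged.
row = df.astype(object).iloc[0]
print(row.dtype)

# Serialize the row and decode it again to inspect the value types.
decoded = json.loads(row.to_json())
print(decoded)
```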

tswast (Collaborator) commented Feb 10, 2018

Confirmed that I can reproduce this with the following test:

    def test_upload_mixed_float_and_int(self):
        """Test that we can upload a dataframe containing an int64 and float64 column.
        See: https://github.com/pydata/pandas-gbq/issues/116
        """
        test_id = "mixed_float_and_int"
        test_size = 2
        df = DataFrame(
            [[1,1.1],[2,2.2]],
            index=['row 1', 'row 2'],
            columns=['intColumn','floatColumn'])

        gbq.to_gbq(
            df, self.destination_table + test_id,
            _get_project_id(),
            private_key=_get_private_key_path(),
            chunksize=10000)

        result_df = gbq.read_gbq("SELECT * FROM {0}".format(
            self.destination_table + test_id),
            project_id=_get_project_id(),
            private_key=_get_private_key_path())

        assert len(result_df) == test_size

It fails with the following error:

google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.

tswast added a commit to tswast/python-bigquery-pandas that referenced this issue Feb 10, 2018
tswast changed the title from "Wrong JSON sent to BigQuery when both integer and float fields in schema" to "BUG: Wrong JSON sent to BigQuery when both integer and float fields in schema" Feb 12, 2018
tswast added the type: bug label Feb 12, 2018
tswast added a commit that referenced this issue Feb 12, 2018
…#117)

* BUG: Fix uploading of dataframes containing int64 and float64 columns

Fixes #116 and #96 by loading data in CSV chunks.

* ENH: allow chunksize=None to disable chunking in to_gbq()

Also, fixes lint errors.

* TST: update min g-c-bq lib to 0.29.0 in CI

* BUG: pass schema to load job for to_gbq

* Generate schema if needed for table creation.

* Restore _generate_bq_schema, as it is used in tests.

* Add fixes to changelog.
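
The commit message above mentions loading data in CSV chunks and allowing chunksize=None to disable chunking. A rough sketch of that idea follows, with the caveat that encode_csv_chunks is a hypothetical helper written for this illustration and not the actual pandas-gbq code. Because DataFrame.to_csv writes each column with its own dtype, the integer column is not upcast.

```python
import pandas as pd


def encode_csv_chunks(df, chunksize=None):
    """Hypothetical helper: yield CSV text for successive row slices of df."""
    if chunksize is None:
        # No chunking: serialize the whole frame at once.
        yield df.to_csv(index=False, header=False)
        return
    for start in range(0, len(df), chunksize):
        # Slicing a DataFrame keeps per-column dtypes, unlike a single row.
        yield df.iloc[start:start + chunksize].to_csv(index=False, header=False)


df = pd.DataFrame([[1, 1.1], [2, 2.2]],
                  index=['row 1', 'row 2'],
                  columns=['intColumn', 'floatColumn'])

chunks = list(encode_csv_chunks(df, chunksize=1))
for chunk in chunks:
    print(chunk.strip())
```

Each chunk could then be handed to a load job together with the table schema; how that upload is performed is outside the scope of this sketch.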

tswast (Collaborator) commented Feb 13, 2018

The fix for this was released in version 0.3.1.
