Writing to CSV buffer with default float format causes numerical overflow in BigQuery #192

Closed
anthonydelage opened this issue Jul 22, 2018 · 3 comments


@anthonydelage
Contributor

anthonydelage commented Jul 22, 2018

I'm using to_gbq() to load a local DataFrame into BigQuery. I'm running into an issue where floating point numbers are gaining significant figures and therefore causing numerical overflow errors when loaded to BigQuery.

The load.py module's encode_chunk() function writes to a local CSV buffer using Pandas' to_csv() function, which has a known issue regarding added significant figures on some operating systems (read more here).

In my case, 0.208 was transformed to 0.20800000000000002.
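
To illustrate where the extra digits come from, here is a minimal sketch in plain Python, not the pandas code path itself. It uses 0.1 as a stand-in because its 17-digit expansion is well known; whether 0.208 prints the same way depends on how the value was computed and stored.

```python
# A double has no exact representation for most decimal fractions.
value = 0.1

# repr() gives the shortest decimal string that maps back to the same double.
shortest = repr(value)

# Forcing 17 significant digits exposes the binary rounding noise.
seventeen = '%.17g' % value

print(shortest, seventeen)

# Both strings still parse back to the identical double.
same_double = float(seventeen) == value
```

The longer string is numerically equivalent, but a consumer that enforces a fixed number of digits may reject it.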

I've been able to solve the issue locally by changing the float_format parameter to '%g' in the DataFrame.to_csv() call inside encode_chunk():

dataframe.to_csv(
    csv_buffer, index=False, header=False, encoding='utf-8',
    float_format='%g', date_format='%Y-%m-%d %H:%M:%S.%f')

Can this be safely applied as a default?
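
One thing worth checking before making '%g' a default: without an explicit precision it keeps at most 6 significant digits. The sketch below assumes a string float_format is applied with Python's printf-style `%` operator; `format_cell` is a hypothetical helper for illustration, not a pandas or pandas-gbq function.

```python
def format_cell(value, float_format):
    """Render one float with a printf-style format string (hypothetical helper)."""
    return float_format % value


samples = [0.208, 0.1 + 0.2, 123456789.123]

# '%g' keeps at most 6 significant digits and strips trailing zeros.
formatted = [format_cell(v, '%g') for v in samples]

for original, text in zip(samples, formatted):
    print(repr(original), '->', text)
```

The first two values come out clean, but the third loses everything past its sixth significant digit, so plain '%g' trades the overflow problem for silent truncation of larger or more precise values.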

Versions:

pandas==0.22.0
pandas-gbq==0.5.0

OS details:

MacOS 10.13.4
@max-sixty
Contributor

Thanks for the report @anthonydelage

Without being an expert in floats, that looks very reasonable. How would you feel about doing a pull-request?

@anthonydelage
Contributor Author

@max-imlian yes, it's in #193.

In it, I made one change to the proposal above: increasing the number of significant decimal digits to 15, the most that an IEEE-754 double is guaranteed to round-trip.
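
A quick way to sanity-check the 15-digit claim, as a sketch in plain Python. The format string '%.15g' is an assumption here (the exact string is not quoted above), and `round_trips` is a hypothetical helper.

```python
import sys

FLOAT_FORMAT = '%.15g'  # assumed format string: 15 significant digits


def round_trips(decimal_text):
    """True if text -> double -> formatted text gives back the same string."""
    return FLOAT_FORMAT % float(decimal_text) == decimal_text


# Decimal inputs with at most 15 significant digits, written without
# trailing zeros so a plain string comparison is meaningful.
inputs = ['0.208', '0.1', '123456789.123', '3.14159265358979']
results = {text: round_trips(text) for text in inputs}
print(results)

# The platform's own guarantee for doubles.
print(sys.float_info.dig)

# The flip side: distinct doubles can collapse to the same 15-digit text.
collapsed = FLOAT_FORMAT % (0.1 + 0.2) == FLOAT_FORMAT % 0.3
```

So 15 digits preserves any decimal value of that length, at the cost of not distinguishing doubles that differ only past the 15th digit.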

@parthea
Contributor

parthea commented Jul 26, 2018

Fixed in 993fe55

@parthea closed this as completed Jul 26, 2018