Out of bounds nanosecond timestamp: 1-01-01 00:00:00 #365

Closed
flakobatako opened this issue May 27, 2021 · 4 comments · Fixed by #444
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery-pandas API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments


flakobatako commented May 27, 2021

Good afternoon.

I have a table where there is a column named maxDate.

There are some rows (like 1%) that can have the following date: 0001-01-01T00:00:00.

The problem is that the table lives in a GBQ project from an external vendor to the company, so I cannot modify the data. When I run the query like this:

df = pandas_gbq.read_gbq(sql, credentials=credentials)

I get the error:

File "pandas_libs\tslibs\np_datetime.pyx", line 117, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00

Is there a workaround or config option so I can execute the query correctly?

Thank you in advance.

@max-sixty
Contributor

I'm not sure there's a reasonable way of handling this given the numpy type — how about uploading it as a string and then transforming in BQ?

Though is that a permissible value in BQ, even if this library could upload it?
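For the read direction in the original question, a string-based workaround along these lines can be sketched (the table and column names are illustrative, and the SQL is shown only in comments): cast the column to STRING in the query itself, then parse locally, letting pandas coerce out-of-range values to NaT.

```python
import pandas as pd

# Sketch of a string-based workaround (table/column names illustrative).
# Cast the column to STRING in the query, e.g.:
#   SELECT * EXCEPT (maxDate), CAST(maxDate AS STRING) AS maxDate
#   FROM `project.dataset.table`
# then parse locally; errors="coerce" turns out-of-range values into NaT.
raw = pd.Series(["0001-01-01T00:00:00", "2021-05-27T12:00:00"])
parsed = pd.to_datetime(raw, errors="coerce")
print(parsed.isna().tolist())  # the year-1 value becomes NaT
```

The trade-off is that rows with the sentinel year-1 date come back as NaT rather than failing the whole query.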

@tswast tswast added the type: question Request for information or clarification. Not an issue. label Jul 1, 2021
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-pandas API. label Jul 17, 2021
@tswast
Collaborator

tswast commented Sep 22, 2021

@flakobatako What is the BigQuery data type of the column? e.g. DATE/DATETIME/TIMESTAMP?

For TIMESTAMP columns, I believe this was fixed in google-cloud-bigquery 1.27.0. For DATETIME and DATE, I'm hoping to have a better solution in google-cloud-bigquery v3. googleapis/python-bigquery#861, googleapis/python-bigquery#972

@tswast
Collaborator

tswast commented Nov 19, 2021

Thought: Possible we can make this compatible with v2 if we duplicate some of the logic to detect out-of-bounds values and pass date_as_object.
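The detection half of that idea can be sketched as follows. This is a hypothetical helper, not the actual pandas-gbq code: it checks whether any DATE value falls outside the datetime64[ns] range, which is when date_as_object would be needed.

```python
import datetime
import pandas as pd

# Hypothetical sketch (not the actual pandas-gbq logic): scan DATE values
# for anything outside the datetime64[ns] representable range, and only
# then request date_as_object=True during conversion.
_MIN_DATE = pd.Timestamp.min.date()  # 1677-09-21
_MAX_DATE = pd.Timestamp.max.date()  # 2262-04-11

def needs_date_as_object(dates):
    """Return True if any date cannot be represented as datetime64[ns]."""
    return any(
        d is not None and not (_MIN_DATE <= d <= _MAX_DATE) for d in dates
    )

print(needs_date_as_object([datetime.date(1, 1, 1)]))      # True
print(needs_date_as_object([datetime.date(2021, 5, 27)]))  # False
```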

@tswast tswast added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed type: question Request for information or clarification. Not an issue. labels Nov 30, 2021
@tswast tswast self-assigned this Nov 30, 2021
@tswast
Collaborator

tswast commented Dec 6, 2021

I can reproduce this in a test in #442

________________________ test_dataframe_round_trip_with_table_schema[load_parquet-issue365-extreme-datetimes] _________________________

method_under_test = functools.partial(<function to_gbq at 0x7f9090ebb710>, project_id='swast-scratch', credentials=<google.oauth2.credentials.Credentials object at 0x7f905058b250>)
random_dataset_id = 'python_bigquery_pandas_tests_system_20211206201607_aa0fdb'
bigquery_client = <google.cloud.bigquery.client.Client object at 0x7f9050574950>
input_df =    row_num    date_col                datetime_col                     timestamp_col
0        1  0001-01-01         00...         1970-01-01 00:00:00+00:00
2        3  9999-12-31  9999-12-31 23:59:59.999999  9999-12-31 23:59:59.999999+00:00
expected_df =    row_num    date_col                datetime_col                     timestamp_col
0        1  0001-01-01         00...         1970-01-01 00:00:00+00:00
2        3  9999-12-31  9999-12-31 23:59:59.999999  9999-12-31 23:59:59.999999+00:00
table_schema = [{'name': 'row_num', 'type': 'INTEGER'}, {'name': 'date_col', 'type': 'DATE'}, {'name': 'datetime_col', 'type': 'DATETIME'}, {'name': 'timestamp_col', 'type': 'TIMESTAMP'}]
api_method = 'load_parquet', api_methods = {'load_csv', 'load_parquet'}

                                59,
                                59,
                                999999,
                                tzinfo=datetime.timezone.utc,
                            ),
                        ],
                    },
                    columns=["row_num", "date_col", "datetime_col", "timestamp_col"],
                ),
                table_schema=[
                    {"name": "row_num", "type": "INTEGER"},
                    {"name": "date_col", "type": "DATE"},
                    {"name": "datetime_col", "type": "DATETIME"},
                    {"name": "timestamp_col", "type": "TIMESTAMP"},
                ],
            ),
            id="issue365-extreme-datetimes",
        ),
    ]
    
    
    @pytest.mark.parametrize(
>       ["input_df", "expected_df", "table_schema", "api_methods"], DATAFRAME_ROUND_TRIPS
    )
    def test_dataframe_round_trip_with_table_schema(
        method_under_test,
        random_dataset_id,
        bigquery_client,
        input_df,
        expected_df,
        table_schema,
        api_method,
        api_methods,
    ):

tests/system/test_to_gbq.py:243: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.nox/system-3-7/lib/python3.7/site-packages/google/cloud/bigquery/table.py:1729: in to_dataframe
    df = record_batch.to_pandas(date_as_object=date_as_object)
pyarrow/array.pxi:751: in pyarrow.lib._PandasConvertible.to_pandas
    ???
pyarrow/table.pxi:1668: in pyarrow.lib.Table._to_pandas
    ???
.nox/system-3-7/lib/python3.7/site-packages/pyarrow/pandas_compat.py:792: in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
.nox/system-3-7/lib/python3.7/site-packages/pyarrow/pandas_compat.py:1132: in _table_to_blocks
    list(extension_columns.keys()))
pyarrow/table.pxi:1117: in pyarrow.lib.table_to_blocks
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   pyarrow.lib.ArrowInvalid: Casting from timestamp[us] to timestamp[ns] would result in out of bounds timestamp: -62135596800000000

pyarrow/error.pxi:84: ArrowInvalid
-------------------------------------------------------- Captured stderr call ---------------------------------------------------------
100%|██████████| 1/1 [00:00<00:00, 5915.80it/s]
__________________________ test_dataframe_round_trip_with_table_schema[load_csv-issue365-extreme-datetimes] ___________________________

method_under_test = functools.partial(<function to_gbq at 0x7f9090ebb710>, project_id='swast-scratch', credentials=<google.oauth2.credentials.Credentials object at 0x7f905058b250>)
random_dataset_id = 'python_bigquery_pandas_tests_system_20211206201653_cf3865'
bigquery_client = <google.cloud.bigquery.client.Client object at 0x7f9050574950>
input_df =    row_num    date_col                datetime_col                     timestamp_col
0        1  0001-01-01         00...         1970-01-01 00:00:00+00:00
2        3  9999-12-31  9999-12-31 23:59:59.999999  9999-12-31 23:59:59.999999+00:00
expected_df =    row_num    date_col                datetime_col                     timestamp_col
0        1  0001-01-01         00...         1970-01-01 00:00:00+00:00
2        3  9999-12-31  9999-12-31 23:59:59.999999  9999-12-31 23:59:59.999999+00:00
table_schema = [{'name': 'row_num', 'type': 'INTEGER'}, {'name': 'date_col', 'type': 'DATE'}, {'name': 'datetime_col', 'type': 'DATETIME'}, {'name': 'timestamp_col', 'type': 'TIMESTAMP'}]
api_method = 'load_csv', api_methods = {'load_csv', 'load_parquet'}

                                59,
                                59,
                                999999,
                                tzinfo=datetime.timezone.utc,
                            ),
                        ],
                    },
                    columns=["row_num", "date_col", "datetime_col", "timestamp_col"],
                ),
                table_schema=[
                    {"name": "row_num", "type": "INTEGER"},
                    {"name": "date_col", "type": "DATE"},
                    {"name": "datetime_col", "type": "DATETIME"},
                    {"name": "timestamp_col", "type": "TIMESTAMP"},
                ],
            ),
            id="issue365-extreme-datetimes",
        ),
    ]
    
    
    @pytest.mark.parametrize(
>       ["input_df", "expected_df", "table_schema", "api_methods"], DATAFRAME_ROUND_TRIPS
    )
    def test_dataframe_round_trip_with_table_schema(
        method_under_test,
        random_dataset_id,
        bigquery_client,
        input_df,
        expected_df,
        table_schema,
        api_method,
        api_methods,
    ):

tests/system/test_to_gbq.py:243: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.nox/system-3-7/lib/python3.7/site-packages/google/cloud/bigquery/table.py:1729: in to_dataframe
    df = record_batch.to_pandas(date_as_object=date_as_object)
pyarrow/array.pxi:751: in pyarrow.lib._PandasConvertible.to_pandas
    ???
pyarrow/table.pxi:1668: in pyarrow.lib.Table._to_pandas
    ???
.nox/system-3-7/lib/python3.7/site-packages/pyarrow/pandas_compat.py:792: in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
.nox/system-3-7/lib/python3.7/site-packages/pyarrow/pandas_compat.py:1132: in _table_to_blocks
    list(extension_columns.keys()))
pyarrow/table.pxi:1117: in pyarrow.lib.table_to_blocks
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   pyarrow.lib.ArrowInvalid: Casting from timestamp[us] to timestamp[ns] would result in out of bounds timestamp: -62135596800000000

pyarrow/error.pxi:84: ArrowInvalid
-------------------------------------------------------- Captured stderr call ---------------------------------------------------------
1it [00:04,  4.31s/it]
========================================================== warnings summary ===========================================================
tests/system/test_gbq.py::TestReadGBQIntegration::test_array_length_zero
  /Users/swast/src/github.com/googleapis/python-bigquery-pandas/.nox/system-3-7/lib/python3.7/site-packages/pandas/core/dtypes/missing.py:415: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
    if left_value != right_value:

tests/system/test_gbq.py::TestToGBQIntegration::test_upload_data
  /Users/swast/src/github.com/googleapis/python-bigquery-pandas/tests/system/test_gbq.py:948: PendingDeprecationWarning: chunksize will be ignored when using api_method='load_csv' in a future version of pandas-gbq
    credentials=self.credentials,

tests/system/test_gbq.py::TestToGBQIntegration::test_upload_data_if_table_exists_append
  /Users/swast/src/github.com/googleapis/python-bigquery-pandas/tests/system/test_gbq.py:1032: PendingDeprecationWarning: chunksize will be ignored when using api_method='load_csv' in a future version of pandas-gbq
    credentials=self.credentials,

tests/system/test_gbq.py::TestToGBQIntegration::test_upload_subset_columns_if_table_exists_append
  /Users/swast/src/github.com/googleapis/python-bigquery-pandas/tests/system/test_gbq.py:1078: PendingDeprecationWarning: chunksize will be ignored when using api_method='load_csv' in a future version of pandas-gbq
    credentials=self.credentials,

tests/system/test_gbq.py::TestToGBQIntegration::test_upload_data_if_table_exists_replace
  /Users/swast/src/github.com/googleapis/python-bigquery-pandas/tests/system/test_gbq.py:1112: PendingDeprecationWarning: chunksize will be ignored when using api_method='load_csv' in a future version of pandas-gbq
    credentials=self.credentials,

tests/system/test_gbq.py::TestToGBQIntegration::test_upload_data_flexible_column_order
  /Users/swast/src/github.com/googleapis/python-bigquery-pandas/tests/system/test_gbq.py:1241: PendingDeprecationWarning: chunksize will be ignored when using api_method='load_csv' in a future version of pandas-gbq
    credentials=self.credentials,

-- Docs: https://docs.pytest.org/en/stable/warnings.html
------------- generated xml file: /Users/swast/src/github.com/googleapis/python-bigquery-pandas/system_3.7_sponge_log.xml -------------
======================================================= short test summary info =======================================================
FAILED tests/system/test_to_gbq.py::test_dataframe_round_trip_with_table_schema[load_parquet-issue365-extreme-datetimes] - pyarrow.l...
FAILED tests/system/test_to_gbq.py::test_dataframe_round_trip_with_table_schema[load_csv-issue365-extreme-datetimes] - pyarrow.lib.A...
2 failed, 112 passed, 3 skipped, 6 warnings in 634.00s (0:10:34)
nox > Command py.test --quiet --junitxml=system_3.7_sponge_log.xml tests/system failed with exit code 1
nox > Session system-3.7 failed.

The good news is that it's fixed when I work with the google-cloud-bigquery v3 prerelease (googleapis/python-bigquery#972). For other versions we'll need to catch the error and retry with date_as_object=True.
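The catch-and-retry approach can be sketched like this. It is illustrative only, using plain pandas rather than the real pyarrow conversion path: attempt the nanosecond conversion first, and on an out-of-bounds error fall back to an object column, mimicking what date_as_object=True does.

```python
import datetime
import pandas as pd

# Illustrative sketch of "catch and retry with date_as_object=True"
# (not the actual pandas-gbq code): try the nanosecond conversion first;
# if a value is out of bounds, fall back to an object column of
# datetime.date values.
def dates_to_column(dates):
    try:
        # Normal path: datetime64[ns] column.
        return pd.to_datetime(pd.Series(dates))
    except pd.errors.OutOfBoundsDatetime:
        # Fallback, mimicking date_as_object=True.
        return pd.Series(dates, dtype=object)

ok = dates_to_column([datetime.date(2021, 5, 27)])
extreme = dates_to_column([datetime.date(1, 1, 1)])
print(ok.dtype, extreme.dtype)
```

The real implementation would catch the pyarrow ArrowInvalid error around RecordBatch.to_pandas and re-run the conversion with date_as_object=True instead.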

gcf-merge-on-green bot pushed a commit that referenced this issue Dec 7, 2021
…oad_gbq` (#442)


Fixes #441 
Towards #365 
🦕