Out of bounds nanosecond timestamp: 1-01-01 00:00:00 #365

Closed
flakobatako opened this issue May 27, 2021 · 4 comments · Fixed by #444
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery-pandas API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments


flakobatako commented May 27, 2021

Good afternoon.

I have a table where there is a column named maxDate.

There are some rows (like 1%) that can have the following date: 0001-01-01T00:00:00.

The problem is that the table lives in a GBQ project from an external vendor to the company, so I cannot modify the data. When I run the query like this:

df = pandas_gbq.read_gbq(sql, credentials=credentials)

I get the error:

File "pandas_libs\tslibs\np_datetime.pyx", line 117, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00

Is there a workaround or config option so I can execute the query correctly?

Thank you in advance.

@max-sixty
Contributor

I'm not sure there's a reasonable way of handling this given the numpy type — how about uploading it as a string and then transforming in BQ?

Though is that a permissible value in BQ, even if this library could upload it?
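For the read direction in the original question, a string-based workaround along these lines can be sketched (the table and column names are illustrative, and the SQL is shown only in comments): cast the column to STRING in the query itself, then parse locally, letting pandas coerce out-of-range values to NaT.

```python
import pandas as pd

# Sketch of a string-based workaround (table/column names illustrative).
# Cast the column to STRING in the query, e.g.:
#   SELECT * EXCEPT (maxDate), CAST(maxDate AS STRING) AS maxDate
#   FROM `project.dataset.table`
# then parse locally; errors="coerce" turns out-of-range values into NaT.
raw = pd.Series(["0001-01-01T00:00:00", "2021-05-27T12:00:00"])
parsed = pd.to_datetime(raw, errors="coerce")
print(parsed.isna().tolist())  # the year-1 value becomes NaT
```

The trade-off is that rows with the sentinel year-1 date come back as NaT rather than failing the whole query.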

@tswast tswast added the type: question Request for information or clarification. Not an issue. label Jul 1, 2021
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-pandas API. label Jul 17, 2021
@tswast
Collaborator

tswast commented Sep 22, 2021

@flakobatako What is the BigQuery data type of the column? e.g. DATE/DATETIME/TIMESTAMP?

For TIMESTAMP columns, I believe this was fixed in google-cloud-bigquery 1.27.0. For DATETIME and DATE, I'm hoping to have a better solution in google-cloud-bigquery v3. googleapis/python-bigquery#861, googleapis/python-bigquery#972

@tswast
Collaborator

tswast commented Nov 19, 2021

Thought: Possible we can make this compatible with v2 if we duplicate some of the logic to detect out-of-bounds values and pass date_as_object.
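The detection half of that idea can be sketched as follows. This is a hypothetical helper, not the actual pandas-gbq code: it checks whether any DATE value falls outside the datetime64[ns] range, which is when date_as_object would be needed.

```python
import datetime
import pandas as pd

# Hypothetical sketch (not the actual pandas-gbq logic): scan DATE values
# for anything outside the datetime64[ns] representable range, and only
# then request date_as_object=True during conversion.
_MIN_DATE = pd.Timestamp.min.date()  # 1677-09-21
_MAX_DATE = pd.Timestamp.max.date()  # 2262-04-11

def needs_date_as_object(dates):
    """Return True if any date cannot be represented as datetime64[ns]."""
    return any(
        d is not None and not (_MIN_DATE <= d <= _MAX_DATE) for d in dates
    )

print(needs_date_as_object([datetime.date(1, 1, 1)]))      # True
print(needs_date_as_object([datetime.date(2021, 5, 27)]))  # False
```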

@tswast tswast added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed type: question Request for information or clarification. Not an issue. labels Nov 30, 2021
@tswast tswast self-assigned this Nov 30, 2021
@tswast
Collaborator

tswast commented Dec 6, 2021

I can reproduce this in a test in #442

________________________ test_dataframe_round_trip_with_table_schema[load_parquet-issue365-extreme-datetimes] _________________________

method_under_test = functools.partial(<function to_gbq at 0x7f9090ebb710>, project_id='swast-scratch', credentials=<google.oauth2.credentials.Credentials object at 0x7f905058b250>)
random_dataset_id = 'python_bigquery_pandas_tests_system_20211206201607_aa0fdb'
bigquery_client = <google.cloud.bigquery.client.Client object at 0x7f9050574950>
input_df =    row_num    date_col                datetime_col                     timestamp_col
0        1  0001-01-01         00...         1970-01-01 00:00:00+00:00
2        3  9999-12-31  9999-12-31 23:59:59.999999  9999-12-31 23:59:59.999999+00:00
expected_df =    row_num    date_col                datetime_col                     timestamp_col
0        1  0001-01-01         00...         1970-01-01 00:00:00+00:00
2        3  9999-12-31  9999-12-31 23:59:59.999999  9999-12-31 23:59:59.999999+00:00
table_schema = [{'name': 'row_num', 'type': 'INTEGER'}, {'name': 'date_col', 'type': 'DATE'}, {'name': 'datetime_col', 'type': 'DATETIME'}, {'name': 'timestamp_col', 'type': 'TIMESTAMP'}]
api_method = 'load_parquet', api_methods = {'load_csv', 'load_parquet'}

                                59,
                                59,
                                999999,
                                tzinfo=datetime.timezone.utc,
                            ),
                        ],
                    },
                    columns=["row_num", "date_col", "datetime_col", "timestamp_col"],
                ),
                table_schema=[
                    {"name": "row_num", "type": "INTEGER"},
                    {"name": "date_col", "type": "DATE"},
                    {"name": "datetime_col", "type": "DATETIME"},
                    {"name": "timestamp_col", "type": "TIMESTAMP"},
                ],
            ),
            id="issue365-extreme-datetimes",
        ),
    ]
    
    
    @pytest.mark.parametrize(
>       ["input_df", "expected_df", "table_schema", "api_methods"], DATAFRAME_ROUND_TRIPS
    )
    def test_dataframe_round_trip_with_table_schema(
        method_under_test,
        random_dataset_id,
        bigquery_client,
        input_df,
        expected_df,
        table_schema,
        api_method,
        api_methods,
    ):

tests/system/test_to_gbq.py:243: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.nox/system-3-7/lib/python3.7/site-packages/google/cloud/bigquery/table.py:1729: in to_dataframe
    df = record_batch.to_pandas(date_as_object=date_as_object)
pyarrow/array.pxi:751: in pyarrow.lib._PandasConvertible.to_pandas
    ???
pyarrow/table.pxi:1668: in pyarrow.lib.Table._to_pandas
    ???
.nox/system-3-7/lib/python3.7/site-packages/pyarrow/pandas_compat.py:792: in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
.nox/system-3-7/lib/python3.7/site-packages/pyarrow/pandas_compat.py:1132: in _table_to_blocks
    list(extension_columns.keys()))
pyarrow/table.pxi:1117: in pyarrow.lib.table_to_blocks
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   pyarrow.lib.ArrowInvalid: Casting from timestamp[us] to timestamp[ns] would result in out of bounds timestamp: -62135596800000000

pyarrow/error.pxi:84: ArrowInvalid
-------------------------------------------------------- Captured stderr call ---------------------------------------------------------
100%|██████████| 1/1 [00:00<00:00, 5915.80it/s]
__________________________ test_dataframe_round_trip_with_table_schema[load_csv-issue365-extreme-datetimes] ___________________________

method_under_test = functools.partial(<function to_gbq at 0x7f9090ebb710>, project_id='swast-scratch', credentials=<google.oauth2.credentials.Credentials object at 0x7f905058b250>)
random_dataset_id = 'python_bigquery_pandas_tests_system_20211206201653_cf3865'
bigquery_client = <google.cloud.bigquery.client.Client object at 0x7f9050574950>
input_df =    row_num    date_col                datetime_col                     timestamp_col
0        1  0001-01-01         00...         1970-01-01 00:00:00+00:00
2        3  9999-12-31  9999-12-31 23:59:59.999999  9999-12-31 23:59:59.999999+00:00
expected_df =    row_num    date_col                datetime_col                     timestamp_col
0        1  0001-01-01         00...         1970-01-01 00:00:00+00:00
2        3  9999-12-31  9999-12-31 23:59:59.999999  9999-12-31 23:59:59.999999+00:00
table_schema = [{'name': 'row_num', 'type': 'INTEGER'}, {'name': 'date_col', 'type': 'DATE'}, {'name': 'datetime_col', 'type': 'DATETIME'}, {'name': 'timestamp_col', 'type': 'TIMESTAMP'}]
api_method = 'load_csv', api_methods = {'load_csv', 'load_parquet'}

                                59,
                                59,
                                999999,
                                tzinfo=datetime.timezone.utc,
                            ),
                        ],
                    },
                    columns=["row_num", "date_col", "datetime_col", "timestamp_col"],
                ),
                table_schema=[
                    {"name": "row_num", "type": "INTEGER"},
                    {"name": "date_col", "type": "DATE"},
                    {"name": "datetime_col", "type": "DATETIME"},
                    {"name": "timestamp_col", "type": "TIMESTAMP"},
                ],
            ),
            id="issue365-extreme-datetimes",
        ),
    ]
    
    
    @pytest.mark.parametrize(
>       ["input_df", "expected_df", "table_schema", "api_methods"], DATAFRAME_ROUND_TRIPS
    )
    def test_dataframe_round_trip_with_table_schema(
        method_under_test,
        random_dataset_id,
        bigquery_client,
        input_df,
        expected_df,
        table_schema,
        api_method,
        api_methods,
    ):

tests/system/test_to_gbq.py:243: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.nox/system-3-7/lib/python3.7/site-packages/google/cloud/bigquery/table.py:1729: in to_dataframe
    df = record_batch.to_pandas(date_as_object=date_as_object)
pyarrow/array.pxi:751: in pyarrow.lib._PandasConvertible.to_pandas
    ???
pyarrow/table.pxi:1668: in pyarrow.lib.Table._to_pandas
    ???
.nox/system-3-7/lib/python3.7/site-packages/pyarrow/pandas_compat.py:792: in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
.nox/system-3-7/lib/python3.7/site-packages/pyarrow/pandas_compat.py:1132: in _table_to_blocks
    list(extension_columns.keys()))
pyarrow/table.pxi:1117: in pyarrow.lib.table_to_blocks
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   pyarrow.lib.ArrowInvalid: Casting from timestamp[us] to timestamp[ns] would result in out of bounds timestamp: -62135596800000000

pyarrow/error.pxi:84: ArrowInvalid
-------------------------------------------------------- Captured stderr call ---------------------------------------------------------
1it [00:04,  4.31s/it]
========================================================== warnings summary ===========================================================
tests/system/test_gbq.py::TestReadGBQIntegration::test_array_length_zero
  /Users/swast/src/github.com/googleapis/python-bigquery-pandas/.nox/system-3-7/lib/python3.7/site-packages/pandas/core/dtypes/missing.py:415: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
    if left_value != right_value:

tests/system/test_gbq.py::TestToGBQIntegration::test_upload_data
  /Users/swast/src/github.com/googleapis/python-bigquery-pandas/tests/system/test_gbq.py:948: PendingDeprecationWarning: chunksize will be ignored when using api_method='load_csv' in a future version of pandas-gbq
    credentials=self.credentials,

tests/system/test_gbq.py::TestToGBQIntegration::test_upload_data_if_table_exists_append
  /Users/swast/src/github.com/googleapis/python-bigquery-pandas/tests/system/test_gbq.py:1032: PendingDeprecationWarning: chunksize will be ignored when using api_method='load_csv' in a future version of pandas-gbq
    credentials=self.credentials,

tests/system/test_gbq.py::TestToGBQIntegration::test_upload_subset_columns_if_table_exists_append
  /Users/swast/src/github.com/googleapis/python-bigquery-pandas/tests/system/test_gbq.py:1078: PendingDeprecationWarning: chunksize will be ignored when using api_method='load_csv' in a future version of pandas-gbq
    credentials=self.credentials,

tests/system/test_gbq.py::TestToGBQIntegration::test_upload_data_if_table_exists_replace
  /Users/swast/src/github.com/googleapis/python-bigquery-pandas/tests/system/test_gbq.py:1112: PendingDeprecationWarning: chunksize will be ignored when using api_method='load_csv' in a future version of pandas-gbq
    credentials=self.credentials,

tests/system/test_gbq.py::TestToGBQIntegration::test_upload_data_flexible_column_order
  /Users/swast/src/github.com/googleapis/python-bigquery-pandas/tests/system/test_gbq.py:1241: PendingDeprecationWarning: chunksize will be ignored when using api_method='load_csv' in a future version of pandas-gbq
    credentials=self.credentials,

-- Docs: https://docs.pytest.org/en/stable/warnings.html
------------- generated xml file: /Users/swast/src/github.com/googleapis/python-bigquery-pandas/system_3.7_sponge_log.xml -------------
======================================================= short test summary info =======================================================
FAILED tests/system/test_to_gbq.py::test_dataframe_round_trip_with_table_schema[load_parquet-issue365-extreme-datetimes] - pyarrow.l...
FAILED tests/system/test_to_gbq.py::test_dataframe_round_trip_with_table_schema[load_csv-issue365-extreme-datetimes] - pyarrow.lib.A...
2 failed, 112 passed, 3 skipped, 6 warnings in 634.00s (0:10:34)
nox > Command py.test --quiet --junitxml=system_3.7_sponge_log.xml tests/system failed with exit code 1
nox > Session system-3.7 failed.

The good news is that it's fixed when I work with the google-cloud-bigquery v3 prerelease (googleapis/python-bigquery#972). For other versions we'll need to catch the error and retry with date_as_object=True.
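The catch-and-retry approach can be sketched like this. It is illustrative only, using plain pandas rather than the real pyarrow conversion path: attempt the nanosecond conversion first, and on an out-of-bounds error fall back to an object column, mimicking what date_as_object=True does.

```python
import datetime
import pandas as pd

# Illustrative sketch of "catch and retry with date_as_object=True"
# (not the actual pandas-gbq code): try the nanosecond conversion first;
# if a value is out of bounds, fall back to an object column of
# datetime.date values.
def dates_to_column(dates):
    try:
        # Normal path: datetime64[ns] column.
        return pd.to_datetime(pd.Series(dates))
    except pd.errors.OutOfBoundsDatetime:
        # Fallback, mimicking date_as_object=True.
        return pd.Series(dates, dtype=object)

ok = dates_to_column([datetime.date(2021, 5, 27)])
extreme = dates_to_column([datetime.date(1, 1, 1)])
print(ok.dtype, extreme.dtype)
```

The real implementation would catch the pyarrow ArrowInvalid error around RecordBatch.to_pandas and re-run the conversion with date_as_object=True instead.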

gcf-merge-on-green bot pushed a commit that referenced this issue Dec 7, 2021
…oad_gbq` (#442)


Fixes #441 
Towards #365 
🦕