
ENH: Add timeout support #76

Merged · 7 commits · Aug 4, 2017

Conversation

hagino3000 (Contributor) commented on Aug 2, 2017:

pandas-gbq seems to wait for query results forever. In rare cases a query does not finish for a long time (e.g. even after 60 minutes), although the same query usually finishes in a few minutes.

I added timeout handling using configuration.query.timeoutMs.

https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query

Example

import pandas as pd
import pandas_gbq.gbq

try:
    df = pd.read_gbq(query, project_id=xxx, configuration={
        'query': {'timeoutMs': 40000}
    })
    print(df.head())
except pandas_gbq.gbq.QueryTimeout as e:
    # The query did not complete within timeoutMs (40 seconds here)
    print(e)

Output

Requesting query... ok.
Job ID: job_8OSgsu7nYKNYycnlgAB2_KWTGGk
Query running...
  Elapsed 13.78 s. Waiting...
  Elapsed 25.08 s. Waiting...
  Elapsed 36.27 s. Waiting...
Query timeout: 40000 ms

codecov-io commented on Aug 2, 2017:

Codecov Report

Merging #76 into master will decrease coverage by 45.26%.
The diff coverage is 30%.


@@             Coverage Diff             @@
##           master      #76       +/-   ##
===========================================
- Coverage   73.44%   28.18%   -45.27%     
===========================================
  Files           4        4               
  Lines        1544     1554       +10     
===========================================
- Hits         1134      438      -696     
- Misses        410     1116      +706
Impacted Files                 Coverage Δ
pandas_gbq/tests/test_gbq.py   27.89% <20%> (-54.99%) ⬇️
pandas_gbq/gbq.py              19.4%  <40%> (-55.93%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be301e2...415a8d5.

hagino3000 changed the title from "ENH: Add timeout support" to "[WIP]ENH: Add timeout support" on Aug 2, 2017
hagino3000 (Contributor, Author) commented:
Test results

--> GBQ_PROJECT_ID=xxx  GBQ_GOOGLE_APPLICATION_CREDENTIALS=yyy.json pytest pandas_gbq
===================================================================== test session starts =====================================================================
platform darwin -- Python 3.6.2, pytest-3.2.0, py-1.4.34, pluggy-0.4.0
rootdir: /Users/t-nishibayashi/dev/workspace/BigQuery-Python-dev/lib/pandas-gbq, inifile:
plugins: cov-2.5.1
collected 95 items

pandas_gbq/tests/test_gbq.py ........FF.........................................................F.s.........................

========================================================================== FAILURES ===========================================================================
_____________________ TestGBQConnectorIntegrationWithLocalUserAccountAuth.test_get_user_account_credentials_bad_file_returns_credentials ______________________

self = <pandas_gbq.tests.test_gbq.TestGBQConnectorIntegrationWithLocalUserAccountAuth object at 0x11327ba90>

    def test_get_user_account_credentials_bad_file_returns_credentials(self):
        import mock
        from google.auth.credentials import Credentials
>       with mock.patch('__main__.open', side_effect=IOError()):

pandas_gbq/tests/test_gbq.py:231:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../env/lib/python3.6/site-packages/mock.py:1268: in __enter__
    original, local = self.get_original()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <mock._patch object at 0x112f1c978>

    def get_original(self):
        target = self.getter()
        name = self.attribute

        original = DEFAULT
        local = False

        try:
            original = target.__dict__[name]
        except (AttributeError, KeyError):
            original = getattr(target, name, DEFAULT)
        else:
            local = True

        if not self.create and original is DEFAULT:
            raise AttributeError(
>               "%s does not have the attribute %r" % (target, name)
            )
E           AttributeError: <module '__main__' from '/Users/t-nishibayashi/dev/workspace/BigQuery-Python-dev/env/bin/pytest'> does not have the attribute 'open'

../../env/lib/python3.6/site-packages/mock.py:1242: AttributeError
__________________________ TestGBQConnectorIntegrationWithLocalUserAccountAuth.test_get_user_account_credentials_returns_credentials __________________________

self = <pandas_gbq.tests.test_gbq.TestGBQConnectorIntegrationWithLocalUserAccountAuth object at 0x11301d940>

    def test_get_user_account_credentials_returns_credentials(self):
        from google.auth.credentials import Credentials
>       credentials = self.sut.get_user_account_credentials()

pandas_gbq/tests/test_gbq.py:237:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas_gbq/gbq.py:340: in get_user_account_credentials
    credentials = self.load_user_account_credentials()
pandas_gbq/gbq.py:298: in load_user_account_credentials
    credentials.refresh(request)
../../env/lib/python3.6/site-packages/google/oauth2/credentials.py:126: in refresh
    self._client_secret))
../../env/lib/python3.6/site-packages/google/oauth2/_client.py:189: in refresh_grant
    response_data = _token_endpoint_request(request, token_uri, body)
../../env/lib/python3.6/site-packages/google/oauth2/_client.py:109: in _token_endpoint_request
    _handle_error_response(response_body)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

response_body = '{\n  "error" : "invalid_grant",\n  "error_description" : "Token has been expired or revoked."\n}'

    def _handle_error_response(response_body):
        """"Translates an error response into an exception.

        Args:
            response_body (str): The decoded response data.

        Raises:
            google.auth.exceptions.RefreshError
        """
        try:
            error_data = json.loads(response_body)
            error_details = '{}: {}'.format(
                error_data['error'],
                error_data.get('error_description'))
        # If no details could be extracted, use the response data.
        except (KeyError, ValueError):
            error_details = response_body

        raise exceptions.RefreshError(
>           error_details, response_body)
E       google.auth.exceptions.RefreshError: ('invalid_grant: Token has been expired or revoked.', '{\n  "error" : "invalid_grant",\n  "error_description" : "Token has been expired or revoked."\n}')

../../env/lib/python3.6/site-packages/google/oauth2/_client.py:59: RefreshError
___________________________________ TestToGBQIntegrationWithServiceAccountKeyPath.test_upload_data_if_table_exists_replace ____________________________________

self = <pandas_gbq.tests.test_gbq.TestToGBQIntegrationWithServiceAccountKeyPath object at 0x114942dd8>

    def test_upload_data_if_table_exists_replace(self):
        test_id = "4"
        test_size = 10
        df = make_mixed_dataframe_v2(test_size)
        df_different_schema = tm.makeMixedDataFrame()

        # Initialize table with sample data
        gbq.to_gbq(df, self.destination_table + test_id, _get_project_id(),
                   chunksize=10000, private_key=_get_private_key_path())

        # Test the if_exists parameter with the value 'replace'.
        gbq.to_gbq(df_different_schema, self.destination_table + test_id,
                   _get_project_id(), if_exists='replace',
>                  private_key=_get_private_key_path())

pandas_gbq/tests/test_gbq.py:1062:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas_gbq/gbq.py:1068: in to_gbq
    connector.load_data(dataframe, dataset_id, table_id, chunksize)
pandas_gbq/gbq.py:669: in load_data
    self.process_insert_errors(insert_errors)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <pandas_gbq.gbq.GbqConnector object at 0x114993dd8>
insert_errors = [{'errors': [{'debugInfo': 'generic::not_found: no such field.', 'location': 'b', 'message': 'no such field.', 'reason...'generic::not_found: no such field.', 'location': 'b', 'message': 'no such field.', 'reason': 'invalid'}], 'index': 4}]

    def process_insert_errors(self, insert_errors):
        for insert_error in insert_errors:
            row = insert_error['index']
            errors = insert_error.get('errors', None)
            for error in errors:
                reason = error['reason']
                message = error['message']
                location = error['location']
                error_message = ('Error at Row: {0}, Reason: {1}, '
                                 'Location: {2}, Message: {3}'
                                 .format(row, reason, location, message))

                # Report all error messages if verbose is set
                if self.verbose:
                    self._print(error_message)
                else:
                    raise StreamingInsertError(error_message +
                                               '\nEnable verbose logging to '
                                               'see all errors')

>       raise StreamingInsertError
E       pandas_gbq.gbq.StreamingInsertError

pandas_gbq/gbq.py:485: StreamingInsertError
-------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------



Streaming Insert is 100.0% Complete


The existing table has a different schema. Please wait 2 minutes. See Google BigQuery issue #191

hagino3000 (Contributor, Author) commented:
Ummm, I cannot fix these 3 test failures.

hagino3000 changed the title from "[WIP]ENH: Add timeout support" to "ENH: Add timeout support" on Aug 2, 2017
hagino3000 (Contributor, Author) commented:
I have also run the tests on the master branch; the same 3 tests fail there.
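
(For reference: the first failure happens because the test patches open on __main__, which under pytest is the pytest launcher script rather than the test module. A minimal sketch of a possible fix, assuming the test only needs the built-in open to raise IOError, would be to patch builtins.open instead; the assertion below is illustrative, not taken from this PR.)

import mock
from google.auth.credentials import Credentials

def test_get_user_account_credentials_bad_file_returns_credentials(self):
    # Patch the built-in open (Python 3) so the patch target does not
    # depend on which module happens to be running as __main__.
    with mock.patch('builtins.open', side_effect=IOError()):
        credentials = self.sut.get_user_account_credentials()
    assert isinstance(credentials, Credentials)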

parthea (Contributor) left a comment:

Just one minor observation. Otherwise, looks good to me.

Could you add an entry to the change log under 0.2.1?

https://github.com/pydata/pandas-gbq/blob/master/docs/source/changelog.rst

@@ -536,6 +543,11 @@ def run_query(self, query, **kwargs):

        while not query_reply.get('jobComplete', False):
            self.print_elapsed_seconds(' Elapsed', 's. Waiting...')

            timeoutMs = job_config['query'].get('timeoutMs')
parthea (Contributor):

timeoutMs -> timeout_ms to be consistent with existing code

hagino3000 (Contributor, Author):

Makes sense, I'll fix it.
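
For context, a minimal sketch of a client-side timeout check around a BigQuery polling loop, after the rename. The names wait_for_job and poll_job are illustrative assumptions rather than pandas-gbq internals; only the timeoutMs and jobComplete fields come from the BigQuery jobs API.

import time

class QueryTimeout(Exception):
    """Raised when the configured query timeout elapses."""

def wait_for_job(poll_job, timeout_ms=None, interval_s=5):
    # poll_job() is assumed to return a jobs.getQueryResults-style response dict.
    start = time.monotonic()
    while True:
        query_reply = poll_job()
        if query_reply.get('jobComplete', False):
            return query_reply
        elapsed_ms = (time.monotonic() - start) * 1000
        if timeout_ms is not None and elapsed_ms > timeout_ms:
            raise QueryTimeout('Query timeout: {} ms'.format(timeout_ms))
        time.sleep(interval_s)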

parthea (Contributor) commented on Aug 3, 2017:

@hagino3000 I've created #76 and #78 to address the unit test failures. Please feel free to add additional information.

hagino3000 (Contributor, Author) commented:
@parthea I added an entry to the change log and fixed the variable name.

parthea (Contributor) left a comment:

LGTM. Thanks @hagino3000 !

I'm going to make a minor adjustment to the change log before merging to mention that QueryTimeout will be raised when the specified timeout has expired.
