test read_gbq with dryRun in configuration parameter #88

Open
WillianFuks opened this issue Sep 22, 2017 · 6 comments
Labels
accepting pull requests; api: bigquery (issues related to the googleapis/python-bigquery-pandas API); type: feature request ("nice-to-have" improvement, new feature or different behavior or design)

Comments

@WillianFuks

Hello,

A recent question on SO asked about running a read_gbq job with dryRun set to True.

As far as I could check, we can currently send query definitions, but everything defined outside of query is discarded.

I wonder if it would be possible to also consider updating other values such as dryRun.

kwargs should probably be able to receive arguments such as configuration={"query": {...}, "dryRun": True}

and run_query probably would have to process job_config.update(config).
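A minimal sketch of what that merge could look like, assuming the job configuration is a plain dict. The helper name `build_job_config` is hypothetical and is not part of pandas-gbq; the point is only that top-level keys such as dryRun survive instead of being discarded.

```python
import copy


def build_job_config(query, configuration=None):
    """Hypothetical helper: merge a user configuration into the job config."""
    job_config = {"query": {"query": query}}
    if configuration:
        # Work on a copy so the caller's dict is left untouched.
        config = copy.deepcopy(configuration)
        # Merge nested "query" settings instead of replacing the whole entry.
        job_config["query"].update(config.pop("query", {}))
        # Keep everything defined outside of "query", e.g. dryRun.
        job_config.update(config)
    return job_config


print(build_job_config("SELECT 1", {"query": {"useQueryCache": False}, "dryRun": True}))
```

This sketch does not guard against the query text being passed twice; it only illustrates carrying the extra keys through.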

Best,

Will

@parthea added the "type: feature request" label Sep 26, 2017
@tswast
Collaborator

tswast commented Dec 8, 2017

Are there other properties besides dryRun that should be sent?

@tswast
Collaborator

tswast commented Feb 12, 2018

I think a general “update” call would be a good way to implement this. We’d probably want some checks for duplicate values. I think the current implementation checks that query is not also defined in the job config.
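Roughly what I have in mind, as a sketch only. `merge_configuration` is a hypothetical name, the job config is assumed to be a plain dict, and the duplicate check shown is the one for the query text being given twice.

```python
def merge_configuration(query, configuration):
    """Hypothetical helper: fold a user configuration into a job config dict."""
    configuration = configuration or {}
    query_settings = dict(configuration.get("query", {}))

    if "query" in query_settings:
        # Duplicate check: the query text may come from only one place.
        if query is not None:
            raise ValueError(
                "Query statement given both as an argument and inside "
                "configuration['query']"
            )
        query = query_settings.pop("query")

    # General "update": copy every top-level key except the nested query block.
    job_config = {k: v for k, v in configuration.items() if k != "query"}
    job_config["query"] = dict(query_settings, query=query)
    return job_config


print(merge_configuration("SELECT 1", {"dryRun": True}))
```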

@tswast tswast self-assigned this Mar 22, 2018
@tswast
Collaborator

tswast commented Mar 22, 2018

I believe I might have fixed this issue in #152. I'll add a test to try out a "dryRun": True query before closing this issue.

@tswast
Collaborator

tswast commented Mar 22, 2018

Unfortunately, even with #152, dry run queries still don't work: an exception is raised when google-cloud-bigquery tries to fetch the results.

Test code (query from analyzing PyPI downloads):

    def test_configuration_with_dryrun(self):
        query = """SELECT COUNT(*) AS num_downloads
FROM `the-psf.pypi.downloads*`
WHERE file.project = 'pandas-gbq'
  -- Only query the last 30 days of history
  AND _TABLE_SUFFIX
    BETWEEN FORMAT_DATE(
      '%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
    AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
"""
        config = {
            'dryRun': True
        }
        df = gbq.read_gbq(query, project_id=_get_project_id(),
                          private_key=self.credentials,
                          dialect='standard',
                          configuration=config)
        assert df is None

Exception:

pandas_gbq/tests/test_gbq.py:786:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas_gbq/gbq.py:812: in read_gbq
    schema, rows = connector.run_query(query, **kwargs)
pandas_gbq/gbq.py:534: in run_query
    self.process_http_error(ex)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

ex = NotFound('GET https://www.googleapis.com/bigquery/v2/projects/swast-scratch/queries/5ce25ce1-444d-4353-b69e-8d096b392955?maxResults=0: Not found: Job swast-scratch:5ce25ce1-444d-4353-b69e-8d096b392955',)

    @staticmethod
    def process_http_error(ex):
        # See `BigQuery Troubleshooting Errors
        # <https://cloud.google.com/bigquery/troubleshooting-errors>`__

>       raise GenericGBQException("Reason: {0}".format(ex))
E       pandas_gbq.gbq.GenericGBQException: Reason: 404 GET https://www.googleapis.com/bigquery/v2/projects/swast-scratch/queries/5ce25ce1-444d-4353-b69e-8d096b392955?maxResults=0: Not found: Job swast-scratch:5ce25ce1-444d-4353-b69e-8d096b392955

pandas_gbq/gbq.py:450: GenericGBQException
------------------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------------------
Requesting query... ok.
Job ID: 5ce25ce1-444d-4353-b69e-8d096b392955
Query running...
Query done.
Processed: 5.7 GB Billed: 0.0 B
Standard price: $0.00 USD

Retrieving results...
=======================

I think this might be related to issue #45 and/or an issue upstream in google-cloud-bigquery.

@tswast
Collaborator

tswast commented Mar 26, 2018

I've confirmed that dryRun queries do work upstream in googleapis/google-cloud-python#5119. I think pandas-gbq will need to detect dry run queries and skip fetching the results.
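A sketch of that check, under stated assumptions: the job config is a plain dict, and `submit_job` and `fetch_results` are hypothetical stand-ins for whatever the connector does internally, not real pandas-gbq or google-cloud-bigquery functions.

```python
def is_dry_run(job_config):
    """Return True when the job configuration requests a dry run."""
    return bool(job_config.get("dryRun", False))


def run_query(job_config, submit_job, fetch_results):
    """Submit a job, but fetch results only for queries that are not dry runs.

    submit_job and fetch_results are hypothetical callables injected here
    so that the control flow can be shown without a BigQuery client.
    """
    job = submit_job(job_config)
    if is_dry_run(job_config):
        # Fetching results for the dry run job is what raised the 404 in
        # the traceback above, so skip that step entirely.
        return None
    return fetch_results(job)
```

With this shape, a dry run returns None, which matches the `assert df is None` expectation in the test code above.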

@tswast changed the title from "Make available general configuration in read_gbq job definition" to "test read_gbq with dryRun in configuration parameter" Feb 4, 2020
@tswast removed their assignment Feb 6, 2020
@product-auto-label bot added the "api: bigquery" label Jul 17, 2021
@ramicaza

any progress on this?

4 participants