ENH: show progress while fetching rows for query #182

Closed
tswast opened this issue May 23, 2018 · 7 comments · Fixed by #292
Labels
accepting pull requests; type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design)

Comments

@tswast
Collaborator

tswast commented May 23, 2018

As reported by @QuinRiva in #12 (comment)

Progress is written to logging while a query is running, but no progress is reported after the query finishes, while the data is being downloaded into a DataFrame. The problem is that we call list(rows_iter) to fetch all pages at once. Previously, progress was written as each page was downloaded.

https://github.com/pydata/pandas-gbq/blob/08166685d3305a57fbfd3bc4c41a1cf5df98ebcf/pandas_gbq/gbq.py#L294-L299

A possible solution is to loop over rows_iter.pages instead. After the first page is fetched, the rows_iter.total_rows property is available, so it would be possible to display a percent complete or even use tqdm as done in #166.
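The page-by-page idea above can be sketched roughly as follows. This is a minimal illustration, not the pandas-gbq implementation: `fetch_rows_with_progress` and `FakeRowIterator` are hypothetical names, and the stand-in class only mimics the `pages` and `total_rows` attributes mentioned above so the snippet runs without a BigQuery connection.

```python
import logging

logger = logging.getLogger(__name__)


def fetch_rows_with_progress(rows_iter):
    """Collect all rows page by page, logging progress after each page."""
    rows = []
    for page in rows_iter.pages:
        rows.extend(page)
        # total_rows is only expected to be set once the first page is fetched.
        total = rows_iter.total_rows
        if total:
            percent = 100.0 * len(rows) / total
            logger.info("Got %d of %d rows (%.1f%%)", len(rows), total, percent)
        else:
            logger.info("Got %d rows", len(rows))
    return rows


class FakeRowIterator:
    """Hypothetical stand-in for a row iterator, used only for this demo."""

    def __init__(self, pages):
        self._pages = pages
        self.total_rows = None

    @property
    def pages(self):
        for page in self._pages:
            # Mimic total_rows becoming available after the first fetch.
            self.total_rows = sum(len(p) for p in self._pages)
            yield page
```

Calling `fetch_rows_with_progress(FakeRowIterator([[1, 2], [3, 4], [5]]))` returns the five rows in order and emits one log record per page.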

tswast added the "type: feature request" and "accepting pull requests" labels May 23, 2018
@max-sixty
Contributor

I found tqdm to be really nice; I would vote for that depending on whether we think it's OK to add that as a dependency.

> A possible solution is to loop over rows_iter.pages instead.

Yes, and that gives us a bunch of potential performance improvements too!

@JohnPaton
Contributor

JohnPaton commented Mar 18, 2019

As a frequent downloader of large tables over a slow connection I'd like to implement this. Questions for @tswast @max-sixty :

  • Are you guys okay with adding tqdm as a dependency?
    • If yes: Other logging currently goes through the logging module. Are you okay with having tqdm write to stdout/stderr (the default)? Otherwise it will require one of the solutions from here
    • If no: What format should the row download logging have?

@tswast
Collaborator Author

tswast commented Mar 18, 2019

I'd prefer to keep tqdm an optional dependency. (We already use it to show some progress during to_gbq.)

But note that now that we're using to_dataframe() from the google-cloud-bigquery library, I think adding this logic there would be most appropriate. I believe RowIterator._to_dataframe_tabledata_list would be the most appropriate place to use tqdm if it's installed.

> tqdm write to stdout/stderr (the default)?

If tqdm is installed, I think following its default behavior is the most appropriate thing to do.
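One way the optional-dependency approach could look is sketched below. The helper name `with_optional_progress` is made up for illustration and is not part of pandas-gbq or google-cloud-bigquery; the sketch only assumes that tqdm may or may not be importable and leaves tqdm's output settings at their defaults.

```python
try:
    import tqdm
except ImportError:  # tqdm stays an optional dependency
    tqdm = None


def with_optional_progress(iterable, total=None, description="Downloading"):
    """Wrap iterable in a tqdm bar if tqdm is installed, else return it as-is."""
    if tqdm is None:
        return iterable
    # No output stream is passed, so tqdm falls back to its own default.
    return tqdm.tqdm(iterable, total=total, desc=description)
```

Callers iterate over the result the same way in both cases, so the download loop itself does not need to know whether a progress bar is being shown.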

@tswast
Collaborator Author

tswast commented Mar 18, 2019

Oh, and I'm a maintainer of google-cloud-bigquery, too, so it shouldn't be any more scary to contribute over there. We're receptive to PRs to the Google Cloud client libraries, too. :-)

@JohnPaton
Contributor

Now that googleapis/google-cloud-python#7552 is merged, should read_gbq get a progress_bar parameter to match to_gbq?

@tswast
Collaborator Author

tswast commented Mar 29, 2019

Yeah, once I make a new release of google-cloud-bigquery (I expect to next week), we can start populating the progress bar argument. I'd be okay always populating it as 'tqdm', but it is nice to have the special version for notebooks, so an argument to read_gbq makes sense.
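As a rough sketch of what passing the choice through might look like: the helper `download_dataframe` and the `FakeRows` stand-in below are hypothetical, and the only thing assumed about the real library is that `to_dataframe` accepts a `progress_bar_type` keyword, as discussed in this thread.

```python
def download_dataframe(rows_iter, progress_bar_type=None):
    """Hypothetical helper: forward the caller's progress bar choice."""
    # progress_bar_type=None is taken here to mean "no progress bar".
    return rows_iter.to_dataframe(progress_bar_type=progress_bar_type)


class FakeRows:
    """Stand-in used for this demo only; records what it was called with."""

    def __init__(self):
        self.calls = []

    def to_dataframe(self, progress_bar_type=None):
        self.calls.append(progress_bar_type)
        return []
```

With this shape, a read_gbq-style function could accept the argument from the user (for example 'tqdm') and hand it on unchanged, which leaves room for a notebook-specific value.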

@dakl
Contributor

dakl commented Oct 25, 2019

Now that to_dataframe in google-cloud-bigquery supports progress_bar_type, can we make use of that to add a progress bar to read_gbq? @tswast
