Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running a parameterized query via a SQLAlchemy Selectable #478

Open
rjrudin opened this issue Jan 25, 2022 · 4 comments
Open

Running a parameterized query via a SQLAlchemy Selectable #478

rjrudin opened this issue Jan 25, 2022 · 4 comments
Labels
api: bigquery Issues related to the googleapis/python-bigquery-pandas API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@rjrudin
Copy link

rjrudin commented Jan 25, 2022

Is your feature request related to a problem? Please describe.

I'm back again, after submitting #474 and #476, as we're trying to shift from using pd.read_sql_query to using pandas_gbq.read_gbq .

Current issue is that when we construct our SQLAlchemy Selectable, we may toss a row limit onto it based on user input, e.g:

sql: Select = select(literal_column("*")).select_from(...)
sql = sql.limit(50)

pands_gbq.read_gbq wants a str as its query argument. So we're trying to compile the Selectable and serialize it as a string:

query = str(sql.compile(self.engine))

However, SQLAlchemy serializes its named parameters in a way that I can't get to work with pandas_gbq (and more generally, that won't work with BigQuery):

SELECT * 
FROM `some_dataset.some_table`
LIMIT %(param_1:INT64)s

I found an example in the pandas_gbq tests of how to configure query/queryParameters in the "configuration" dict that can be passed to read_gbq, but that of course doesn't help because BigQuery has its own way of naming parameters.

Describe the solution you'd like

Generally, I need a way (hopefully it exists already and I'm just missing it!) of serializing a SQLAlchemy Selectable into a string query that can be the input to read_gbq. For now, just being able to handle the limit() parameter above would suffice. I of course am trying to avoid string concatenation and am hoping there's some way I can configure the "configuration" dict to make this work.

Describe alternatives you've considered

N/A

Additional context

N/A

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-pandas API. label Jan 25, 2022
@rjrudin
Copy link
Author

rjrudin commented Jan 26, 2022

Not sure if this is relevant - using SQLAlchemy 1.4.25 and sqlalchemy-bigquery 1.3.0, when I run this:

from sqlalchemy_bigquery import BigQueryDialect
str(sql.compile(dialect=BigQueryDialect))

I get this error:

AttributeError: type object 'BigQueryDialect' has no attribute 'positional'

I tried using "setattr" to fake-fix that, but I get several other errors about attributes missing. Not sure if that would work, and so I've stuck with just doing compile(self.engine) or compile(bind=self.engine).

@rjrudin
Copy link
Author

rjrudin commented Jan 26, 2022

If this is really an issue for https://github.com/googleapis/python-bigquery-sqlalchemy , I'm happy to move it there too.

@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels Jan 26, 2022
@meredithslota meredithslota added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Feb 8, 2022
@tswast
Copy link
Collaborator

tswast commented Feb 17, 2022

This is a tough one. I think it does belong here, as the SQLAlchemy connector needs to continue to generate SQL that can be understood by the google.cloud.bigquery.dbapi modules.

I would like to allow this library to take SQLAlchemy-generated SQL somehow, possibly via an optional dependency on SQLAlchemy so pandas-gbq can run the compile itself.

@willsthompson
Copy link

I assume it's built this way because DBAPI doesn't include a BigQuery-compatible parameter substitution style, but the sqlalchemy-bigquery dialect compiles SQL into the pyformat style, then at runtime transforms the pyformatted query into a BigQuery compatible one and builds params that can be executed against a bigquery.Client. Pretty confusing, but I pulled out the relevant pieces into a function that does the conversion

from google.cloud.bigquery.dbapi._helpers import to_query_parameters
from google.cloud.bigquery.dbapi.cursor import _format_operation

query = select(...) # SQLAlchemy query
compiled = query.compile(engine)  # engine using BigQueryDialect or dialect=BigQueryDialect
parameters = compiled.params
bq_query, parameter_types = _format_operation(str(compiled), parameters)
job_config = bigquery.QueryJobConfig(
    query_parameters=to_query_parameters(parameters, parameter_types)
)
# client = bigquery.Client(...)
query_job = client.query(bq_query, job_config)

This is pretty boiled down and doesn't address any edge cases, so YMMV, but it works for the relatively small scope of queries we expect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-pandas API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

No branches or pull requests

5 participants