Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Parameters to SqlToS3Operator template fields. #45466

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tkeller-moxe
Copy link

@tkeller-moxe tkeller-moxe commented Jan 7, 2025

Adds parameters to the template_fields for the SqlToS3Operator.

Team was writing some basic automations and we came across some use cases where we want to dynamically run some reports with parameters and the thought of direct replacement inside the SQL instead of using SQL parameters is irking.

An example of what we currently need todo.

SqlToS3Operator(
        task_id="export_sql_to_s3",
        sql_conn_id="trino_chart_retrieval_ro_id",
        query="""select * from foo where date > {{ params.date}}""",
        s3_bucket="{{params.s3bucket}}",
        s3_key="{{params.filepath }}" + "/" + "{{params.filename}}",
        pd_kwargs={"index": False, "quoting": csv.QUOTE_ALL},
        file_format="csv",
        aws_conn_id="aws_conn_id",
        replace=True,
    )

vs with the parameters field being templated.

SqlToS3Operator(
        task_id="export_sql_to_s3",
        sql_conn_id="trino_chart_retrieval_ro_id",
        query="""select * from foo where date > %s""",
        parameters=({{params.date}},),
        s3_bucket="{{params.s3bucket}}",
        s3_key="{{params.filepath }}" + "/" + "{{params.filename}}",
        pd_kwargs={"index": False, "quoting": csv.QUOTE_ALL},
        file_format="csv",
        aws_conn_id="aws_conn_id",
        replace=True,
    )

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

Copy link

boring-cyborg bot commented Jan 7, 2025

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://s.apache.org/airflow-slack

@eladkal
Copy link
Contributor

eladkal commented Jan 7, 2025

I think your example doesn't make sense. You are not using parameters anywhere.
I don't think parameters needs to be templated. Correct me if i am wrong here but you are confusing params with parameters:

https://stackoverflow.com/a/72246305/14624409

It doesn't make sense to template parameters given how it's being used.
If I miss something please explain

Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking merge till we clarify the issue

@tkeller-moxe
Copy link
Author

tkeller-moxe commented Jan 7, 2025

I made some edits for clarity, Thanks for the quick response!


Edit let me edit/read your stackoverflow a bit more.

@tkeller-moxe
Copy link
Author

tkeller-moxe commented Jan 7, 2025

So I may be confused a little as well.

Are the DAG Params when rendered in SQL, eg my first example, properly SQL escaped? If so that does alleviate our concern. The team is worried about a future state where SQL injection might be possible. My assumption is that they are not and its mostly just a string insertion.

Part of what might be confusing me is the SQLExecuteQueryOperator does have parameters in the template_fields which is leading to some inconsistency.

The second example I wrote is using the parameters option to allow SQLAlchemy to render the select * from foo where date > %s as a parameterized SQL statement passing in the values of {{ params.date }} as an argument instead of directly inline with the query.

@eladkal
Copy link
Contributor

eladkal commented Jan 7, 2025

Part of what might be confusing me is the SQLExecuteQueryOperator does have parameters in the template_fields which is leading to some inconsistency.

To my perspective it's wrong but that ship has saild.

I don't really mind adding it as template filed here as you requested but the example you shared doesn't reflect using parameters at all and if we do add it as template fields it will confuse others even more. I think what you really need is:

SqlToS3Operator(
        ...
        query="""select * from foo where date > {{ params.date }} """,
        params={"date": my_date}
    )

templated fields is going to have massive refactor in Airflow 3

@tkeller-moxe
Copy link
Author

tkeller-moxe commented Jan 7, 2025

I'll be honest I'm even more confused now.
My goal is to be able to use a DAG/UI enterable parameter to drive changes in this SQL statement.

The task level params do not show up when you go to manually trigger a DAG.

Not included in the example, which might be the issue, is the DAG definition which looks like such
This is an example of the end state with parameters.

with DAG(
    ...
    dag_id="demo",
    schedule=None,
    catchup=False,
    params={
        ...
        "date": "2024-01-01",
    },
) as dag:
  SqlToS3Operator(
          task_id="export_sql_to_s3",
          sql_conn_id="trino_chart_retrieval_ro_id",
          query="""select * from foo where date > %s""",
          parameters=({{params.date}},),
          ...
      )

I gave a try using the task level params, as you gave an example, sadly I could not get them to show up in the Trigger Dag UI. Is there a pattern or documentation I'm missing here?

@konradish
Copy link

Part of what might be confusing me is the SQLExecuteQueryOperator does have parameters in the template_fields which is leading to some inconsistency.

To my perspective it's wrong but that ship has saild.

I don't really mind adding it as template filed here as you requested but the example you shared doesn't reflect using parameters at all and if we do add it as template fields it will confuse others even more. I think what you really need is:

SqlToS3Operator(
        ...
        query="""select * from foo where date > {{ params.date }} """,
        params={"date": my_date}
    )

templated fields is going to have massive refactor in Airflow 3

Isn't that templated string vulnerable to SQL injection? Passing params will go through SQLAlchemy and avoid that vulnerability. Is my understanding correct?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants