Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Enhance customization of Trino connections when using Trino-based Offline Stores #3699

Merged

Conversation

boliri
Copy link
Contributor

@boliri boliri commented Jul 26, 2023

What this PR does / why we need it:

Right now, the Trino Offline Store implementation lacks support for authentication. Taking a look at the code it seems like it is supported via the TRINO_AUTH environment variable; however, environment variables can only hold string values. The issue here is that TRINO_AUTH is passed to trino.dbapi.connect as is (i.e. a string), but it must be a trino.Authentication instance. So in practice, there's no way to make authentication work.

Aside from that, the current implementation supports customizing Trino connections via feature_store.yaml file, or via environment variables as hinted above. Particularly, the set of supported parameters on both sides is as follows:

feature_store.yaml

  • host
  • port
  • catalog

Environment variables

  • TRINO_HOST (mirror of feature_store.yaml's host parameter)
  • TRINO_PORT (mirror of feature_store.yaml's port parameter)
  • TRINO_USER
  • TRINO_CATALOG (mirror of feature_store.yaml's catalog parameter)
  • TRINO_HTTP_SCHEME
  • TRINO_SOURCE
  • TRINO_EXTRA_CREDENTIAL
  • TRINO_AUTH (supported but not usable, as explained above)

These configurations should be doable through the feature_store.yaml file to be consistent with the set of parameters already supported. Hence this PR.

A word of warning on compatibility

If you are using some of the environment variables above to customize Trino connections, you must update your feature_store.yaml file to make sure your setup doesn't break.

An example on how to incorporate those environment variables into your feature_store.yaml:

offline_store:
    type: feast_trino.trino.TrinoOfflineStore
    host: $TRINO_HOST
    port: $TRINO_PORT
    catalog: $TRINO_CATALOG
    connector:
        type: memory
    user: $TRINO_USER
    source: $TRINO_SOURCE
    http-scheme: $TRINO_HTTP_SCHEME
    x-trino-extra-credential-header: $TRINO_EXTRA_CREDENTIAL

Which issue(s) this PR fixes:

  • Unify Trino connections' configuration in a single point (feature_store.yaml) to ease customization.
  • Add an extra configuration (ssl-verify) to specify whether Trino SSL certificates should be validated on feast's side or not. This parameter is the equivalent to the verify flag in popular Python HTTP libs like requests.
  • Ensure that the TrinoOfflineStore class satisfies the signature of abstract methods included in the OfflineStore abstract class.
  • Add support for multiple authentication mechanisms:
    • Kerberos
    • Basic Auth
    • JWT
    • OAuth2
    • Certificate
  • Update Trino Offline Store's docs to provide examples on how to customize these connections.

Fixes #

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: boliri
To complete the pull request process, please assign chhabrakadabra after the PR has been reviewed.
You can assign the PR to them by writing /assign @chhabrakadabra in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@boliri boliri force-pushed the feat/enhance-trino-connections-customization branch from da6c481 to 12dc3fd Compare July 26, 2023 14:20
@adchia adchia force-pushed the feat/enhance-trino-connections-customization branch from 12dc3fd to f3014d1 Compare September 7, 2023 05:15
@adchia adchia enabled auto-merge (squash) September 7, 2023 05:58
@adchia adchia disabled auto-merge September 7, 2023 06:18
@adchia adchia merged commit ed7535e into feast-dev:master Sep 7, 2023
adchia pushed a commit that referenced this pull request Sep 7, 2023
# [0.34.0](v0.33.0...v0.34.0) (2023-09-07)

### Bug Fixes

* Add NUMERIC to bq_to_feast type map ([#3719](#3719)) ([6474b4b](6474b4b))
* Fix python unit tests ([#3734](#3734)) ([e81684d](e81684d))
* Handle unknown postgres source types gracefully ([#3634](#3634)) ([d7041f4](d7041f4))
* Pin protobuf version to avoid seg fault on some machines ([028cc20](028cc20))
* Remove unwanted excessive splitting of gcs path, so expected gcs parquet paths are returned from BigQueryRetrievalJob.to_remote_storage() ([#3730](#3730)) ([f2c5988](f2c5988))
* Run store.plan() only when need it. ([#3708](#3708)) ([7bc7c47](7bc7c47))
* Saved datasets no longer break CLI registry-dump command ([#3717](#3717)) ([f28ccc2](f28ccc2))
* Update py3.8 ci requirements for cython 3.0 release ([#3735](#3735)) ([1695c13](1695c13))

### Features

* Enhance customization of Trino connections when using Trino-based Offline Stores ([#3699](#3699)) ([ed7535e](ed7535e))
* Implement gRPC server to ingest streaming features ([#3687](#3687)) ([a3fcd1f](a3fcd1f))
james-crabtree-sp pushed a commit to sailpoint/feast that referenced this pull request Sep 14, 2023
…ed Offline Stores (feast-dev#3699)

* feat: Enhance customization of Trino connections when using Trino-based Offline Stores

Signed-off-by: boliri <[email protected]>

* docs: Add new connection parameters to Trino Offline Store's reference

Signed-off-by: boliri <[email protected]>

---------

Signed-off-by: boliri <[email protected]>
james-crabtree-sp pushed a commit to sailpoint/feast that referenced this pull request Sep 14, 2023
# [0.34.0](feast-dev/feast@v0.33.0...v0.34.0) (2023-09-07)

### Bug Fixes

* Add NUMERIC to bq_to_feast type map ([feast-dev#3719](feast-dev#3719)) ([6474b4b](feast-dev@6474b4b))
* Fix python unit tests ([feast-dev#3734](feast-dev#3734)) ([e81684d](feast-dev@e81684d))
* Handle unknown postgres source types gracefully ([feast-dev#3634](feast-dev#3634)) ([d7041f4](feast-dev@d7041f4))
* Pin protobuf version to avoid seg fault on some machines ([028cc20](feast-dev@028cc20))
* Remove unwanted excessive splitting of gcs path, so expected gcs parquet paths are returned from BigQueryRetrievalJob.to_remote_storage() ([feast-dev#3730](feast-dev#3730)) ([f2c5988](feast-dev@f2c5988))
* Run store.plan() only when need it. ([feast-dev#3708](feast-dev#3708)) ([7bc7c47](feast-dev@7bc7c47))
* Saved datasets no longer break CLI registry-dump command ([feast-dev#3717](feast-dev#3717)) ([f28ccc2](feast-dev@f28ccc2))
* Update py3.8 ci requirements for cython 3.0 release ([feast-dev#3735](feast-dev#3735)) ([1695c13](feast-dev@1695c13))

### Features

* Enhance customization of Trino connections when using Trino-based Offline Stores ([feast-dev#3699](feast-dev#3699)) ([ed7535e](feast-dev@ed7535e))
* Implement gRPC server to ingest streaming features ([feast-dev#3687](feast-dev#3687)) ([a3fcd1f](feast-dev@a3fcd1f))
james-crabtree-sp pushed a commit to sailpoint/feast that referenced this pull request Sep 14, 2023
…ed Offline Stores (feast-dev#3699)

* feat: Enhance customization of Trino connections when using Trino-based Offline Stores

Signed-off-by: boliri <[email protected]>

* docs: Add new connection parameters to Trino Offline Store's reference

Signed-off-by: boliri <[email protected]>

---------

Signed-off-by: boliri <[email protected]>
james-crabtree-sp pushed a commit to sailpoint/feast that referenced this pull request Sep 14, 2023
# [0.34.0](feast-dev/feast@v0.33.0...v0.34.0) (2023-09-07)

### Bug Fixes

* Add NUMERIC to bq_to_feast type map ([feast-dev#3719](feast-dev#3719)) ([6474b4b](feast-dev@6474b4b))
* Fix python unit tests ([feast-dev#3734](feast-dev#3734)) ([e81684d](feast-dev@e81684d))
* Handle unknown postgres source types gracefully ([feast-dev#3634](feast-dev#3634)) ([d7041f4](feast-dev@d7041f4))
* Pin protobuf version to avoid seg fault on some machines ([028cc20](feast-dev@028cc20))
* Remove unwanted excessive splitting of gcs path, so expected gcs parquet paths are returned from BigQueryRetrievalJob.to_remote_storage() ([feast-dev#3730](feast-dev#3730)) ([f2c5988](feast-dev@f2c5988))
* Run store.plan() only when need it. ([feast-dev#3708](feast-dev#3708)) ([7bc7c47](feast-dev@7bc7c47))
* Saved datasets no longer break CLI registry-dump command ([feast-dev#3717](feast-dev#3717)) ([f28ccc2](feast-dev@f28ccc2))
* Update py3.8 ci requirements for cython 3.0 release ([feast-dev#3735](feast-dev#3735)) ([1695c13](feast-dev@1695c13))

### Features

* Enhance customization of Trino connections when using Trino-based Offline Stores ([feast-dev#3699](feast-dev#3699)) ([ed7535e](feast-dev@ed7535e))
* Implement gRPC server to ingest streaming features ([feast-dev#3687](feast-dev#3687)) ([a3fcd1f](feast-dev@a3fcd1f))
zseta pushed a commit to zseta/feast that referenced this pull request Feb 7, 2024
…ed Offline Stores (feast-dev#3699)

* feat: Enhance customization of Trino connections when using Trino-based Offline Stores

Signed-off-by: boliri <[email protected]>

* docs: Add new connection parameters to Trino Offline Store's reference

Signed-off-by: boliri <[email protected]>

---------

Signed-off-by: boliri <[email protected]>
Signed-off-by: Attila Toth <[email protected]>
zseta pushed a commit to zseta/feast that referenced this pull request Feb 7, 2024
# [0.34.0](feast-dev/feast@v0.33.0...v0.34.0) (2023-09-07)

### Bug Fixes

* Add NUMERIC to bq_to_feast type map ([feast-dev#3719](feast-dev#3719)) ([6474b4b](feast-dev@6474b4b))
* Fix python unit tests ([feast-dev#3734](feast-dev#3734)) ([e81684d](feast-dev@e81684d))
* Handle unknown postgres source types gracefully ([feast-dev#3634](feast-dev#3634)) ([d7041f4](feast-dev@d7041f4))
* Pin protobuf version to avoid seg fault on some machines ([028cc20](feast-dev@028cc20))
* Remove unwanted excessive splitting of gcs path, so expected gcs parquet paths are returned from BigQueryRetrievalJob.to_remote_storage() ([feast-dev#3730](feast-dev#3730)) ([f2c5988](feast-dev@f2c5988))
* Run store.plan() only when need it. ([feast-dev#3708](feast-dev#3708)) ([7bc7c47](feast-dev@7bc7c47))
* Saved datasets no longer break CLI registry-dump command ([feast-dev#3717](feast-dev#3717)) ([f28ccc2](feast-dev@f28ccc2))
* Update py3.8 ci requirements for cython 3.0 release ([feast-dev#3735](feast-dev#3735)) ([1695c13](feast-dev@1695c13))

### Features

* Enhance customization of Trino connections when using Trino-based Offline Stores ([feast-dev#3699](feast-dev#3699)) ([ed7535e](feast-dev@ed7535e))
* Implement gRPC server to ingest streaming features ([feast-dev#3687](feast-dev#3687)) ([a3fcd1f](feast-dev@a3fcd1f))

Signed-off-by: Attila Toth <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants