Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support for Exports Read/Writing From S3 Buckets #1137

Merged
merged 24 commits into from
Oct 9, 2024

Conversation

1nv8rzim
Copy link
Contributor

@1nv8rzim 1nv8rzim commented Sep 19, 2024

Purpose

One of the things I have been working towards is a stateless deploy of yeti in order to deploy several replicas of the celery runner and api containers. The biggest blocker for this is that a shared volumes between several containers is an anti-pattern and not supported on the cluster I am deploying to.

This PR introduces the ability to replaces this shared volume with an S3 Complaint Bucket

Changes

  • When system.export_path is prefixed with s3:// it will attempt to use S3 as the storage mediums for exports
    • Example: Setting system.export_path to s3://bucket_name would use bucket_name for storage of export task results
  • Celery Runner can now upload results export tasks to an s3 compliant bucket
  • API is able to read results of export task from an s3 compliant bucket
  • Credentials and configuration for s3 buckets access should be injected by environment variables

@1nv8rzim
Copy link
Contributor Author

1nv8rzim commented Sep 19, 2024

This works; I am drafting a more generalized approach here so that this same thing can be extended past s3 in the future

@tomchop
Copy link
Collaborator

tomchop commented Sep 23, 2024

This is a great feature, thanks a lot!

This works; I am drafting a more generalized approach 1nv8rzim#2 so that this same thing can be extended past s3 in the future

That was going to be my main comment on this - it would be great if the core of this was provider agnostic. Please let us know once the generalized approach is ready for review!

Another thing that came to mind is documentation for this (especially the env variable part) Would you mind adding a snippet .md file to https://github.com/yeti-platform/yeti-platform.github.io/tree/main/content/docs describing how to do set this up? I'll make sure it ends up in the right directory structure so you don't need to mind any of the hugo stuff.

Thanks again!

@1nv8rzim
Copy link
Contributor Author

Another thing that came to mind is documentation for this (especially the env variable part) Would you mind adding a snippet .md file to yeti-platform/yeti-platform.github.io@main/content/docs describing how to do set this up? I'll make sure it ends up in the right directory structure so you don't need to mind any of the hugo stuff.

Would something like this work?
yeti-platform/yeti-platform.github.io#8

That was going to be my main comment on this - it would be great if the core of this was provider agnostic. Please let us know once the generalized approach is ready for review!

Will do! I have tested the I PR I linked locally, need to test in our full deploy though

@tomchop
Copy link
Collaborator

tomchop commented Sep 23, 2024

Yes, the doc PR looks great! I might swoop in and change some typos / the layout of the page but it looks good.

Let me know once both are ready to review! Thanks again :)

@1nv8rzim 1nv8rzim marked this pull request as ready for review September 25, 2024 10:34
@1nv8rzim
Copy link
Contributor Author

1nv8rzim commented Sep 25, 2024

@tomchop
Merged in the code for the Generic Storage Clients. Tested and confirmed everything works. Should be ready for a full review

@1nv8rzim
Copy link
Contributor Author

1nv8rzim commented Sep 25, 2024

Based on the order that ^^ and the current PR are merged; we should think about moving over the the new template directory to use the persistient storage clients.

Copy link
Collaborator

@tomchop tomchop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a systemic typo: Persist**i**ent (should be Persistent). But, maybe we could just rename this to "FileStorage"?

core/clients/persistient_storage/__init__.py Outdated Show resolved Hide resolved
core/clients/persistient_storage/classes/main.py Outdated Show resolved Hide resolved
core/schemas/task.py Outdated Show resolved Hide resolved
core/clients/persistient_storage/classes/main.py Outdated Show resolved Hide resolved
core/clients/persistient_storage/classes/main.py Outdated Show resolved Hide resolved
core/clients/persistient_storage/classes/s3.py Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
@1nv8rzim
Copy link
Contributor Author

There's a systemic typo: Persist**i**ent (should be Persistent). But, maybe we could just rename this to "FileStorage"?

Fixed in 5a3ea15

@1nv8rzim 1nv8rzim requested a review from tomchop September 26, 2024 10:07
Copy link
Collaborator

@tomchop tomchop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for fixing the other comments. Some extra nits here, but otherwise I think we'll be good to merge this after we're done.

core/schemas/task.py Outdated Show resolved Hide resolved
pyproject.toml Show resolved Hide resolved
core/clients/file_storage/classes/local_storage.py Outdated Show resolved Hide resolved
core/clients/persistient_storage/classes/s3.py Outdated Show resolved Hide resolved
@tomchop tomchop added enhancement dependencies Pull requests that update a dependency file labels Sep 29, 2024
@tomchop
Copy link
Collaborator

tomchop commented Sep 30, 2024

Based on the order that ^^ and the current PR are merged; we should think about moving over the the new template directory to use the persistient storage clients.

Thinking back on this again - I think I'm gonna merge https://github.com/yeti-platform/yeti/pull/1141/files as it is, we need to think a bit more as to whether we want to be able to load templates from outside the infrastructure where Yeti lives given that templates can lead to RCE (the whole reason we're moving them out of the db into the filesystem)

@1nv8rzim
Copy link
Contributor Author

1nv8rzim commented Sep 30, 2024

Based on the order that ^^ and the current PR are merged; we should think about moving over the the new template directory to use the persistient storage clients.

Thinking back on this again - I think I'm gonna merge #1141 (files) as it is, we need to think a bit more as to whether we want to be able to load templates from outside the infrastructure where Yeti lives given that templates can lead to RCE (the whole reason we're moving them out of the db into the filesystem)

Right now we are going through the process internally of making the api and tasks containers to being stateless so that we can run them in replicasets. This is actually the core reason behind this contribution.

The move of templates from the database to filesystem breaks this statelessness. Having an option to make this change stateless is something we would be very interested in.

Likewise, what is the attack vector that specifically makes storage of templates in arangodb vulnerable to RCE?

As I understood the entire process of having user editable export templates allows RCE through the templates themselves which would be agnostic to wherever they are stored. This would just be an accepted risk because we would assume authenticated should not preform RCE.

Similarly to all of this, if we can trust the file system with storing these templates, shouldn't using a remote storage solution like s3 bucket be just as secure?

Didn't notice it before but the PR removes the ability to edit these templates from the UI. This would work for us since it pretty much makes the templates static and we can just add them directly in at build time and they can be handled as part of the deploy.

@1nv8rzim 1nv8rzim requested a review from tomchop September 30, 2024 12:22
@1nv8rzim 1nv8rzim changed the title Add Supports for Exports Read/Writing From S3 Buckets Add Support for Exports Read/Writing From S3 Buckets Sep 30, 2024
@tomchop
Copy link
Collaborator

tomchop commented Oct 2, 2024

Looks like some tests are failing (bad imports), can you please fix this and the ruff errors?

@tomchop tomchop added the noteworthy PRs that are noteworthy / introduce new features label Oct 4, 2024
Copy link
Collaborator

@tomchop tomchop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for this!!

@tomchop tomchop merged commit dc41200 into yeti-platform:main Oct 9, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file enhancement noteworthy PRs that are noteworthy / introduce new features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants