Clear intermediate data during pipeline execution #1264

Open
brollb opened this issue Oct 16, 2019 · 3 comments

Comments

@brollb
Contributor

brollb commented Oct 16, 2019

When executing a pipeline, it would be nice to be able to opt out of storing intermediate data. This matters most when the user specifies the storage backend, as the available space may be limited.

This will have implications for the ability to restart individual jobs in a pipeline.

@brollb
Contributor Author

brollb commented Oct 23, 2019

It might be easier to just clear the pipeline data after a successful execution. Essentially, this should find all jobs that are not "Input" nodes and then delete all of their associated data nodes.
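
A minimal sketch of that cleanup step, assuming a simplified in-memory model. The `Job` class, its `operation` and `data_ids` fields, and the `delete` callback are all hypothetical names for illustration, not the project's actual API:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    # Hypothetical stand-in for a job node in an executed pipeline.
    name: str
    operation: str                      # e.g. "Input" or "Train"
    data_ids: List[str] = field(default_factory=list)


def intermediate_data_ids(jobs: List[Job]) -> List[str]:
    """Collect the data IDs attached to every job that is not an "Input" node."""
    return [
        data_id
        for job in jobs
        if job.operation != "Input"
        for data_id in job.data_ids
    ]


def clear_pipeline_data(jobs: List[Job], delete: Callable[[str], None]) -> List[str]:
    """Call `delete` on each intermediate data ID and return the IDs removed."""
    removed = intermediate_data_ids(jobs)
    for data_id in removed:
        delete(data_id)
    return removed
```

Data attached to "Input" jobs is left alone, since it is the user's original artifact rather than something the pipeline produced.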

@brollb
Contributor Author

brollb commented Nov 14, 2019

This is a bit more involved:

  • non-debug pipelines should probably have their data deleted on completion. There is probably no need for storing intermediate results as users are not able to edit and re-run portions of the pipeline.
  • debug pipelines should probably only delete their intermediate data on:
    • restart
    • deletion

Additionally, data should probably be deleted when the associated artifact is deleted. However, this raises another question about how to manage tokens/backend authentication, as we certainly don't want to prompt the user for storage config info on each deletion.
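
The debug/non-debug rules above could be written down as a small lookup. This is only a sketch: the event names ("completed", "restart", "deleted") and the function name are made up for illustration.

```python
# Events on which intermediate data would be cleared, keyed by whether the
# pipeline execution is a debug execution. Event names are hypothetical.
CLEAR_ON = {
    False: {"completed"},           # non-debug: clear once the pipeline finishes
    True: {"restart", "deleted"},   # debug: keep data until restart or deletion
}


def should_clear_data(is_debug: bool, event: str) -> bool:
    """Return True if intermediate data should be cleared for this event."""
    return event in CLEAR_ON[is_debug]
```

Cases not covered above (for example, a non-debug pipeline deleted before it completes) are deliberately left out of the table.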

A better way to handle storage configuration could be through introducing a new concept (maybe "integrations"?) which stores the config info for the associated components (storage, compute, etc) and is stored for the given user.
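
As a rough sketch of what an "integration" might look like, assuming an in-memory store keyed by user. The class names, fields, and the backend identifier used in the usage note are all hypothetical, and a real version would need to store credentials securely:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Integration:
    # Hypothetical record holding the saved config for one component.
    kind: str                          # "storage", "compute", ...
    backend: str                       # identifier of the backend it configures
    config: Dict[str, str] = field(default_factory=dict)


class IntegrationStore:
    """Per-user registry of saved integrations (in-memory, for illustration)."""

    def __init__(self) -> None:
        self._by_user: Dict[str, Dict[Tuple[str, str], Integration]] = {}

    def save(self, user_id: str, integration: Integration) -> None:
        key = (integration.kind, integration.backend)
        self._by_user.setdefault(user_id, {})[key] = integration

    def lookup(self, user_id: str, kind: str, backend: str) -> Optional[Integration]:
        """Return the saved integration, or None if the user has not set one up."""
        return self._by_user.get(user_id, {}).get((kind, backend))
```

With something like this, deleting an artifact could call `store.lookup(user_id, "storage", backend)` to get the config instead of prompting the user each time.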

@brollb
Contributor Author

brollb commented Apr 10, 2020

On a related note (though it probably shouldn't be part of this issue), it would be worth considering temporary/scratch storage when the storage adapter supports it (such as "Temporary" for SciServer Files)...
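
One way this could look, assuming each storage adapter reports an optional scratch volume. The function and parameter names are hypothetical, as is the default volume name in the usage note:

```python
from typing import Optional


def pick_volume(default_volume: str, scratch_volume: Optional[str], is_intermediate: bool) -> str:
    """Use the adapter's scratch volume for intermediate data when it has one.

    `scratch_volume` is None for adapters without temporary storage.
    """
    if is_intermediate and scratch_volume is not None:
        return scratch_volume
    return default_volume
```

For an adapter with a "Temporary" volume, `pick_volume("Storage", "Temporary", True)` would route intermediate data to "Temporary", while final results and adapters without scratch space keep using the default volume.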
