Clear intermediate data during pipeline execution #1264

brollb · 2019-10-16T14:52:58Z

When executing a pipeline, it would be nice to opt out of storing intermediate data. This is especially useful when the user specifies the storage backend, as the available space may be limited.

This will have implications for the ability to restart individual jobs in a pipeline.

brollb · 2019-10-23T13:27:06Z

It might be easier to just clear the pipeline data after a successful execution. Essentially, this should find all jobs that are not "Input" nodes then delete all associated data nodes.
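A minimal sketch of that cleanup pass, for illustration only. The `Job` record, the `InMemoryStorage` class, and its `delete` method are hypothetical stand-ins and are not the project's actual node model or storage adapter API:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Job:
    """Hypothetical stand-in for a job in an executed pipeline."""
    name: str
    operation: str  # e.g. "Input", "Train", "Score"
    data_ids: List[str] = field(default_factory=list)


class InMemoryStorage:
    """Toy storage backend; a real adapter would call its own delete API."""

    def __init__(self, blobs: Dict[str, bytes]):
        self.blobs = dict(blobs)

    def delete(self, data_id: str) -> None:
        # Deleting a missing id is treated as a no-op in this sketch.
        self.blobs.pop(data_id, None)


def clear_intermediate_data(jobs: List[Job], storage: InMemoryStorage) -> List[str]:
    """Delete the data of every job that is not an "Input" node.

    Returns the ids that were passed to the storage backend for deletion.
    """
    deleted = []
    for job in jobs:
        if job.operation == "Input":
            continue  # data from "Input" nodes is kept
        for data_id in job.data_ids:
            storage.delete(data_id)
            deleted.append(data_id)
    return deleted
```

Calling `clear_intermediate_data(jobs, storage)` once the execution has succeeded would leave only the data referenced by "Input" nodes in the backend.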

brollb · 2019-11-14T23:17:55Z

This is a bit more involved:

- non-debug pipelines should probably have their data deleted on completion. There is probably no need for storing intermediate results as users are not able to edit and re-run portions of the pipeline.
- debug pipelines should probably only delete their intermediate data on:
  - restart
  - deletion

Additionally, data should probably be deleted when the associated artifact is deleted. However, this raises another question about how to manage tokens/backend authentication as we certainly don't want to prompt the user for storage config info on each deletion.
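The rules for debug and non-debug pipelines could be captured in a single predicate. This is a sketch under assumptions: the `Event` enum and the function name are hypothetical, and cases the discussion does not cover (for example, a non-debug pipeline that is deleted before completing) simply return `False` here:

```python
from enum import Enum


class Event(Enum):
    """Hypothetical lifecycle events of a pipeline execution."""
    COMPLETED = "completed"
    RESTARTED = "restarted"
    DELETED = "deleted"


def should_clear_intermediate_data(is_debug: bool, event: Event) -> bool:
    """Decide whether intermediate data should be deleted for this event."""
    if is_debug:
        # Debug executions keep their data until they are restarted or deleted.
        return event in (Event.RESTARTED, Event.DELETED)
    # Non-debug executions cannot be edited and re-run, so clear on completion.
    return event is Event.COMPLETED
```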

A better way to handle storage configuration could be through introducing a new concept (maybe "integrations"?) which stores the config info for the associated components (storage, compute, etc) and is stored for the given user.
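One possible shape for such an "integrations" record, purely as an illustration. The class name, its fields, and the component keys are assumptions, and a real version would need to protect stored tokens rather than keep them in a plain dictionary:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Integrations:
    """Hypothetical per-user store of component configuration."""
    user_id: str
    configs: Dict[str, dict] = field(default_factory=dict)

    def save(self, component: str, config: dict) -> None:
        # Keep a copy so later changes by the caller do not leak in.
        self.configs[component] = dict(config)

    def get(self, component: str) -> Optional[dict]:
        # Returns None when the user has not configured this component.
        return self.configs.get(component)
```

With something like this, deleting an artifact could look up `integrations.get("storage")` and only prompt the user when no saved config exists.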

brollb · 2020-04-10T22:23:27Z

On a related note (though it probably shouldn't be part of this issue), it would be worth considering using temporary/scratch storage when supported by the storage adapter (such as "Temporary" for SciServer Files)...

brollb mentioned this issue Oct 23, 2019

Add deletion to storage backends #1292

Closed

brollb added a commit that referenced this issue Nov 14, 2019

WIP #1264 Clear intermediate data on (non-debug) pipeline complete

fbfcbca

brollb added enhancement storage labels Nov 14, 2019
