RFC-3911: Deleter API #3911

Merged on Jan 6, 2024 (10 commits)
172 changes: 172 additions & 0 deletions core/src/docs/rfcs/3911_deleter_api.md
@@ -0,0 +1,172 @@
- Proposal Name: `deleter_api`
- Start Date: 2024-01-04
- RFC PR: [apache/incubator-opendal#3911](https://github.com/apache/incubator-opendal/pull/3911)
- Tracking Issue: [apache/incubator-opendal#3922](https://github.com/apache/incubator-opendal/issues/3922)

# Summary

Introduce the `Deleter` API to enhance batch and recursive deletion capabilities.

# Motivation

All of OpenDAL's public APIs follow the same design:

- `read`: Execute a read operation.
- `read_with`: Execute a read operation with additional options, like range and if_match.
- `reader`: Create a reader for streaming data, enabling flexible access.
- `reader_with`: Create a reader with advanced options.

However, `delete` operations vary. OpenDAL offers several methods for file deletion:

- `delete`: Delete a single file or an empty dir.
- `remove`: Remove a list of files.
- `remove_via`: Remove files produced by a stream.
- `remove_all`: Remove all files under a path.

This design is inconsistent with the other APIs and is not easy to use.

So I propose `Deleter` to address all of these cases at once.

# Guide-level explanation

The following new API will be added to `Operator`:

```diff
impl Operator {
pub async fn delete(&self, path: &str) -> Result<()>;
+ pub fn delete_with(&self, path: &str) -> FutureDelete;

+ pub async fn deleter(&self) -> Result<Deleter>;
+ pub fn deleter_with(&self) -> FutureDeleter;
}
```

- `delete` is the existing API, which deletes a single file or an empty dir.
- `delete_with` is an extension of the existing `delete` API, which supports additional options, such as `recursive`.
- `deleter` is a new API that returns a `Deleter` instance.
- `deleter_with` is an extension of the new `deleter` API, which supports additional options, such as `recursive`.

The following new options will be available for `delete_with` and `deleter_with`:

- `recursive`: Enable recursive deletion.
- `concurrent`: The maximum number of delete tasks that can run concurrently.
- `buffer`: The number of files that can be buffered before being sent in a single batch.
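
As a rough illustration of how these three options could be carried by a builder, here is a minimal sketch. The `DeleteOptions` struct and its methods are hypothetical and not part of OpenDAL; only the option names and the defaults (`concurrent` defaults to 1, `buffer` defaults to a service-provided value) come from this RFC.

```rust
/// Hypothetical container for the options accepted by `delete_with` and
/// `deleter_with`. The option names follow the RFC; the type is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct DeleteOptions {
    pub recursive: bool,
    pub concurrent: usize,
    /// `None` stands for "use the default provided by the service".
    pub buffer: Option<usize>,
}

impl Default for DeleteOptions {
    fn default() -> Self {
        Self {
            recursive: false,
            // The RFC states that `concurrent` defaults to 1.
            concurrent: 1,
            buffer: None,
        }
    }
}

impl DeleteOptions {
    /// Enable or disable recursive deletion.
    pub fn recursive(mut self, v: bool) -> Self {
        self.recursive = v;
        self
    }

    /// Set how many delete tasks may run concurrently.
    pub fn concurrent(mut self, n: usize) -> Self {
        self.concurrent = n;
        self
    }

    /// Set how many files are buffered per batch.
    pub fn buffer(mut self, n: usize) -> Self {
        self.buffer = Some(n);
        self
    }
}
```

Chaining `DeleteOptions::default().recursive(true).concurrent(8)` mirrors the call style proposed for `delete_with` and `deleter_with`, without the final `.await`.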

Users can delete a path recursively in this way:

```rust
let _ = op.delete_with("path/to/file").recursive(true).await?;
```

Users can delete multiple files in this way:


```rust
let mut deleter = op.deleter().await?;

// Add a single file to the deleter.
deleter.delete(path).await?;

// Add a stream of files to the deleter.
deleter.delete_all(&mut lister).await?;

// Close deleter, make sure all input files are deleted.
deleter.close().await?;
```

`Deleter` also implements [`Sink`](https://docs.rs/futures/latest/futures/sink/trait.Sink.html), so all the methods of `Sink` are available for `Deleter`. For example, users can use [`forward`](https://docs.rs/futures/latest/futures/stream/trait.StreamExt.html#method.forward) to forward a stream of files to `Deleter`:

```rust
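// Init a deleter to start batch delete tasks.
let mut deleter = op.deleter().await?;
// List all files that end with `.tmp`.
let mut lister = op.lister(path).await?
    .filter(|x| future::ready(x.ends_with(".tmp")));

// Option 1: forward all paths into the deleter.
// `forward` consumes the stream and closes the sink once the stream is exhausted,
// so no explicit `close` is needed afterwards.
// lister.forward(deleter).await?;

// Option 2: send all paths from the stream into the deleter...
deleter.send_all(&mut lister).await?;

// ...and then close the deleter explicitly.
deleter.close().await?;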

Users can control the behavior of `Deleter` by setting the options:

```rust
let mut deleter = op.deleter_with()
    // Allow up to 8 concurrent delete tasks; defaults to 1.
    .concurrent(8)
    // Set the buffer size to 1000; the default value is provided by the service.
    .buffer(1000)
    .await?;

// Add a single file to the deleter.
deleter.delete(path).await?;

// Add a stream of files to the deleter.
deleter.delete_all(&mut lister).await?;

// Close deleter, make sure all input files are deleted.
deleter.close().await?;
```
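
To make the interaction between `buffer` and `close` more concrete, here is a minimal synchronous model. It is a sketch only: `BatchDeleter`, `sent_batches`, and the flush-when-full policy are assumptions made for illustration, and the real `Deleter` is asynchronous and talks to a storage service.

```rust
/// Hypothetical in-memory model of a buffering deleter.
pub struct BatchDeleter {
    buffer_size: usize,
    pending: Vec<String>,
    /// Every batch "sent" to the backend, recorded for inspection.
    pub sent_batches: Vec<Vec<String>>,
}

impl BatchDeleter {
    pub fn new(buffer_size: usize) -> Self {
        assert!(buffer_size > 0, "buffer size must be positive");
        Self {
            buffer_size,
            pending: Vec::new(),
            sent_batches: Vec::new(),
        }
    }

    /// Queue a path and flush automatically once the buffer is full.
    pub fn delete(&mut self, path: &str) {
        self.pending.push(path.to_string());
        if self.pending.len() >= self.buffer_size {
            self.flush();
        }
    }

    /// Send whatever is currently buffered as one batch.
    fn flush(&mut self) {
        if !self.pending.is_empty() {
            let batch = std::mem::take(&mut self.pending);
            self.sent_batches.push(batch);
        }
    }

    /// Flush the remainder so that no queued path is left behind.
    pub fn close(&mut self) {
        self.flush();
    }
}
```

With a buffer size of 2, queueing three paths sends one full batch immediately and leaves one path pending until `close` is called, which is why the examples above always end with `close`.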

With the `Deleter` API in place, we will remove APIs such as `remove`, `remove_via` and `remove_all`.

- `remove` and `remove_via` could be replaced by `Deleter` directly.
- `remove_all` could be replaced by `delete_with(path).recursive(true)`.
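
The equivalence between `remove_all` and a recursive delete can be sketched against an in-memory store. Everything here (`MemStore`, `delete_recursive`, prefix matching on a path ending in `/`) is a hypothetical model used to illustrate the semantics, not OpenDAL code.

```rust
use std::collections::BTreeSet;

/// In-memory stand-in for a storage service; purely illustrative.
pub struct MemStore {
    pub paths: BTreeSet<String>,
}

impl MemStore {
    /// Non-recursive delete: removes exactly one path.
    pub fn delete(&mut self, path: &str) {
        self.paths.remove(path);
    }

    /// Recursive delete: list every entry under `path`, then delete each one.
    /// Assumes directory paths end with `/`, so that `dir/` does not match
    /// a sibling such as `directory`. Returns the number of removed entries.
    pub fn delete_recursive(&mut self, path: &str) -> usize {
        let targets: Vec<String> = self
            .paths
            .iter()
            .filter(|p| p.starts_with(path))
            .cloned()
            .collect();
        for t in &targets {
            self.paths.remove(t);
        }
        targets.len()
    }
}
```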

# Reference-level explanation

To provide those public APIs, we will add two new associated types to `Accessor`:

```rust
trait Accessor {
    ...

    type Deleter: oio::Delete;
    type BlockingDeleter: oio::BlockingDelete;
}
```
```

The `delete` API will be changed to return `Self::Deleter` (an `oio::Delete` implementation) alongside `RpDelete`:

```diff
trait Accessor {
- async fn delete(&self, path: &str, args: OpDelete) -> Result<RpDelete>;
+ async fn delete(&self, path: &str, args: OpDelete) -> Result<(RpDelete, Self::Deleter)>;
}
```

Along with this change, we will remove the `batch` API from `Accessor`:

```diff
trait Accessor {
- async fn batch(&self, args: OpBatch) -> Result<RpBatch>;
}
```

# Drawbacks

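- Big breaking changes: `remove`, `remove_via`, `remove_all`, and the `batch` API in `Accessor` are removed, and the signature of `delete` in `Accessor` changes.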


# Rationale and alternatives

None.

# Prior art

None.

# Unresolved questions

None.

# Future possibilities

## Add API that accepts `IntoIterator`

It's possible to add a new API that accepts `IntoIterator` so users can pass a `Vec<String>` or an `Iter<String>` into `Deleter`.
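
One possible shape for such an API is sketched below. `PathQueue` and `delete_iter` are hypothetical names chosen for this example; the sketch only records the queued paths and makes no claim about how `Deleter` would actually implement this.

```rust
/// Hypothetical stand-in for `Deleter` that only records queued paths.
#[derive(Default)]
pub struct PathQueue {
    pub queued: Vec<String>,
}

impl PathQueue {
    /// Accept anything iterable whose items convert into `String`,
    /// for example `Vec<String>`, `Vec<&str>`, or an iterator adapter.
    pub fn delete_iter<I, S>(&mut self, paths: I)
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        self.queued.extend(paths.into_iter().map(Into::into));
    }
}
```

Being generic over `IntoIterator` lets callers hand over owned collections and lazy iterators through the same method.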
6 changes: 6 additions & 0 deletions core/src/docs/rfcs/mod.rs
@@ -217,8 +217,14 @@ pub mod rfc_3526_list_recursive {}
#[doc = include_str!("3574_concurrent_stat_in_list.md")]
pub mod rfc_3574_concurrent_stat_in_list {}

/// Buffered Reader
#[doc = include_str!("3734_buffered_reader.md")]
pub mod rfc_3734_buffered_reader {}

/// Concurrent Writer
#[doc = include_str!("3898_concurrent_writer.md")]
pub mod rfc_3898_concurrent_writer {}

/// Deleter API
#[doc = include_str!("3911_deleter_api.md")]
pub mod rfc_3911_deleter_api {}