feat: Stream SQLite writes #24

Merged


Collaborator

@peasee peasee commented Aug 1, 2024

🗣 Description

  • Converts the SQLite writer to stream writes, instead of first collecting all of the data in memory and then writing it

In manual testing with 60-million-row tables, this improved write performance by 20-40% and reduced memory usage by more than 80% during write operations.
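
For readers skimming the thread, here is a minimal sketch of the collect-versus-stream idea described above. It is not the code from this PR: `Batch`, `FakeTransaction`, `write_streamed`, and `stream_rows` are hypothetical stand-ins, and it uses a bounded `std::sync::mpsc` channel rather than whatever channel type backs `batch_rx` in the real writer. The point is only that the writer holds one batch at a time while the bounded channel applies backpressure to the producer.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical stand-in for a record batch: just a vector of row values.
type Batch = Vec<i64>;

/// Hypothetical stand-in for a SQLite transaction; it only counts writes.
#[derive(Default)]
struct FakeTransaction {
    rows_written: usize,
    batches_written: usize,
}

impl FakeTransaction {
    fn insert_batch(&mut self, batch: &Batch) {
        self.rows_written += batch.len();
        self.batches_written += 1;
    }
}

/// Drain the channel and write each batch as it arrives.
/// The writer never holds more than one batch at a time.
fn write_streamed(rx: Receiver<Batch>) -> FakeTransaction {
    let mut tx = FakeTransaction::default();
    // `recv` returns Err once every sender is dropped and the queue is empty.
    while let Ok(batch) = rx.recv() {
        if !batch.is_empty() {
            tx.insert_batch(&batch);
        }
    }
    tx
}

/// Produce batches on a separate thread and stream them to the writer
/// through a bounded channel (capacity 2 is an arbitrary choice here).
fn stream_rows(num_batches: usize, rows_per_batch: usize) -> FakeTransaction {
    let (sender, receiver) = sync_channel::<Batch>(2);
    let producer = thread::spawn(move || {
        for i in 0..num_batches {
            let batch: Batch = (0..rows_per_batch)
                .map(|r| (i * rows_per_batch + r) as i64)
                .collect();
            // `send` blocks while the channel is full, so the producer
            // cannot run far ahead of the writer.
            if sender.send(batch).is_err() {
                break;
            }
        }
        // `sender` is dropped here, which closes the channel.
    });
    let tx = write_streamed(receiver);
    producer.join().expect("producer thread panicked");
    tx
}
```

Calling `stream_rows(5, 3)` in this sketch pushes five three-row batches through the channel. Peak memory is bounded by the channel capacity plus the batch being written, not by the total row count.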

@peasee peasee closed this Aug 1, 2024
@peasee peasee reopened this Aug 1, 2024
for batch in data_batches {
if batch.num_rows() > 0 {
sqlite.insert_batch(&transaction, batch, on_conflict.as_ref())?;
while let Some(data_batch) = batch_rx.blocking_recv() {
Collaborator

@sgrebnov sgrebnov Aug 1, 2024

@peasee - can we perform a read query while streaming results (refreshing)? I'm concerned that we could lock the table for quite some time.

Collaborator Author

You can perform read queries for tables except the ones still accelerating. If you try querying a table that's accelerating, the query will hang until the acceleration finishes before returning values.

This is the existing behavior on main.

Collaborator

sgrebnov commented Aug 1, 2024 via email

@phillipleblanc phillipleblanc merged commit 4356be6 into datafusion-contrib:main Aug 1, 2024
2 checks passed

4 participants