How to make transfer data faster while using databricks-sql-go #235
Comments
@calebeaires Considering you enabled CloudFetch, your assumption is most likely correct: the driver currently tries to download all the data into memory. That's what we discovered recently and are trying to fix right now (#234). Try disabling CloudFetch; this has helped some other users. Then you can play with …
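For reference, a minimal sketch of what disabling CloudFetch through the DSN could look like. The connection string is the placeholder one posted later in this thread with only the useCloudFetch flag flipped; token, host, and path are stand-ins, and this is an illustration rather than the library's documented recipe:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/databricks/databricks-sql-go" // registers the "databricks" driver for database/sql
)

func main() {
	// Placeholder DSN from the reporter's post, with CloudFetch disabled.
	dsn := "token:xxx@host:443$xxx-path?catalog=sample&database=big_table&useCloudFetch=false&maxRows=10000"

	db, err := sql.Open("databricks", dsn)
	if err != nil {
		log.Fatalf("open: %v", err)
	}
	defer db.Close()

	// Verify the connection before running any migration queries.
	if err := db.Ping(); err != nil {
		log.Fatalf("ping: %v", err)
	}
	log.Println("connected with useCloudFetch=false")
}
```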
@kravets-levko I've changed the settings, and now the query fails with:
execution error: failed to execute query: unexpected operation state ERROR_STATE: Total size of serialized results of 382 tasks (4.0 GiB) is bigger than spark.driver.maxResultSize 4.0 GiB. Code: 108
Topic: Spark driver maxResultSize exceeds 4.0 GiB. Hope we find a solution!
@calebeaires For this particular error, please reach out to an administrator of your workspace or to Databricks support. It is related to your workspace configuration, and that's not something that can be handled in the library.
Thank you so much. Please help me explain the issue to the administrator. Is there some config the admin can set for the maximum result size?
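For context when talking to the admin: the setting named in the error above is the Spark property spark.driver.maxResultSize, which caps the total size of serialized results returned to the Spark driver. It is a server-side Spark configuration, which is why the maintainer points to the workspace administrator; on a Databricks cluster it is usually raised in the cluster's Spark config. The value below is purely illustrative, not a recommendation:

```
spark.driver.maxResultSize 8g
```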
I am using the driver to run a data migration. When dealing with a table of 24 million rows and 9 columns, the performance is excellent when I fetch 10 thousand rows. When I increase the fetch size to 100 thousand rows, the transfer speed is still good. However, when fetching 1 million rows or more, the data transfer becomes very slow. A quick test made me think that the driver tries to fetch all the data in a single batch. Is there a way to improve this process?
This is the connection string:
"token:xxx@host:443$xxx-path?catalog=sample&database=big_table&useCloudFetch=true&maxRows=10000"
I've tried these two settings, but data transfer is still slow for big tables.
In forums, users suggest changing spark.driver.maxResultSize. Would that help here?
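A sketch of the kind of streaming read loop this migration could use once CloudFetch is disabled, as suggested in the comments above. The DSN is the one from this post with useCloudFetch flipped to false; the table and column names (big_table, id, name, value) and the maxRows value are stand-ins. With database/sql, rows.Next() consumes the result set batch by batch, so the client never has to hold all 24 million rows in memory at once:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/databricks/databricks-sql-go" // registers the "databricks" driver for database/sql
)

func main() {
	// DSN from the post above, with useCloudFetch switched off as suggested
	// in the comments; token, host, and path remain placeholders.
	dsn := "token:xxx@host:443$xxx-path?catalog=sample&database=big_table&useCloudFetch=false&maxRows=100000"

	db, err := sql.Open("databricks", dsn)
	if err != nil {
		log.Fatalf("open: %v", err)
	}
	defer db.Close()

	// Hypothetical table and columns, used only to illustrate the read loop.
	rows, err := db.Query("SELECT id, name, value FROM big_table")
	if err != nil {
		log.Fatalf("query: %v", err)
	}
	defer rows.Close()

	var (
		id    int64
		name  string
		value float64
		count int64
	)
	for rows.Next() {
		// Each Next()/Scan() consumes one row from the current fetch batch;
		// the driver requests the next batch from the server as needed.
		if err := rows.Scan(&id, &name, &value); err != nil {
			log.Fatalf("scan: %v", err)
		}
		count++
		// ... write the row to the migration target here ...
	}
	if err := rows.Err(); err != nil {
		log.Fatalf("rows: %v", err)
	}
	log.Printf("copied %d rows", count)
}
```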