schemachange: speed up slow schema changes #48608

spaskob · 2020-05-08T21:12:08Z

Touches #45150.
Fixes #47607.
Touches #47790.

Release note (performance improvement):
Before this change, a simple schema change could take 30s or more.
If the schema change was not first in line in the table mutation
queue, it returned a retriable error and the jobs framework
re-adopted and ran it later. The problem is that the job adoption
loop interval is 30s.

To reproduce, run this for some time:

cockroach sql --insecure --watch 1s -e 'drop table if exists users cascade; create table users (id uuid not null, name varchar(255) not null, email varchar(255) not null, password varchar(255) not null, remember_token varchar(100) null, created_at timestamp(0) without time zone null, updated_at timestamp(0) without time zone null, deleted_at timestamp(0) without time zone null); alter table users add primary key (id); alter table users add constraint users_email_unique unique (email);'

Instead of returning on retriable errors, we now retry with exponential
backoff in the schema change code. Handling retriable errors in client
job code is encouraged over relying on the registry, because the latter
leads to slowness and to more complicated test fixtures that rely on
hacking the internals of the job registry.

ajwerner · 2020-05-08T21:17:00Z

pkg/sql/schema_changer.go

+			scErr = sc.exec(ctx)
+			if scErr == nil {
+				return nil
+			}
 			switch {


probably cleaner as:

switch scErr := sc.exec(ctx); scErr { case nil: return nil ...

ajwerner · 2020-05-08T21:17:03Z

pkg/sql/schema_changer.go

 			}
 		}
-		return nil
+		return jobs.NewRetryJobError(scErr.Error())


If you're here, it probably means that your context was canceled. It's reasonably likely that scErr is nil here, which means this will panic.

Well, if scErr were nil, we would have returned inside the body of the loop.

ajwerner · 2020-05-08T21:17:21Z

pkg/sql/schema_changer.go

+			MaxBackoff:     20 * time.Second,
+			Multiplier:     1.5,
+		}
+		var scErr error


I'm not sure it makes sense to retain this across iterations of the loop.

No, but we need it after we exit the loop so that we can return the last error from the schema change to the registry.

Makes sense.

blathers-crl · 2020-05-08T22:20:19Z

❌ The GitHub CI (Cockroach) build has failed on e281a348.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

ajwerner

LGTM

spaskob · 2020-05-08T23:29:55Z

bors r+

craig · 2020-05-08T23:47:45Z

Build failed (retrying...)

GitHub CI (Cockroach)

spaskob · 2020-05-09T00:19:44Z

bors r+

craig · 2020-05-09T00:19:45Z

Already running a review

craig · 2020-05-09T01:31:33Z

Build succeeded

GitHub CI (Cockroach)

spaskob requested review from ajwerner and thoszhang May 8, 2020 21:12

ajwerner reviewed May 8, 2020

spaskob force-pushed the sc-slow branch 2 times, most recently from e14652c to e281a34, on May 8, 2020 21:51

spaskob force-pushed the sc-slow branch from e281a34 to 48a4766 on May 8, 2020 22:35

ajwerner approved these changes May 8, 2020

spaskob mentioned this pull request May 8, 2020

sql: schema changes can be very slow #47790

Closed

5 tasks

craig bot merged commit 07ef639 into cockroachdb:master May 9, 2020

This was referenced May 9, 2020

sql: (temporary) Hang when dropping unique index created after ALTER PRIMARY KEY #45150

Closed

release-20.1: schemachange: speed up slow schema changes #48621

Merged

rafiss mentioned this pull request May 19, 2020

Run test cases separately in CI cockroachdb/activerecord-cockroachdb-adapter#129

Closed

rafiss mentioned this pull request Oct 21, 2021

sql: SQL schema changes are too slow for ORM tests to adopt #71800

Closed
