schemachange: speed up slow schema changes #48608

Merged 1 commit on May 9, 2020

@spaskob spaskob commented May 8, 2020

Touches #45150.
Fixes #47607.
Touches #47790.

Release note (performance improvement):
Before this change, a simple schema change could take 30s or more.
If the schema change was not first in line in the table mutation
queue, it returned a retriable error and the jobs framework
re-adopted and ran it later. The problem is that the job adoption
loop runs every 30s.

To reproduce, run this for some time:

cockroach sql --insecure --watch 1s -e 'drop table if exists users cascade; create table users (id uuid not null, name varchar(255) not null, email varchar(255) not null, password varchar(255) not null, remember_token varchar(100) null, created_at timestamp(0) without time zone null, updated_at timestamp(0) without time zone null, deleted_at timestamp(0) without time zone null); alter table users add primary key (id); alter table users add constraint users_email_unique unique (email);'

Instead of returning on retriable errors, we now retry with exponential
backoff in the schema change code. Handling retriable errors in client
job code is preferred over relying on the registry, because the latter
is slow and also leads to more complicated test fixtures that hack the
internals of the job registry.
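
A minimal sketch of the retry-in-place pattern described above, using only the Go standard library. It is not the code from this PR: `errRetriable`, `backoffOptions`, `runWithBackoff`, `demo`, and the 1ms initial backoff are illustrative assumptions. Only the 20s cap and the 1.5 multiplier come from the diff context in this PR.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errRetriable is a hypothetical sentinel standing in for the retriable
// error returned when a schema change is not first in the mutation queue.
var errRetriable = errors.New("retriable: not first in mutation queue")

// backoffOptions is an assumed type; InitialBackoff is not shown in the PR.
type backoffOptions struct {
	InitialBackoff time.Duration
	MaxBackoff     time.Duration
	Multiplier     float64
}

// runWithBackoff calls exec until it succeeds, fails with a non-retriable
// error, or ctx is done. On cancellation it returns the last error from exec.
func runWithBackoff(ctx context.Context, opts backoffOptions, exec func(context.Context) error) error {
	delay := opts.InitialBackoff
	for {
		err := exec(ctx)
		if err == nil || !errors.Is(err, errRetriable) {
			return err
		}
		select {
		case <-ctx.Done():
			return err
		case <-time.After(delay):
		}
		// Grow the delay geometrically, capped at MaxBackoff.
		delay = time.Duration(float64(delay) * opts.Multiplier)
		if delay > opts.MaxBackoff {
			delay = opts.MaxBackoff
		}
	}
}

// demo simulates a schema change that is blocked three times, then succeeds.
func demo() (int, error) {
	attempts := 0
	opts := backoffOptions{InitialBackoff: time.Millisecond, MaxBackoff: 20 * time.Second, Multiplier: 1.5}
	err := runWithBackoff(context.Background(), opts, func(context.Context) error {
		attempts++
		if attempts < 4 {
			return errRetriable
		}
		return nil
	})
	return attempts, err
}

func main() {
	attempts, err := demo()
	fmt.Println(attempts, err)
}
```

The point of the design is that the wait between attempts starts small, so a schema change that becomes first in line shortly after a failed attempt does not have to wait for the next adoption cycle.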

@spaskob spaskob requested review from ajwerner and thoszhang May 8, 2020 21:12
scErr = sc.exec(ctx)
if scErr == nil {
    return nil
}
switch {
Contributor

probably cleaner as:

switch scErr := sc.exec(ctx); scErr {
case nil:
    return nil
...

Contributor Author

done
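
A small standalone sketch of the `switch` form suggested in this thread, where the call is made in the switch's init statement and the result is matched against `nil` first. The names `errNotFirstInLine` and `classify` are hypothetical and not from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFirstInLine is a hypothetical sentinel error used for illustration.
var errNotFirstInLine = errors.New("not first in line")

// classify runs exec in the switch init statement and branches on the result.
// A tagged switch compares with ==, so it matches sentinel values only;
// wrapped errors would need a tagless switch with errors.Is instead.
func classify(exec func() error) string {
	switch err := exec(); err {
	case nil:
		return "done"
	case errNotFirstInLine:
		return "retry"
	default:
		return "fail: " + err.Error()
	}
}

func main() {
	fmt.Println(classify(func() error { return nil }))
	fmt.Println(classify(func() error { return errNotFirstInLine }))
	fmt.Println(classify(func() error { return errors.New("boom") }))
}
```

Note that a variable declared with `:=` in the init statement is scoped to the switch. If the value is needed after the enclosing loop, it has to be assigned with `=` to a variable declared outside.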

}
}
return nil
return jobs.NewRetryJobError(scErr.Error())
Contributor

If you're here, it probably means that your context was canceled. It's reasonably likely that scErr is nil here, which means this will panic.

Contributor Author

Well, if scErr were nil, we would have returned inside the body of the loop.
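
To make the concern in this thread concrete: calling `Error()` on a nil `error` interface value panics at run time. The sketch below demonstrates that, along with a defensive nil check. Both `errorText` and `retryMessage` are hypothetical helpers, and the guard is an assumption for illustration, not something this PR adds.

```go
package main

import (
	"errors"
	"fmt"
)

// errorText calls err.Error() and reports whether that call panicked.
func errorText(err error) (msg string, panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	return err.Error(), false
}

// retryMessage builds a retry message without assuming lastErr is non-nil.
func retryMessage(lastErr error) string {
	if lastErr == nil {
		return "retry: no attempt was made"
	}
	return "retry: " + lastErr.Error()
}

func main() {
	_, panicked := errorText(nil)
	fmt.Println("nil error panics on Error():", panicked)
	fmt.Println(retryMessage(nil))
	fmt.Println(retryMessage(errors.New("not first in line")))
}
```

Whether the guard is needed depends on whether the loop body is guaranteed to run at least once, which is the argument the author makes above.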

MaxBackoff: 20 * time.Second,
Multiplier: 1.5,
}
var scErr error
Contributor

I'm not sure it makes sense to retain this across iterations of the loop.

Contributor Author

No, but we need it after we exit the loop to return the last error from the schema change to the registry.

Contributor

Makes sense.
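
For a sense of what the `MaxBackoff: 20 * time.Second` and `Multiplier: 1.5` settings in the diff context imply, here is a rough sketch that computes the delays between retries. The 50ms initial backoff is an assumption (the initial value is not visible in this excerpt), and `backoffSchedule` is a hypothetical helper, not part of the PR or of any CockroachDB package.

```go
package main

import (
	"fmt"
	"time"
)

// backoffSchedule returns the first n delays of an exponential backoff that
// starts at initial, grows by multiplier each step, and is capped at max.
func backoffSchedule(initial, max time.Duration, multiplier float64, n int) []time.Duration {
	delays := make([]time.Duration, 0, n)
	d := initial
	for i := 0; i < n; i++ {
		delays = append(delays, d)
		d = time.Duration(float64(d) * multiplier)
		if d > max {
			d = max
		}
	}
	return delays
}

func main() {
	// The 50ms initial backoff is an assumed value for illustration.
	for i, d := range backoffSchedule(50*time.Millisecond, 20*time.Second, 1.5, 5) {
		fmt.Printf("retry %d after %v\n", i+1, d)
	}
}
```

Under these assumptions the first few retries happen well under a second apart, compared with waiting for the 30s job adoption loop.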

@spaskob spaskob force-pushed the sc-slow branch 2 times, most recently from e14652c to e281a34 Compare May 8, 2020 21:51
blathers-crl bot commented May 8, 2020

❌ The GitHub CI (Cockroach) build has failed on e281a348.

@ajwerner ajwerner left a comment

LGTM

spaskob commented May 8, 2020

bors r+

craig bot commented May 8, 2020

Build failed (retrying...)

spaskob commented May 9, 2020

bors r+

craig bot commented May 9, 2020

Already running a review

craig bot commented May 9, 2020

Build succeeded

Successfully merging this pull request may close these issues.

sql: add primary key on empty table very slow