Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cli): support optimistic stabilization strategy(--exit-on-config-complete) #29536

Closed
wants to merge 17 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -444,6 +444,36 @@ class LambdaHotswapStack extends cdk.Stack {
}
}

class DetailedStatusStack extends cdk.Stack{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none of these resources currently cause the stack itself to emit the CONFIGURATION_COMPLETE event, so this test needs to be updated. The only resource I know of that does make it emit this event is ECS Services

constructor(parent, id, props) {
super(parent, id, props);

const vpc = new ec2.Vpc(this, 'Vpc', {
natGateways: 0,
maxAzs: 1,
subnetConfiguration: [
{
cidrMask: 24,
name: 'Public',
subnetType: ec2.SubnetType.PUBLIC,
},
],
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A high level question -- what if a resource fails deploying after CONFIGURATION_COMPLETE status is emitted? Do we handle that case? Is that a possible case? If it is a possible case, could we have an integration test for that?

Copy link
Contributor Author

@pahud pahud May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will discuss this with the CFN team. As this PR is to monitor CONFIGURATION_COMPLETE for the stack detailed status, not the resources. It could be a risk that stack enters CONFIGURATION_COMPLETE detailed status but its final status is failed. In this case CDK CLI would assume the stack deployment is completed but actually is failed. We need to check:

  1. Is this possible for CFN stacks to happen?
  2. If that happens, what is the best DX for CDK users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is, if that happens, we should fail with exit status 1. One idea, without much thinking about how this would work, is to poll the statuses of all the stacks once the final stack has entered CONFIG_COMPLETE. So then it's just that the final stack of the deployment is not getting the speedup... which I think is fine since you then get the speed up on all the intermediate stacks.

I am considering if there is a case where a customer kicks off some sort of production thing as soon as the customer get a success signal from CDK deploy? If so, then we'd want to make sure that everything did successfully provision... unless CloudFormation handles that, somehow. I'm not too familiar with this new optimistic stabilization feature from CloudFormation.

Copy link
Contributor Author

@pahud pahud May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should figure out, let's say if we have 2 stacks to deploy with Stack2 depends on Stack1:

Stack1 enters CC state -> CLI starts deploying Stack2 -> Stack2 enters CC state -> CLI exit 0 but Stack1 ended up with final status failed. Now we get boost from Stack1 and Stack2 but Stack1 is actually failed. What should we do? If we just check the final status of the last stack, it would be success but actually Stack1 has failed and probably would be rolling back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preconception (which could be wrong) is that stack 2 would fail if stack 1 eventually failed. But if the scenario you're describing is true, then that biases me in favor of including a warning about this not being for production, and that would be all we'd do for addressing the failure case. So, similar to how we advise users to not use hotswap in production, we'd advise users to not use this flag in production. What do you think of that? I personally feel satisfied with that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check with the CFN team:

  • Will stack2 fail when stack1 fails with CC status
    a. if stack2 would fail then we just need to check the status of the last stack
    b. if stack2 would NOT fail we will end up with a failed stack1 and a deployed stack2.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I agree with @bergjaak to warn customer to not use the flag in production. This is what CFN official user guide mentions.

Stack 2 will not fail automatically. unless CDK implement a pulling mechanism that read the state an fail the stack based on the stack 1 state.

const instance = new ec2.CfnInstance(this, 'Instance', {
instanceType: 't3.nano',
imageId: new ec2.AmazonLinuxImage().getImage(this).imageId,
subnetId: vpc.publicSubnets[0].subnetId,
});
new ec2.Volume(this, 'Volume', {
availabilityZone: instance.attrAvailabilityZone,
removalPolicy: cdk.RemovalPolicy.DESTROY,
size: cdk.Size.gibibytes(1),
});

}
}

class EcsHotswapStack extends cdk.Stack {
constructor(parent, id, props) {
super(parent, id, props);
Expand Down Expand Up @@ -674,6 +704,7 @@ switch (stackSet) {
new LambdaStack(app, `${stackPrefix}-lambda`);
new LambdaHotswapStack(app, `${stackPrefix}-lambda-hotswap`);
new EcsHotswapStack(app, `${stackPrefix}-ecs-hotswap`);
new DetailedStatusStack(app, `${stackPrefix}-detailed-status`);
new DockerStack(app, `${stackPrefix}-docker`);
new DockerStackWithCustomFile(app, `${stackPrefix}-docker-with-custom-file`);

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -488,6 +488,14 @@ integTest('deploy with notification ARN', withDefaultFixture(async (fixture) =>
}
}));

integTest('deploy with optimistic stabilization', withDefaultFixture(async (fixture) => {
const deployOutput = await fixture.cdkDeploy('detailed-status', {
options: ['--exit-on-config-complete', '--debug'],
});

expect(deployOutput).toContain('in CONFIGURATION_COMPLETE detailed status. Considering this in a stable status');
}));

// NOTE: this doesn't currently work with modern-style synthesis, as the bootstrap
// role by default will not have permission to iam:PassRole the created role.
integTest('deploy with role', withDefaultFixture(async (fixture) => {
Expand Down
21 changes: 21 additions & 0 deletions packages/aws-cdk/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -464,6 +464,27 @@ and might have breaking changes in the future.

> *: `Fn::GetAtt` is only partially supported. Refer to [this implementation](https://github.com/aws/aws-cdk/blob/main/packages/aws-cdk/lib/api/evaluate-cloudformation-template.ts#L477-L492) for supported resources and attributes.

#### Optimistic Stabilization

You can pass the `--exit-on-config-complete` flag to the `deploy` command:

```console
$ cdk deploy --exit-on-config-complete [StackNames]
```

This approach triggers a deployment and monitors the `CONFIGURATION_COMPLETE` detailed status from AWS CloudFormation.
It immediately exits with a successful status once the `CONFIGURATION_COMPLETE` status is detected. This can speed up
the stack deployment process by skipping the stack-level eventual consistency checks, which can be beneficial when you
have multiple stacks that depend on each other.

For more details on this, you can refer to the blog post at
[this link](https://aws.amazon.com/blogs/devops/how-we-sped-up-aws-cloudformation-deployments-with-optimistic-stabilization/).

The `--exit-on-config-complete` flag is supported by `cdk deploy` only.

**Note**: This command is considered experimental,
and might have breaking changes in the future.

### `cdk watch`

The `watch` command is similar to `deploy`,
Expand Down
14 changes: 12 additions & 2 deletions packages/aws-cdk/lib/api/deploy-stack.ts
Original file line number Diff line number Diff line change
Expand Up @@ -198,6 +198,13 @@ export interface DeployStackOptions {
* @default true To remain backward compatible.
*/
readonly assetParallelism?: boolean;

/**
* Whether to enable exit on configuration complete.
*
* @default false
*/
readonly exitOnConfigComplete?: boolean;
}

export type DeploymentMethod =
Expand Down Expand Up @@ -339,6 +346,7 @@ class FullCloudFormationDeployment {
private readonly update: boolean;
private readonly verb: string;
private readonly uuid: string;
private readonly exitOnConfigComplete: boolean;

constructor(
private readonly options: DeployStackOptions,
Expand All @@ -353,6 +361,7 @@ class FullCloudFormationDeployment {
this.update = cloudFormationStack.exists && cloudFormationStack.stackStatus.name !== 'REVIEW_IN_PROGRESS';
this.verb = this.update ? 'update' : 'create';
this.uuid = uuid.v4();
this.exitOnConfigComplete = options.exitOnConfigComplete ?? false;
}

public async performDeployment(): Promise<DeployStackResult> {
Expand Down Expand Up @@ -514,7 +523,7 @@ class FullCloudFormationDeployment {

let finalState = this.cloudFormationStack;
try {
const successStack = await waitForStackDeploy(this.cfn, this.stackName);
const successStack = await waitForStackDeploy(this.cfn, this.stackName, this.exitOnConfigComplete);

// This shouldn't really happen, but catch it anyway. You never know.
if (!successStack) { throw new Error('Stack deploy failed (the stack disappeared while we were deploying it)'); }
Expand Down Expand Up @@ -570,6 +579,7 @@ export interface DestroyStackOptions {
deployName?: string;
quiet?: boolean;
ci?: boolean;
exitOnConfigComplete?: boolean;
}

export async function destroyStack(options: DestroyStackOptions) {
Expand All @@ -586,7 +596,7 @@ export async function destroyStack(options: DestroyStackOptions) {

try {
await cfn.deleteStack({ StackName: deployName, RoleARN: options.roleArn }).promise();
const destroyedStack = await waitForStackDelete(cfn, deployName);
const destroyedStack = await waitForStackDelete(cfn, deployName, options.exitOnConfigComplete);
if (destroyedStack && destroyedStack.stackStatus.name !== 'DELETE_COMPLETE') {
throw new Error(`Failed to destroy ${deployName}: ${destroyedStack.stackStatus}`);
}
Expand Down
9 changes: 9 additions & 0 deletions packages/aws-cdk/lib/api/deployments.ts
Original file line number Diff line number Diff line change
Expand Up @@ -206,6 +206,13 @@ export interface DeployStackOptions {
* @default false
*/
ignoreNoStacks?: boolean;

/**
* Whether to exit on configuration complete.
*
* @default false
*/
exitOnConfigComplete?: boolean;
}

interface AssetOptions {
Expand Down Expand Up @@ -260,6 +267,7 @@ export interface DestroyStackOptions {
quiet?: boolean;
force?: boolean;
ci?: boolean;
exitOnConfigComplete?: boolean;
}

export interface StackExistsOptions {
Expand Down Expand Up @@ -414,6 +422,7 @@ export class Deployments {
resourcesToImport: options.resourcesToImport,
overrideTemplate: options.overrideTemplate,
assetParallelism: options.assetParallelism,
exitOnConfigComplete: options.exitOnConfigComplete,
});
}

Expand Down
23 changes: 18 additions & 5 deletions packages/aws-cdk/lib/api/util/cloudformation.ts
Original file line number Diff line number Diff line change
Expand Up @@ -431,14 +431,16 @@ export function changeSetHasNoChanges(description: CloudFormation.DescribeChange
*
* @param cfn a CloudFormation client
* @param stackName the name of the stack to wait for after a delete
* @param exitOnConfigComplete whether to exit on configuration complete
*
* @returns the CloudFormation description of the stabilized stack after the delete attempt
*/
export async function waitForStackDelete(
cfn: CloudFormation,
stackName: string): Promise<CloudFormationStack | undefined> {
stackName: string,
exitOnConfigComplete?: boolean): Promise<CloudFormationStack | undefined> {

const stack = await stabilizeStack(cfn, stackName);
const stack = await stabilizeStack(cfn, stackName, exitOnConfigComplete);
if (!stack) { return undefined; }

const status = stack.stackStatus;
Expand All @@ -458,20 +460,26 @@ export async function waitForStackDelete(
*
* @param cfn a CloudFormation client
* @param stackName the name of the stack to wait for after an update
* @param exitOnConfigComplete exit on configuration complete
*
* @returns the CloudFormation description of the stabilized stack after the update attempt
*/
export async function waitForStackDeploy(
cfn: CloudFormation,
stackName: string): Promise<CloudFormationStack | undefined> {
stackName: string,
exitOnConfigComplete?: boolean): Promise<CloudFormationStack | undefined> {

const stack = await stabilizeStack(cfn, stackName);
const stack = await stabilizeStack(cfn, stackName, exitOnConfigComplete);
if (!stack) { return undefined; }

const status = stack.stackStatus;

if (status.isCreationFailure) {
throw new Error(`The stack named ${stackName} failed creation, it may need to be manually deleted from the AWS console: ${status}`);
} else if (exitOnConfigComplete && status.isConfigurationComplete) {
// Considering this in a stable status with exitOnConfigComplete is provided.
// When this happens, status.isDeploySuccess would be false but we consider it a successful deployment.
return stack;
} else if (!status.isDeploySuccess) {
throw new Error(`The stack named ${stackName} failed to deploy: ${status}`);
}
Expand All @@ -482,7 +490,7 @@ export async function waitForStackDeploy(
/**
* Wait for a stack to become stable (no longer _IN_PROGRESS), returning it
*/
export async function stabilizeStack(cfn: CloudFormation, stackName: string) {
export async function stabilizeStack(cfn: CloudFormation, stackName: string, exitOnConfigComplete?: boolean) {
debug('Waiting for stack %s to finish creating or updating...', stackName);
return waitFor(async () => {
const stack = await CloudFormationStack.lookup(cfn, stackName);
Expand All @@ -492,6 +500,11 @@ export async function stabilizeStack(cfn: CloudFormation, stackName: string) {
}
const status = stack.stackStatus;
if (status.isInProgress) {
// stack in CONFIGURATION_COMPLETE detailed status
if (exitOnConfigComplete && status.isConfigurationComplete) {
debug('Stack %s in CONFIGURATION_COMPLETE detailed status. Considering this in a stable status (%s)', stackName, status);
return stack;
}
debug('Stack %s has an ongoing operation in progress and is not stable (%s)', stackName, status);
return undefined;
} else if (status.isReviewInProgress) {
Expand Down
26 changes: 23 additions & 3 deletions packages/aws-cdk/lib/api/util/cloudformation/stack-status.ts
Original file line number Diff line number Diff line change
Expand Up @@ -7,10 +7,18 @@ import * as AWS from 'aws-sdk';
*/
export class StackStatus {
public static fromStackDescription(description: AWS.CloudFormation.Stack) {
return new StackStatus(description.StackStatus, description.StackStatusReason);
return new StackStatus(
description.StackStatus,
description.StackStatusReason,
description.DetailedStatus,
);
}

constructor(public readonly name: string, public readonly reason?: string) {}
constructor(
public readonly name: string,
public readonly reason?: string,
public readonly detailedName?: string,
) {}

get isCreationFailure(): boolean {
return this.name === 'ROLLBACK_COMPLETE'
Expand Down Expand Up @@ -46,7 +54,19 @@ export class StackStatus {
|| this.name === 'UPDATE_ROLLBACK_COMPLETE';
}

get isConfigurationComplete(): boolean {
return this.detailedName === 'CONFIGURATION_COMPLETE';
}

get isValidationFailed(): boolean {
return this.detailedName === 'VALIDATION_FAILED';
}

public toString(): string {
return this.name + (this.reason ? ` (${this.reason})` : '');
const statusString = this.name + (this.reason ? ` (${this.reason})` : '');
const detailedStatusString = this.detailedName
? ` - ${this.detailedName}`
: '';
return statusString + detailedStatusString;
}
}
8 changes: 8 additions & 0 deletions packages/aws-cdk/lib/cdk-toolkit.ts
Original file line number Diff line number Diff line change
Expand Up @@ -350,6 +350,7 @@ export class CdkToolkit {
extraUserAgent: options.extraUserAgent,
assetParallelism: options.assetParallelism,
ignoreNoStacks: options.ignoreNoStacks,
exitOnConfigComplete: options.exitOnConfigComplete,
});

const message = result.noOp
Expand Down Expand Up @@ -1339,6 +1340,13 @@ export interface DeployOptions extends CfnDeployOptions, WatchOptions {
* @default false
*/
readonly ignoreNoStacks?: boolean;

/**
* Whether to exit on configuration complete.
*
* @default false
*/
readonly exitOnConfigComplete?: boolean;
}

export interface ImportOptions extends CfnDeployOptions {
Expand Down
4 changes: 3 additions & 1 deletion packages/aws-cdk/lib/cli.ts
Original file line number Diff line number Diff line change
Expand Up @@ -174,7 +174,8 @@ async function parseCommandLineArguments(args: string[]) {
.option('concurrency', { type: 'number', desc: 'Maximum number of simultaneous deployments (dependency permitting) to execute.', default: 1, requiresArg: true })
.option('asset-parallelism', { type: 'boolean', desc: 'Whether to build/publish assets in parallel' })
.option('asset-prebuild', { type: 'boolean', desc: 'Whether to build all assets before deploying the first stack (useful for failing Docker builds)', default: true })
.option('ignore-no-stacks', { type: 'boolean', desc: 'Whether to deploy if the app contains no stacks', default: false }),
.option('ignore-no-stacks', { type: 'boolean', desc: 'Whether to deploy if the app contains no stacks', default: false })
.option('exit-on-config-complete', { type: 'boolean', desc: 'Whether to exit on configuration complete', default: false }),
)
.command('import [STACK]', 'Import existing resource(s) into the given STACK', (yargs: Argv) => yargs
.option('execute', { type: 'boolean', desc: 'Whether to execute ChangeSet (--no-execute will NOT execute the ChangeSet)', default: true })
Expand Down Expand Up @@ -604,6 +605,7 @@ export async function exec(args: string[], synthesizer?: Synthesizer): Promise<n
assetParallelism: configuration.settings.get(['assetParallelism']),
assetBuildTime: configuration.settings.get(['assetPrebuild']) ? AssetBuildTime.ALL_BEFORE_DEPLOY : AssetBuildTime.JUST_IN_TIME,
ignoreNoStacks: args.ignoreNoStacks,
exitOnConfigComplete: args.exitOnConfigComplete,
pahud marked this conversation as resolved.
Show resolved Hide resolved
});

case 'import':
Expand Down
1 change: 1 addition & 0 deletions packages/aws-cdk/lib/settings.ts
Original file line number Diff line number Diff line change
Expand Up @@ -293,6 +293,7 @@ export class Settings {
assetParallelism: argv['asset-parallelism'],
assetPrebuild: argv['asset-prebuild'],
ignoreNoStacks: argv['ignore-no-stacks'],
exitOnConfigComplete: argv.exitOnConfigComplete,
});
}

Expand Down
Loading
Loading