Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cli): support optimistic stabilization strategy(--exit-on-config-complete) #29536

Closed
wants to merge 17 commits into from

Conversation

pahud
Copy link
Contributor

@pahud pahud commented Mar 19, 2024

This PR adds an --exit-on-config-complete flag for cdk deploy. When this flag is enabled, CDK will monitor the DetailedStatusof the stack being deployed, and return success when the stack enters the CONFIGURATION_COMPLETE state.

This optimistic stabilization strategy can help reduce the total deployment time, especially when working with multiple stacks that have dependencies on each other. Normally, the CDK would wait for the entire deployment process to complete before returning, even if individual stacks had finished. With the --exit-on-config-complete flag, CDK will return as soon as each stack is in the CONFIGURATION_COMPLETE state, allowing subsequent dependent stacks to continue their deployment immediately. This can lead to significant time savings for complex CDK applications with many interdependent stacks.

Issue # (if applicable)

Closes #29443

Reason for this change

Description of changes

When --exit-on-config-complete is provided with cdk deploy, waitForStackDeploy() monitors the DetailedStatus for CONFIGURATION_COMPLETE and returns success upon it.

Description of how you validated changes

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added feature-request A feature should be added or improved. p1 labels Mar 19, 2024
@aws-cdk-automation aws-cdk-automation requested a review from a team March 19, 2024 02:19
@pahud pahud changed the title feat(core): support optimistic optimization feat(cli): support optimistic optimization Mar 19, 2024
@pahud pahud added pr/do-not-merge This PR should not be merged at this time. pr/work-in-progress This PR is a draft and needs further work. labels Mar 19, 2024
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Mar 19, 2024
@pahud pahud changed the title feat(cli): support optimistic optimization feat(cli): support optimistic stabilization strategy Mar 19, 2024
Copy link
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.

A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed add Clarification Request to a comment.

@aws-cdk-automation aws-cdk-automation added the pr/needs-cli-test-run This PR needs CLI tests run against it. label Mar 19, 2024
@pahud pahud marked this pull request as ready for review March 20, 2024 00:19
@pahud pahud removed pr/work-in-progress This PR is a draft and needs further work. pr/do-not-merge This PR should not be merged at this time. labels Mar 20, 2024
@TheRealAmazonKendra
Copy link
Contributor

@Mergifyio update

Copy link
Contributor

mergify bot commented Mar 21, 2024

update

✅ Branch has been successfully updated

This was referenced Mar 25, 2024
Copy link
Contributor

@comcalvi comcalvi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This integration test is failing:

● deploy with optimistic stabilization
--


Test Suites: 1 failed, 2 passed, 3 total

Could you fix the failure? I can't find the actual error in the logs, it just says "exited with error code 1"

@pahud
Copy link
Contributor Author

pahud commented May 7, 2024

I will get it fixed next week. :-)

@pahud pahud changed the title feat(cli): support optimistic stabilization strategy feat(cli): support optimistic stabilization strategy(--exit-on-complete) May 16, 2024
@pahud pahud temporarily deployed to test-pipeline May 16, 2024 17:55 — with GitHub Actions Inactive
@pahud pahud changed the title feat(cli): support optimistic stabilization strategy(--exit-on-complete) feat(cli): support optimistic stabilization strategy(--exit-on-config-complete) May 16, 2024
@aws-cdk-automation
Copy link
Collaborator

➡️ PR build request submitted to test-main-pipeline ⬅️

A maintainer must now check the pipeline and add the pr-linter/cli-integ-tested label once the pipeline succeeds.

@@ -255,7 +256,8 @@ async function parseCommandLineArguments(args: string[]) {
.command('destroy [STACKS..]', 'Destroy the stack(s) named STACKS', (yargs: Argv) => yargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to also add an option for cdk watch??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I am wrong but AFAIK cdk watch essentially monitors a stack and update it through SDK calls without redeploying it. I think cdk watch would not monitor the stack status so I can't see how this would fit in this scenario. I might be wrong though.

Copy link
Contributor

@bergjaak bergjaak May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I thinkk we're both right. It's an options to turn off the --hot-swap for watch https://docs.aws.amazon.com/cdk/v2/guide/cli.html#cli-deploy.

This comment is just a suggestion, though. I would approve this PR with or without watch support, since that could be added in the future, if there was a demand for it

packages/aws-cdk/lib/cli.ts Show resolved Hide resolved
packages/aws-cdk/lib/cli.ts Outdated Show resolved Hide resolved
},
],
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A high level question -- what if a resource fails deploying after CONFIGURATION_COMPLETE status is emitted? Do we handle that case? Is that a possible case? If it is a possible case, could we have an integration test for that?

Copy link
Contributor Author

@pahud pahud May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will discuss this with the CFN team. As this PR is to monitor CONFIGURATION_COMPLETE for the stack detailed status, not the resources. It could be a risk that stack enters CONFIGURATION_COMPLETE detailed status but its final status is failed. In this case CDK CLI would assume the stack deployment is completed but actually is failed. We need to check:

  1. Is this possible for CFN stacks to happen?
  2. If that happens, what is the best DX for CDK users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is, if that happens, we should fail with exit status 1. One idea, without much thinking about how this would work, is to poll the statuses of all the stacks once the final stack has entered CONFIG_COMPLETE. So then it's just that the final stack of the deployment is not getting the speedup... which I think is fine since you then get the speed up on all the intermediate stacks.

I am considering if there is a case where a customer kicks off some sort of production thing as soon as the customer get a success signal from CDK deploy? If so, then we'd want to make sure that everything did successfully provision... unless CloudFormation handles that, somehow. I'm not too familiar with this new optimistic stabilization feature from CloudFormation.

Copy link
Contributor Author

@pahud pahud May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should figure out, let's say if we have 2 stacks to deploy with Stack2 depends on Stack1:

Stack1 enters CC state -> CLI starts deploying Stack2 -> Stack2 enters CC state -> CLI exit 0 but Stack1 ended up with final status failed. Now we get boost from Stack1 and Stack2 but Stack1 is actually failed. What should we do? If we just check the final status of the last stack, it would be success but actually Stack1 has failed and probably would be rolling back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preconception (which could be wrong) is that stack 2 would fail if stack 1 eventually failed. But if the scenario you're describing is true, then that biases me in favor of including a warning about this not being for production, and that would be all we'd do for addressing the failure case. So, similar to how we advise users to not use hotswap in production, we'd advise users to not use this flag in production. What do you think of that? I personally feel satisfied with that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check with the CFN team:

  • Will stack2 fail when stack1 fails with CC status
    a. if stack2 would fail then we just need to check the status of the last stack
    b. if stack2 would NOT fail we will end up with a failed stack1 and a deployed stack2.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I agree with @bergjaak to warn customer to not use the flag in production. This is what CFN official user guide mentions.

Stack 2 will not fail automatically. unless CDK implement a pulling mechanism that read the state an fail the stack based on the stack 1 state.

packages/aws-cdk/lib/api/deployments.ts Outdated Show resolved Hide resolved
@aws-cdk-automation
Copy link
Collaborator

The pull request linter fails with the following errors:

❌ Features must contain a change to an integration test file and the resulting snapshot.
❌ CLI code has changed. A maintainer must run the code through the testing pipeline (git fetch origin pull/29536/head && git push -f origin FETCH_HEAD:test-main-pipeline), then add the 'pr-linter/cli-integ-tested' label when the pipeline succeeds.

PRs must pass status checks before we can provide a meaningful review.

If you would like to request an exemption from the status checks or clarification on feedback, please leave a comment on this PR containing Exemption Request and/or Clarification Request.

✅ A exemption request has been requested. Please wait for a maintainer's review.

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 6209ce5
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@comcalvi
Copy link
Contributor

comcalvi commented Jun 28, 2024

The CDK should not take this feature. See this comment for further details: #29443 (comment)

@comcalvi comcalvi closed this Jun 28, 2024
@@ -444,6 +444,36 @@ class LambdaHotswapStack extends cdk.Stack {
}
}

class DetailedStatusStack extends cdk.Stack{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none of these resources currently cause the stack itself to emit the CONFIGURATION_COMPLETE event, so this test needs to be updated. The only resource I know of that does make it emit this event is ECS Services

@aws-cdk-automation
Copy link
Collaborator

The pull request linter fails with the following errors:

❌ Features must contain a change to an integration test file and the resulting snapshot.
❌ CLI code has changed. A maintainer must run the code through the testing pipeline (git fetch origin pull/29536/head && git push -f origin FETCH_HEAD:test-main-pipeline), then add the 'pr-linter/cli-integ-tested' label when the pipeline succeeds.

PRs must pass status checks before we can provide a meaningful review.

If you would like to request an exemption from the status checks or clarification on feedback, please leave a comment on this PR containing Exemption Request and/or Clarification Request.

✅ A exemption request has been requested. Please wait for a maintainer's review.

@pahud
Copy link
Contributor Author

pahud commented Jul 25, 2024

Update:

CDK can't support this feature at this moment because CFN would not emit the outputs in CONFIGURATION_COMPLETE phase and dependent stacks would fail if they reference the outputs from there.

We will evaluate this again when CFN has this support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution/core This is a PR that came from AWS. feature-request A feature should be added or improved. p1 pr/needs-cli-test-run This PR needs CLI tests run against it.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CDK CLI: execute deployment on CONFIGURATION_COMPLETE event
9 participants