(aws-ec2): userDataCausesReplacement timeouts, if resourceSignalTimeout is set #12749

Closed
ilko-rbi opened this issue Jan 28, 2021 · 4 comments · Fixed by #18726
Labels
@aws-cdk/aws-ec2 Related to Amazon Elastic Compute Cloud bug This issue is a bug. effort/small Small work item – less than a day of effort p1

Comments

@ilko-rbi

If `userDataCausesReplacement` is set to true and `resourceSignalTimeout` is also set, the CloudFormation stack times out while creating the new EC2 instance, because it never receives the cfn-signal from the newly created instance.

Reproduction Steps

The following stack works:

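from os import path
from pathlib import Path

from aws_cdk import aws_ec2, aws_s3_assets, core

VPC_ID = "vpc-xxxxxxxx"  # placeholder: ID of an existing VPC in the target account

class Ec2UserDataReplacementTestStack(core.Stack):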
    def __init__(self, scope: core.Construct, sid: str, **kwargs) -> None:
        super().__init__(scope, sid, **kwargs)

        vpc = aws_ec2.Vpc.from_lookup(self, id="vpcimported", vpc_id=VPC_ID)

        user_data_script_path = path.join(Path(__file__).parent.absolute(), "user-data.sh")

        ec2 = aws_ec2.Instance(
            scope=self,
            id="ec2id",
            instance_type=aws_ec2.InstanceType(instance_type_identifier="t3a.large"),
            machine_image=aws_ec2.GenericLinuxImage({"eu-central-1": "ami-0fc812ebb87bb5b8e"}),
            vpc=vpc,
            user_data_causes_replacement=True,
        )
        user_data_asset = aws_s3_assets.Asset(self, "user-data-s3-asset", path=user_data_script_path)
        local_path = ec2.user_data.add_s3_download_command(
            bucket=user_data_asset.bucket, bucket_key=user_data_asset.s3_object_key
        )
        ec2.user_data.add_execute_file_command(file_path=local_path)
        # Emits a cfn-signal on exit that names the instance's logical ID
        ec2.user_data.add_signal_on_exit_command(ec2)
        user_data_asset.grant_read(ec2.role)

If we additionally pass a custom cfn-signal timeout to `aws_ec2.Instance`, a CreationPolicy is generated in the template, and stack creation or update times out once that timeout elapses:

ec2 = aws_ec2.Instance(
    scope=self,
    id="ec2id",
    instance_type=aws_ec2.InstanceType(instance_type_identifier="t3a.large"),
    machine_image=aws_ec2.GenericLinuxImage({"eu-central-1": "ami-0fc812ebb87bb5b8e"}),
    vpc=vpc,
    user_data_causes_replacement=True,
    resource_signal_timeout=core.Duration.minutes(3),
)

This adds the following to the instance resource:

CreationPolicy:
  ResourceSignal:
    Timeout: PT3M
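For reference, the timeout lands in the template as an ISO 8601 duration. The helper below is a hypothetical sketch, not CDK's own code. It builds the same CreationPolicy fragment from a number of whole minutes. `Count` is left out, as in the generated template, so CloudFormation waits for its default of one signal. That default is consistent with "Failed to receive 1 resource signal(s)" in the log below.

```python
def creation_policy(timeout_minutes: int) -> dict:
    """Hypothetical helper: build a CreationPolicy fragment like the one CDK emits.

    Only whole minutes are handled; the result uses ISO 8601 durations
    such as PT3M (3 minutes) or PT1H30M (90 minutes).
    """
    if timeout_minutes <= 0:
        raise ValueError("timeout must be positive")
    hours, minutes = divmod(timeout_minutes, 60)
    iso = "PT" + (f"{hours}H" if hours else "") + (f"{minutes}M" if minutes else "")
    # Count is omitted, so CloudFormation waits for exactly one signal.
    return {"CreationPolicy": {"ResourceSignal": {"Timeout": iso}}}


print(creation_policy(3))  # → {'CreationPolicy': {'ResourceSignal': {'Timeout': 'PT3M'}}}
```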

However, the update (and even the initial creation) fails:

Timestamp | Logical ID | Status | Status reason
-- | -- | -- | --
2021-01-28 13:49:40 UTC+0100 | myteststack2 | ROLLBACK_IN_PROGRESS | The following resource(s) failed to create: [ec2idC85D4938b6c9a12994f1bfa6]. Rollback requested by user.
2021-01-28 13:49:39 UTC+0100 | ec2idC85D4938b6c9a12994f1bfa6 | CREATE_FAILED | Failed to receive 1 resource signal(s) within the specified duration
2021-01-28 13:46:07 UTC+0100 | ec2idC85D4938b6c9a12994f1bfa6 | CREATE_IN_PROGRESS | Resource creation Initiated

What did you expect to happen?

Creation and update should also succeed with this setup.

What actually happened?

CloudFormation doesn't receive the cfn-signal sent from the newly created EC2 instance. Perhaps this relates to the "manipulated" logical IDs generated in this case?

Environment

  • CDK CLI Version: 1.85.0
  • Framework Version:
  • Node.js Version: v12.18.4
  • OS: Amazon Linux 2
  • Language (Version): Python 3.7.7

Other

n.a.


This is 🐛 Bug Report

@ilko-rbi ilko-rbi added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jan 28, 2021
@github-actions github-actions bot added the @aws-cdk/aws-ec2 Related to Amazon Elastic Compute Cloud label Jan 28, 2021
@rix0rrr
Contributor

rix0rrr commented Feb 8, 2021

perhaps this relates to the "manipulated" logical IDs generated in this case?

This could very well be the case. You should be able to determine that from the template: check whether the logical ID of the instance matches the logical ID used in the UserData.
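One way to run this check is to scan the synthesized template. The sketch below assumes a CDK-synthesized JSON template, for example a file under `cdk.out/`, loaded with `json.load`. For each `AWS::EC2::Instance`, it concatenates the literal strings inside `UserData` (through `Fn::Base64`, `Fn::Join`, etc.). It then compares the `--resource` argument of every `cfn-signal` call with the resource's own logical ID. The function names and the regex are illustrative and are not part of any CDK tooling. The demo template is hypothetical, modeled on the IDs in the log above.

```python
import re
from typing import Any, Iterator, List, Tuple


def _strings(node: Any) -> Iterator[str]:
    """Yield every literal string inside a template fragment (dict keys are skipped)."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, list):
        for item in node:
            yield from _strings(item)
    elif isinstance(node, dict):
        for value in node.values():
            yield from _strings(value)


def signal_mismatches(template: dict) -> List[Tuple[str, str]]:
    """Return (instance_logical_id, signalled_id) pairs where cfn-signal names another ID."""
    mismatches = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EC2::Instance":
            continue
        user_data = resource.get("Properties", {}).get("UserData", "")
        text = "".join(_strings(user_data))
        for signalled_id in re.findall(r"cfn-signal .*?--resource (\S+)", text):
            if signalled_id != logical_id:
                mismatches.append((logical_id, signalled_id))
    return mismatches


# Hypothetical, trimmed template: the instance key carries the 16-char hash suffix,
# but the signal command still uses the unsuffixed ID.
demo = {
    "Resources": {
        "ec2idC85D4938b6c9a12994f1bfa6": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "UserData": {"Fn::Base64": {"Fn::Join": ["", [
                    "#!/bin/bash\n",
                    "cfn-signal --stack myteststack2 --resource ec2idC85D4938 "
                    "--region eu-central-1 -e $exitCode\n",
                ]]}}
            },
        }
    }
}
print(signal_mismatches(demo))  # → [('ec2idC85D4938b6c9a12994f1bfa6', 'ec2idC85D4938')]
```

An empty list means every signal targets its own instance.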

@rix0rrr rix0rrr added effort/small Small work item – less than a day of effort p1 labels Feb 8, 2021
@ilko-rbi
Author

Hi, sorry for the long delay. Yes, the logical IDs match. My next assumption would be that CloudFormation is confused by this, or that we are hitting some bug. If someone can share more details, I can open a support request with AWS.

@ryparker ryparker removed the needs-triage This issue or PR still needs to be triaged. label Jun 1, 2021
@koshic

koshic commented Nov 22, 2021

@ilko-rbi, it's a bug.

  1. `user_data_causes_replacement` leads to a lazy logicalId update (the laziness is important):
  const digest = md5.digest('hex').substr(0, 16);
  return `${originalLogicalId}${digest}`; // SomeInstance1236HASH

  2. UserData reads the logicalId and builds the command that sends the signal inside the addSignalOnExitCommand method:
  const resourceID = stack.getLogicalId(resource.node.defaultChild); // still SomeInstance1236, due to the laziness in step 1
  this.addOnExitCommands(`cfn-signal --stack ${stack.stackName} --resource ${resourceID} --region ${stack.region} --success ($success.ToString().ToLower())`);

So the cfn-signal command contains the obsolete logicalId without the hash suffix (SomeInstance1236 in this example), and CloudFormation never learns that SomeInstance1236HASH is ready.
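The ordering problem can be reproduced without CDK. The sketch below is a simplified, hypothetical model. `FakeCfnResource` and its methods are invented for illustration, and only the MD5-based 16-character suffix mirrors the snippet in step 1. Reading the ID eagerly from the allocator, which is effectively what `stack.getLogicalId` does, yields the unsuffixed name. Resolving it at render time picks up the override.

```python
import hashlib
from typing import Callable, Optional


class FakeCfnResource:
    """Toy model (hypothetical) of a resource whose logical ID can be overridden lazily."""

    def __init__(self, allocated_id: str) -> None:
        self.allocated_id = allocated_id  # plays the role of Stack.getLogicalId()
        self._override: Optional[Callable[[], str]] = None

    def override_logical_id(self, producer: Callable[[], str]) -> None:
        self._override = producer  # stored, not called: evaluated only at render time

    def render_logical_id(self) -> str:
        # Like the lazy `logicalId`: an override wins over the allocated ID.
        return self._override() if self._override else self.allocated_id


user_data = "#!/bin/bash\necho hello\n"
instance = FakeCfnResource("SomeInstance1236")

# userDataCausesReplacement: append the first 16 hex chars of an MD5 of the user data.
instance.override_logical_id(
    lambda: instance.allocated_id + hashlib.md5(user_data.encode()).hexdigest()[:16]
)

eager_id = instance.allocated_id        # bug: ignores the override
lazy_id = instance.render_logical_id()  # fix: resolved after all overrides

print(eager_id)  # → SomeInstance1236
print(lazy_id.startswith(eager_id), len(lazy_id) - len(eager_id))  # → True 16
```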

Workaround: make a similar `addOnExitCommands` call yourself, but get the resource ID via `(instance.node.defaultChild as CfnInstance).logicalId`. This works for me, and the full logicalId is rendered.
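For the Python stack in the report, an equivalent of this workaround might look like the sketch below. It has not been tested against a real deployment. It assumes that CDK v1's Python bindings expose `node.default_child.logical_id`, `user_data.add_on_exit_commands()`, `stack_name` and `region` as used here. The `/opt/aws/bin/cfn-signal ... -e $exitCode` form is also an assumption, modeled on a typical Amazon Linux user-data script, so compare it with your synthesized template. The helper name is hypothetical. It takes its collaborators as parameters so that it can be exercised with stand-ins, as below. In the stack, it would replace `ec2.user_data.add_signal_on_exit_command(ec2)` with `add_signal_with_final_logical_id(ec2, self)`. On CDK versions that include #18726, the built-in method should make this unnecessary.

```python
from types import SimpleNamespace  # only used for the stand-in demo below


def add_signal_with_final_logical_id(instance, stack) -> str:
    """Hypothetical helper: add a cfn-signal on-exit command that uses the resource's
    final logical ID (in real CDK a lazy token, resolved after userDataCausesReplacement
    appends its hash) instead of the ID from Stack.get_logical_id().
    """
    logical_id = instance.node.default_child.logical_id
    command = (
        f"/opt/aws/bin/cfn-signal --stack {stack.stack_name} "  # Amazon Linux path (assumption)
        f"--resource {logical_id} --region {stack.region} "
        "-e $exitCode || echo 'Failed to send CloudFormation signal'"
    )
    instance.user_data.add_on_exit_commands(command)
    return command


# Stand-ins for the CDK objects: here logical_id is a plain string, not a token.
recorded = []
fake_instance = SimpleNamespace(
    node=SimpleNamespace(default_child=SimpleNamespace(logical_id="ec2idC85D4938b6c9a12994f1bfa6")),
    user_data=SimpleNamespace(add_on_exit_commands=lambda *cmds: recorded.extend(cmds)),
)
fake_stack = SimpleNamespace(stack_name="myteststack2", region="eu-central-1")
cmd = add_signal_with_final_logical_id(fake_instance, fake_stack)
```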

@mergify mergify bot closed this as completed in #18726 Jan 31, 2022
mergify bot pushed a commit that referenced this issue Jan 31, 2022
…ion with `userDataCausesReplacement` (#18726)

If both `addSignalOnExitCommand` _and_ `userDataCausesReplacement` are
used, an invalid logicalId ends up in the `cfn-signal` call. This is
because `addSignalOnExitCommand` gets the logicalId from
`Stack.getLogicalId`, which does not take into consideration the
logicalId overrides that `userDataCausesReplacement` uses.

This updates `addSignalOnExitCommand` to use the `logicalId` of the
resource, which is evaluated lazily and therefore after all overrides.

fixes #12749


----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*

TikiTDO pushed a commit to TikiTDO/aws-cdk that referenced this issue Feb 21, 2022
4 participants