service/ec2: Additional error handling for VPC Endpoint and VPC Endpoint Service deletion, sweeper fixes for Route Tables, VPC Endpoints, and VPC Endpoint Services #16656

Merged
bflad merged 4 commits into main from t-aws_vpc_endpoint-sweeper-dependency
Feb 3, 2021

Conversation

bflad
Contributor

@bflad bflad commented Dec 9, 2020

Community Note

  • Please vote on this pull request by adding a 👍 reaction to the original pull request comment to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for pull request followers and do not help prioritize the request

Reference: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DeleteVpcEndpoints.html
Reference: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DeleteVpcEndpointServiceConfigurations.html

Release note for CHANGELOG:

* resource/aws_vpc_endpoint: Return additional unsuccessful deletion information immediately as an error instead of timing out while waiting for deletion
* resource/aws_vpc_endpoint_service: Return additional unsuccessful deletion information immediately as an error instead of timing out while waiting for deletion

The `DeleteVpcEndpoints` and `DeleteVpcEndpointServiceConfigurations` APIs sometimes return failures in an `Unsuccessful` array in the response instead of a normal error. Previously, the resource and sweeper did not account for this type of error response and would time out on deletion without ever reporting the underlying issue:

2020/12/08 18:43:52 Sweeper Tests ran unsuccessfully:
...
  - aws_vpc_endpoint_service: error waiting for VPC Endpoint Service (vpce-svc-0c300eaebde5aec19) to delete: timeout while waiting for state to become 'Deleted' (last state: 'Available', timeout: 10m0s)
...
  - aws_vpc_endpoint: error waiting for VPC Endpoint (vpce-0395ac1f6cc86b11a) to delete: timeout while waiting for state to become 'deleted' (last state: 'available', timeout: 10m0s)

Now the resource will handle this response type, the VPC Endpoint sweepers have been refactored to use the resource deletion function, and the VPC Endpoint sweepers will correctly show the unsuccessful deletions while immediately continuing on to the next item:

2020/12/08 20:46:59 Sweeper Tests ran unsuccessfully:
  - aws_vpc_endpoint_service: 1 error occurred:
  * error deleting EC2 VPC Endpoint Service (vpce-svc-0c300eaebde5aec19): error deleting EC2 VPC Endpoint Service (vpce-svc-0c300eaebde5aec19): 1 error occurred:
  * vpce-svc-0c300eaebde5aec19: ExistingVpcEndpointConnections: Service has existing active VPC Endpoint connections!
...
  - aws_vpc_endpoint: 1 error occurred:
  * error deleting EC2 VPC Endpoint (vpce-0395ac1f6cc86b11a): error deleting EC2 VPC Endpoint (vpce-0395ac1f6cc86b11a): 1 error occurred:
  * vpce-0395ac1f6cc86b11a: InvalidParameter: Endpoint must be removed from route table before deletion
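
For readers unfamiliar with this response shape, here is a minimal, standard-library-only sketch of the idea: the delete call itself returns no error, so the caller has to inspect the `Unsuccessful` entries and turn them into an error before waiting for deletion. The `unsuccessfulItem` type and the `unsuccessfulItemsError` helper are hypothetical stand-ins for illustration, not the provider's actual code or the AWS SDK types, and the error wording is simplified compared to the output above.

```go
package main

import (
	"fmt"
	"strings"
)

// unsuccessfulItem is a hypothetical stand-in for one entry of the
// Unsuccessful array in a delete response: a resource ID plus an error
// code and message.
type unsuccessfulItem struct {
	ResourceID string
	Code       string
	Message    string
}

// unsuccessfulItemsError returns nil when no item failed, or a single
// error listing each failed item as "<id>: <code>: <message>".
func unsuccessfulItemsError(items []unsuccessfulItem) error {
	if len(items) == 0 {
		return nil
	}

	lines := make([]string, 0, len(items))
	for _, item := range items {
		lines = append(lines, fmt.Sprintf("%s: %s: %s", item.ResourceID, item.Code, item.Message))
	}

	return fmt.Errorf("%d unsuccessful deletion(s): %s", len(items), strings.Join(lines, "; "))
}

func main() {
	// The API call succeeded, but the response body lists a failure.
	items := []unsuccessfulItem{{
		ResourceID: "vpce-0395ac1f6cc86b11a",
		Code:       "InvalidParameter",
		Message:    "Endpoint must be removed from route table before deletion",
	}}

	if err := unsuccessfulItemsError(items); err != nil {
		fmt.Println("error deleting EC2 VPC Endpoint:", err)
	}
}
```

Returning this error right after the delete call is what lets the caller skip the 10 minute wait for a state change that will never happen.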

To fix the underlying cause of these errors, the Route Table sweeper needed to be added as a VPC Endpoint dependency and the Route Table sweeper needed to delete non-local/non-public-IGW routes if the Route Table was the main route table for the VPC (as main Route Tables cannot be deleted):

2020/12/08 21:12:50 [DEBUG] Running Sweepers for region (us-west-2):
2020/12/08 21:12:50 [DEBUG] Running Sweeper (aws_route_table) in region (us-west-2)
2020/12/08 21:12:50 [INFO] AWS Auth provider used: "SharedCredentialsProvider"
2020/12/08 21:12:50 [DEBUG] Trying to get account information via sts:GetCallerIdentity
2020/12/08 21:12:50 [DEBUG] Trying to get account information via sts:GetCallerIdentity
2020/12/08 21:12:52 [DEBUG] Deleting EC2 Route Table (rtb-09af9318dcc5ccaf9) Route
2020/12/08 21:12:52 [DEBUG] Sweeper (aws_vpc_endpoint_service) has dependency (aws_vpc_endpoint), running..
2020/12/08 21:12:52 [DEBUG] Sweeper (aws_vpc_endpoint) has dependency (aws_route_table), running..
2020/12/08 21:12:52 [DEBUG] Sweeper (aws_route_table) already ran in region (us-west-2)
2020/12/08 21:12:52 [DEBUG] Running Sweeper (aws_vpc_endpoint) in region (us-west-2)
2020/12/08 21:12:53 [INFO] Deleting EC2 VPC Endpoint: vpce-0395ac1f6cc86b11a
2020/12/08 21:12:53 [DEBUG] Waiting for state to become: [deleted]
2020/12/08 21:12:58 [DEBUG] Reading VPC Endpoint: vpce-0395ac1f6cc86b11a
2020/12/08 21:12:59 [TRACE] Waiting 5s before next try
2020/12/08 21:13:04 [DEBUG] Reading VPC Endpoint: vpce-0395ac1f6cc86b11a
2020/12/08 21:13:04 [TRACE] Waiting 10s before next try
2020/12/08 21:13:14 [DEBUG] Reading VPC Endpoint: vpce-0395ac1f6cc86b11a
2020/12/08 21:13:15 [DEBUG] Running Sweeper (aws_vpc_endpoint_service) in region (us-west-2)
2020/12/08 21:13:15 [INFO] Deleting EC2 VPC Endpoint Service: vpce-svc-0c300eaebde5aec19
2020/12/08 21:13:16 [DEBUG] Waiting for state to become: [Deleted]
2020/12/08 21:13:21 [DEBUG] Reading VPC Endpoint Service Configuration: vpce-svc-0c300eaebde5aec19
2020/12/08 21:13:21 [DEBUG] Sweeper (aws_vpc_endpoint) has dependency (aws_route_table), running..
2020/12/08 21:13:21 [DEBUG] Sweeper (aws_route_table) already ran in region (us-west-2)
2020/12/08 21:13:21 [DEBUG] Sweeper (aws_vpc_endpoint) already ran in region (us-west-2)
2020/12/08 21:13:21 Sweeper Tests ran successfully:
  - aws_vpc_endpoint_service
  - aws_route_table
  - aws_vpc_endpoint
ok    github.com/terraform-providers/terraform-provider-aws/aws 33.689s

Output from acceptance testing:

--- PASS: TestAccAWSVpcEndpoint_disappears (37.60s)
--- PASS: TestAccAWSVpcEndpoint_gatewayBasic (38.82s)
--- PASS: TestAccAWSVpcEndpoint_gatewayPolicy (72.11s)
--- PASS: TestAccAWSVpcEndpoint_gatewayWithRouteTableAndPolicy (87.14s)
--- PASS: TestAccAWSVpcEndpoint_interfaceBasic (78.17s)
--- PASS: TestAccAWSVpcEndpoint_interfaceNonAWSServiceAcceptOnCreate (276.65s)
--- PASS: TestAccAWSVpcEndpoint_interfaceNonAWSServiceAcceptOnUpdate (333.65s)
--- PASS: TestAccAWSVpcEndpoint_interfaceWithSubnetAndSecurityGroup (448.87s)
--- PASS: TestAccAWSVpcEndpoint_tags (89.87s)
--- PASS: TestAccAWSVpcEndpoint_VpcEndpointType_GatewayLoadBalancer (274.15s)

--- PASS: TestAccAWSVpcEndpointService_AllowedPrincipals (280.60s)
--- PASS: TestAccAWSVpcEndpointService_basic (252.94s)
--- PASS: TestAccAWSVpcEndpointService_disappears (258.46s)
--- PASS: TestAccAWSVpcEndpointService_GatewayLoadBalancerArns (208.91s)
--- PASS: TestAccAWSVpcEndpointService_tags (288.75s)

Note: When working with assume role credentials, some of these test configurations can error due to the STS GetCallerIdentity ARN:

=== CONT  TestAccAWSVpcEndpoint_VpcEndpointType_GatewayLoadBalancer
    resource_aws_vpc_endpoint_test.go:519: Step 1/2 error: Error running apply:
        Error: error adding VPC Endpoint Service permissions: InvalidPrincipal: Invalid Principal: 'arn:aws:sts::--OMITTED--:assumed-role/terraform_team1_dev-admin/--OMITTED--'
          status code: 400, request id: 375c4645-3761-49b1-9758-3c9b5a51c115

=== CONT  TestAccAWSVpcEndpointService_AllowedPrincipals
    resource_aws_vpc_endpoint_service_test.go:125: Step 1/3 error: Error running apply:
        Error: error adding VPC Endpoint Service permissions: InvalidPrincipal: Invalid Principal: 'arn:aws:sts::--OMITTED--:assumed-role/terraform_team1_dev-admin/--OMITTED--'
          status code: 400, request id: f3e9a77f-3c7d-4acc-9127-f931c4ffbb37

I will create a follow-up issue for that problem.
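
To make the ARN mismatch concrete: with assumed role credentials, STS reports a session ARN of the form `arn:aws:sts::<account>:assumed-role/<role>/<session>`, which the errors above show being rejected as a principal. The sketch below derives the corresponding IAM role ARN from such a session ARN. It is only an illustration of the two ARN shapes, not a fix proposed or adopted in this thread, and `roleARNFromAssumedRole` is a hypothetical helper. It also cannot recover a role path, because the session ARN does not include one.

```go
package main

import (
	"fmt"
	"strings"
)

// roleARNFromAssumedRole converts
//
//	arn:<partition>:sts::<account>:assumed-role/<role>/<session>
//
// into
//
//	arn:<partition>:iam::<account>:role/<role>
//
// and reports false for any other ARN shape.
func roleARNFromAssumedRole(arn string) (string, bool) {
	parts := strings.SplitN(arn, ":", 6)
	if len(parts) != 6 || parts[0] != "arn" || parts[2] != "sts" {
		return "", false
	}

	resource := strings.Split(parts[5], "/")
	if len(resource) != 3 || resource[0] != "assumed-role" || resource[1] == "" {
		return "", false
	}

	return fmt.Sprintf("arn:%s:iam::%s:role/%s", parts[1], parts[4], resource[1]), true
}

func main() {
	// Placeholder account ID and session name, not values from the test run.
	session := "arn:aws:sts::123456789012:assumed-role/terraform_team1_dev-admin/example-session"

	if roleARN, ok := roleARNFromAssumedRole(session); ok {
		fmt.Println(roleARN)
	}
}
```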

@bflad bflad added the bug Addresses a defect in current functionality. label Dec 9, 2020
@bflad bflad requested a review from a team as a code owner December 9, 2020 02:50
@ghost ghost added size/XL Managed by automation to categorize the size of a PR. service/ec2 Issues and PRs that pertain to the ec2 service. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure. labels Dec 9, 2020
Base automatically changed from master to main January 23, 2021 00:59
Contributor

@anGie44 anGie44 left a comment

awesome! nice catch of those sneaky delete errors 👍 👍

@bflad
Contributor Author

bflad commented Feb 3, 2021

Reverified after rebase to fix merge conflict:

--- PASS: TestAccAWSVpcEndpoint_disappears (32.56s)
--- PASS: TestAccAWSVpcEndpoint_gatewayBasic (35.64s)
--- PASS: TestAccAWSVpcEndpoint_gatewayPolicy (66.22s)
--- PASS: TestAccAWSVpcEndpoint_gatewayWithRouteTableAndPolicy (71.18s)
--- PASS: TestAccAWSVpcEndpoint_interfaceBasic (114.68s)
--- PASS: TestAccAWSVpcEndpoint_interfaceNonAWSServiceAcceptOnCreate (351.69s)
--- PASS: TestAccAWSVpcEndpoint_interfaceNonAWSServiceAcceptOnUpdate (317.50s)
--- PASS: TestAccAWSVpcEndpoint_interfaceWithSubnetAndSecurityGroup (348.50s)
--- PASS: TestAccAWSVpcEndpoint_tags (90.84s)
--- PASS: TestAccAWSVpcEndpoint_VpcEndpointType_GatewayLoadBalancer (327.58s)

--- PASS: TestAccAWSVpcEndpointService_AllowedPrincipals (282.63s)
--- PASS: TestAccAWSVpcEndpointService_basic (237.55s)
--- PASS: TestAccAWSVpcEndpointService_disappears (260.95s)
--- PASS: TestAccAWSVpcEndpointService_GatewayLoadBalancerArns (210.27s)
--- PASS: TestAccAWSVpcEndpointService_private_dns_name (258.96s)
--- PASS: TestAccAWSVpcEndpointService_tags (294.03s)

@bflad bflad merged commit 9f92117 into main Feb 3, 2021
@bflad bflad deleted the t-aws_vpc_endpoint-sweeper-dependency branch February 3, 2021 15:16
@github-actions github-actions bot added this to the v3.27.0 milestone Feb 3, 2021
github-actions bot pushed a commit that referenced this pull request Feb 3, 2021
@ghost

ghost commented Feb 5, 2021

This has been released in version 3.27.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

@ghost

ghost commented Mar 5, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Mar 5, 2021
Labels
bug Addresses a defect in current functionality. service/ec2 Issues and PRs that pertain to the ec2 service. size/XL Managed by automation to categorize the size of a PR. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure.