Make sure route53 records really exist after creation and add support for all ASCII characters #4183

Crazybus · 2018-04-12T07:11:11Z

In #3579 we found some weird behaviour where records starting with @ were always showing up as a change for every terrafrom apply. The weird part about this was that the record was being created correctly and not showing up in state. In #3579 (comment) @bflad pointed out that @ doesn't need to be used with AWS and they actually give you a warning when you try to use it inside the AWS console. With this answer I was happy and able to move these @ records to the apex domain and move on with my life. However I was still very curious why the records weren't showing up in state properly.

Fast forward a few days and I submitted a job to move the values of the @ records to the apex domains for each zone. It worked! However a little while later someone reported an unrelated DNS record was no longer resolving. I had only just recently moved all of our production DNS into terraform so I was worried that the automation for deploying changes had accidentally deleted something. In #3579 (comment) you can see more detail about what actually happened.

Luckily all DNS changes are now automated by a jenkins job running terraform. Looking at the Route53 logs I could see that the unrelated record was removed exactly when terraform was removing the @ record. Another terraform run fixed things up and added the record back but I felt that I needed to figure out what happened. Luckily I was able to reproduce this locally however I still hadn't explained why it only happened for 1 of the 3 zones which had @ records.

Findings

How did a random record get deleted?

I figured out very quickly that this was only happening for zones which had more than 100 records (the default maxItems per page for the AWS api). I noticed this when I tried to reproduce it with a limited dataset and couldn't trigger the issue. So my focus shifted to the code which was looping of the paginated results from AWS.

The root cause was the line https://github.com/terraform-providers/terraform-provider-aws/blob/c77020f539eed9d3f1dab36c2d2d5a624ed3007b/aws/resource_aws_route53_record.go#L646

So the current record in the response was being compared to the *lopts.StartRecordName. Which for the first page of results was the correct record. However if the first result didn't match for some reason (In my case because @.domain.com != \\100.domain.com it would continue looping until it got to the next page. At this stage *lopts.StartRecordName now was set to the first result in the second page of results and then the record matched perfectly!

So this bug would only be triggered if the record being deleted didn't match with how AWS returned the record, and if there were more than 100 records in the zone.

P.S. I tried to write an acceptance test for this behaviour but with my other fixes I now can't trigger a scenario where terraform can create a record it can't find.

Why didn't terraform give an error that it couldn't find the record after creating it?

The above wouldn't have been possible if terraform had correctly detected that it couldn't read the record it had created. When creating and updating records the final line is https://github.com/terraform-providers/terraform-provider-aws/blob/c77020f539eed9d3f1dab36c2d2d5a624ed3007b/aws/resource_aws_route53_record.go#L454

Since this function is also used for checking if a record exists or not it silently will drop the record and remove it from state if the record doesn't exist.

https://github.com/terraform-providers/terraform-provider-aws/blob/c77020f539eed9d3f1dab36c2d2d5a624ed3007b/aws/resource_aws_route53_record.go#L520-L530

So what was happening is terraform was trying to create @.domain.com, after successfully creating it then called resourceAwsRoute53RecordRead which didn't return an error so Terraform didn't give an error.

Other improvements

Maxitems

My first reaction when seeing the looping over the first 100 results was....why? I then discovered the reason and added it as an inline comment so nobody else gets confused in the future. I have now optimised this to only return a single result if we know that there can be only one match. This saves terraform needing to pull in 100 resources everytime it wants to check if a single record exists.

Escaped ASCII characters

AWS supports a lot more special characters apart from * and @ which I have now added support for. You can see a full list at the Route53 Domain Name Format docs. I decided to just convert all of them instead of trying to maintain a list of supported characters. If a user does somehow pick a character that doesn't convert correctly they will now correctly receive an error when terraform tries to read back the record after creating it.

matches the variable name

verifying if a record exists after creating/updating since resourceAwsRoute53RecordRead does not return an error if the record does not exist

to avoid downloading 100 records for every request Compare the response records with the original record name instead of using the StartRecordName. In events where the record wasn't matching correctly StartRecordName would be updated on the next page and it would then always match the next record (record 101) leading to accidental deletions of the wrong record

…ginal character

bflad

@Crazybus thank you very much for the detailed comments and logic enhancement here. Both are extremely appreciated! 🚀

24 tests passed (all tests)
=== RUN   TestAccAWSRoute53Record_alias
--- PASS: TestAccAWSRoute53Record_alias (157.67s)
=== RUN   TestAccAWSRoute53Record_txtSupport
--- PASS: TestAccAWSRoute53Record_txtSupport (163.68s)
=== RUN   TestAccAWSRoute53Record_importUnderscored
--- PASS: TestAccAWSRoute53Record_importUnderscored (182.63s)
=== RUN   TestAccAWSRoute53Record_basic
--- PASS: TestAccAWSRoute53Record_basic (183.94s)
=== RUN   TestAccAWSRoute53Record_caaSupport
--- PASS: TestAccAWSRoute53Record_caaSupport (188.77s)
=== RUN   TestAccAWSRoute53Record_failover
--- PASS: TestAccAWSRoute53Record_failover (191.23s)
=== RUN   TestAccAWSRoute53Record_s3_alias
--- PASS: TestAccAWSRoute53Record_s3_alias (193.12s)
=== RUN   TestAccAWSRoute53Record_generatesSuffix
--- PASS: TestAccAWSRoute53Record_generatesSuffix (221.00s)
=== RUN   TestAccAWSRoute53Record_TypeChange
--- PASS: TestAccAWSRoute53Record_TypeChange (233.38s)
=== RUN   TestAccAWSRoute53Record_spfSupport
--- PASS: TestAccAWSRoute53Record_spfSupport (235.53s)
=== RUN   TestAccAWSRoute53Record_latency_basic
--- PASS: TestAccAWSRoute53Record_latency_basic (236.09s)
=== RUN   TestAccAWSRoute53Record_geolocation_basic
--- PASS: TestAccAWSRoute53Record_geolocation_basic (237.92s)
=== RUN   TestAccAWSRoute53Record_weighted_basic
--- PASS: TestAccAWSRoute53Record_weighted_basic (241.35s)
=== RUN   TestAccAWSRoute53Record_AliasChange
--- PASS: TestAccAWSRoute53Record_AliasChange (250.92s)
=== RUN   TestAccAWSRoute53Record_wildcard
--- PASS: TestAccAWSRoute53Record_wildcard (260.54s)
=== RUN   TestAccAWSRoute53Record_aliasUppercase
--- PASS: TestAccAWSRoute53Record_aliasUppercase (271.41s)
=== RUN   TestAccAWSRoute53Record_SetIdentiferChange
--- PASS: TestAccAWSRoute53Record_SetIdentiferChange (277.56s)
=== RUN   TestAccAWSRoute53Record_basic_fqdn
--- PASS: TestAccAWSRoute53Record_basic_fqdn (277.91s)
=== RUN   TestAccAWSRoute53Record_importBasic
--- PASS: TestAccAWSRoute53Record_importBasic (279.52s)
=== RUN   TestAccAWSRoute53Record_empty
--- PASS: TestAccAWSRoute53Record_empty (152.85s)
=== RUN   TestAccAWSRoute53Record_multivalue_answer_basic
--- PASS: TestAccAWSRoute53Record_multivalue_answer_basic (161.80s)
=== RUN   TestAccAWSRoute53Record_longTXTrecord
--- PASS: TestAccAWSRoute53Record_longTXTrecord (182.12s)
=== RUN   TestAccAWSRoute53Record_weighted_alias
--- PASS: TestAccAWSRoute53Record_weighted_alias (385.36s)
=== RUN   TestAccAWSRoute53Record_allowOverwrite
--- PASS: TestAccAWSRoute53Record_allowOverwrite (206.01s)

bflad · 2018-04-12T16:00:37Z

aws/resource_aws_route53_record.go

 	}

 	log.Printf("[DEBUG] List resource records sets for zone: %s, opts: %s",
 		zone, lopts)

 	var record *route53.ResourceRecordSet
+
+	// We need to loop over all records starting from the record we are looking for because


❤️ thank you!

bflad · 2018-04-12T16:01:57Z

aws/resource_aws_route53_record.go

-			name := cleanRecordName(*recordSet.Name)
-			if FQDN(strings.ToLower(name)) != FQDN(strings.ToLower(*lopts.StartRecordName)) {
+
+			responseName := strings.ToLower(cleanRecordName(*recordSet.Name))


Minor nitpick: To prevent crashes in the very unlikely scenario that AWS returns nil for either of these, it would be safer to wrap them in aws.StringValue() 👍 e.g. aws.StringValue(recordSet.Name)

bflad · 2018-04-18T18:27:05Z

This has been released in version 1.15.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

ghost · 2020-04-06T17:41:00Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

Crazybus added 4 commits April 12, 2018 07:35

Switch the no records and no hosted zones errors around so the next

47a6de3

matches the variable name

Use findRecord directly instead of resourceAwsRoute53RecordRead when

e682d46

verifying if a record exists after creating/updating since resourceAwsRoute53RecordRead does not return an error if the record does not exist

Support converting all ASCII characters (such as @) back to their ori…

b5fa8ed

…ginal character

ghost added the size/M Managed by automation to categorize the size of a PR. label Apr 12, 2018

Crazybus mentioned this pull request Apr 12, 2018

TF keeps recreating a '@' TXT record #3579

Closed

bflad added bug Addresses a defect in current functionality. service/route53 Issues and PRs that pertain to the route53 service. labels Apr 12, 2018

bflad added this to the v1.15.0 milestone Apr 12, 2018

bflad approved these changes Apr 12, 2018

View reviewed changes

bflad merged commit 9532212 into hashicorp:master Apr 12, 2018

bflad added a commit that referenced this pull request Apr 12, 2018

Update CHANGELOG for #4183

5e0bb19

bflad mentioned this pull request Apr 26, 2018

Make it possible to import Route53 records containing a wildcard #2849

Closed

jandockx mentioned this pull request Dec 31, 2019

Cannot create aws_route53_record with a name that contains period, space, and other characters #11445

Closed

ghost locked and limited conversation to collaborators Apr 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make sure route53 records really exist after creation and add support for all ASCII characters #4183

Make sure route53 records really exist after creation and add support for all ASCII characters #4183

Crazybus commented Apr 12, 2018 •

edited

Loading

bflad left a comment

bflad Apr 12, 2018

bflad Apr 12, 2018

bflad commented Apr 18, 2018

ghost commented Apr 6, 2020

Make sure route53 records really exist after creation and add support for all ASCII characters #4183

Make sure route53 records really exist after creation and add support for all ASCII characters #4183

Conversation

Crazybus commented Apr 12, 2018 • edited Loading

Findings

How did a random record get deleted?

Why didn't terraform give an error that it couldn't find the record after creating it?

Other improvements

Maxitems

Escaped ASCII characters

bflad left a comment

Choose a reason for hiding this comment

bflad Apr 12, 2018

Choose a reason for hiding this comment

bflad Apr 12, 2018

Choose a reason for hiding this comment

bflad commented Apr 18, 2018

ghost commented Apr 6, 2020

Crazybus commented Apr 12, 2018 •

edited

Loading