-
Notifications
You must be signed in to change notification settings - Fork 9.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure route53 records really exist after creation and add support for all ASCII characters #4183
Conversation
matches the variable name
verifying if a record exists after creating/updating since resourceAwsRoute53RecordRead does not return an error if the record does not exist
to avoid downloading 100 records for every request Compare the response records with the original record name instead of using the StartRecordName. In events where the record wasn't matching correctly StartRecordName would be updated on the next page and it would then always match the next record (record 101) leading to accidental deletions of the wrong record
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Crazybus thank you very much for the detailed comments and logic enhancement here. Both are extremely appreciated! 🚀
24 tests passed (all tests)
=== RUN TestAccAWSRoute53Record_alias
--- PASS: TestAccAWSRoute53Record_alias (157.67s)
=== RUN TestAccAWSRoute53Record_txtSupport
--- PASS: TestAccAWSRoute53Record_txtSupport (163.68s)
=== RUN TestAccAWSRoute53Record_importUnderscored
--- PASS: TestAccAWSRoute53Record_importUnderscored (182.63s)
=== RUN TestAccAWSRoute53Record_basic
--- PASS: TestAccAWSRoute53Record_basic (183.94s)
=== RUN TestAccAWSRoute53Record_caaSupport
--- PASS: TestAccAWSRoute53Record_caaSupport (188.77s)
=== RUN TestAccAWSRoute53Record_failover
--- PASS: TestAccAWSRoute53Record_failover (191.23s)
=== RUN TestAccAWSRoute53Record_s3_alias
--- PASS: TestAccAWSRoute53Record_s3_alias (193.12s)
=== RUN TestAccAWSRoute53Record_generatesSuffix
--- PASS: TestAccAWSRoute53Record_generatesSuffix (221.00s)
=== RUN TestAccAWSRoute53Record_TypeChange
--- PASS: TestAccAWSRoute53Record_TypeChange (233.38s)
=== RUN TestAccAWSRoute53Record_spfSupport
--- PASS: TestAccAWSRoute53Record_spfSupport (235.53s)
=== RUN TestAccAWSRoute53Record_latency_basic
--- PASS: TestAccAWSRoute53Record_latency_basic (236.09s)
=== RUN TestAccAWSRoute53Record_geolocation_basic
--- PASS: TestAccAWSRoute53Record_geolocation_basic (237.92s)
=== RUN TestAccAWSRoute53Record_weighted_basic
--- PASS: TestAccAWSRoute53Record_weighted_basic (241.35s)
=== RUN TestAccAWSRoute53Record_AliasChange
--- PASS: TestAccAWSRoute53Record_AliasChange (250.92s)
=== RUN TestAccAWSRoute53Record_wildcard
--- PASS: TestAccAWSRoute53Record_wildcard (260.54s)
=== RUN TestAccAWSRoute53Record_aliasUppercase
--- PASS: TestAccAWSRoute53Record_aliasUppercase (271.41s)
=== RUN TestAccAWSRoute53Record_SetIdentiferChange
--- PASS: TestAccAWSRoute53Record_SetIdentiferChange (277.56s)
=== RUN TestAccAWSRoute53Record_basic_fqdn
--- PASS: TestAccAWSRoute53Record_basic_fqdn (277.91s)
=== RUN TestAccAWSRoute53Record_importBasic
--- PASS: TestAccAWSRoute53Record_importBasic (279.52s)
=== RUN TestAccAWSRoute53Record_empty
--- PASS: TestAccAWSRoute53Record_empty (152.85s)
=== RUN TestAccAWSRoute53Record_multivalue_answer_basic
--- PASS: TestAccAWSRoute53Record_multivalue_answer_basic (161.80s)
=== RUN TestAccAWSRoute53Record_longTXTrecord
--- PASS: TestAccAWSRoute53Record_longTXTrecord (182.12s)
=== RUN TestAccAWSRoute53Record_weighted_alias
--- PASS: TestAccAWSRoute53Record_weighted_alias (385.36s)
=== RUN TestAccAWSRoute53Record_allowOverwrite
--- PASS: TestAccAWSRoute53Record_allowOverwrite (206.01s)
} | ||
|
||
log.Printf("[DEBUG] List resource records sets for zone: %s, opts: %s", | ||
zone, lopts) | ||
|
||
var record *route53.ResourceRecordSet | ||
|
||
// We need to loop over all records starting from the record we are looking for because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ thank you!
name := cleanRecordName(*recordSet.Name) | ||
if FQDN(strings.ToLower(name)) != FQDN(strings.ToLower(*lopts.StartRecordName)) { | ||
|
||
responseName := strings.ToLower(cleanRecordName(*recordSet.Name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nitpick: To prevent crashes in the very unlikely scenario that AWS returns nil for either of these, it would be safer to wrap them in aws.StringValue()
👍 e.g. aws.StringValue(recordSet.Name)
This has been released in version 1.15.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks! |
In #3579 we found some weird behaviour where records starting with
@
were always showing up as a change for everyterrafrom apply
. The weird part about this was that the record was being created correctly and not showing up in state. In #3579 (comment) @bflad pointed out that@
doesn't need to be used with AWS and they actually give you a warning when you try to use it inside the AWS console. With this answer I was happy and able to move these@
records to the apex domain and move on with my life. However I was still very curious why the records weren't showing up in state properly.Fast forward a few days and I submitted a job to move the values of the
@
records to the apex domains for each zone. It worked! However a little while later someone reported an unrelated DNS record was no longer resolving. I had only just recently moved all of our production DNS into terraform so I was worried that the automation for deploying changes had accidentally deleted something. In #3579 (comment) you can see more detail about what actually happened.Luckily all DNS changes are now automated by a jenkins job running terraform. Looking at the Route53 logs I could see that the unrelated record was removed exactly when terraform was removing the
@
record. Another terraform run fixed things up and added the record back but I felt that I needed to figure out what happened. Luckily I was able to reproduce this locally however I still hadn't explained why it only happened for 1 of the 3 zones which had@
records.Findings
How did a random record get deleted?
I figured out very quickly that this was only happening for zones which had more than 100 records (the default maxItems per page for the AWS api). I noticed this when I tried to reproduce it with a limited dataset and couldn't trigger the issue. So my focus shifted to the code which was looping of the paginated results from AWS.
The root cause was the line https://github.com/terraform-providers/terraform-provider-aws/blob/c77020f539eed9d3f1dab36c2d2d5a624ed3007b/aws/resource_aws_route53_record.go#L646
So the current record in the response was being compared to the
*lopts.StartRecordName
. Which for the first page of results was the correct record. However if the first result didn't match for some reason (In my case because@.domain.com
!=\\100.domain.com
it would continue looping until it got to the next page. At this stage*lopts.StartRecordName
now was set to the first result in the second page of results and then the record matched perfectly!So this bug would only be triggered if the record being deleted didn't match with how AWS returned the record, and if there were more than 100 records in the zone.
P.S. I tried to write an acceptance test for this behaviour but with my other fixes I now can't trigger a scenario where terraform can create a record it can't find.
Why didn't terraform give an error that it couldn't find the record after creating it?
The above wouldn't have been possible if terraform had correctly detected that it couldn't read the record it had created. When creating and updating records the final line is https://github.com/terraform-providers/terraform-provider-aws/blob/c77020f539eed9d3f1dab36c2d2d5a624ed3007b/aws/resource_aws_route53_record.go#L454
Since this function is also used for checking if a record exists or not it silently will drop the record and remove it from state if the record doesn't exist.
https://github.com/terraform-providers/terraform-provider-aws/blob/c77020f539eed9d3f1dab36c2d2d5a624ed3007b/aws/resource_aws_route53_record.go#L520-L530
So what was happening is terraform was trying to create
@.domain.com
, after successfully creating it then calledresourceAwsRoute53RecordRead
which didn't return an error so Terraform didn't give an error.Other improvements
Maxitems
My first reaction when seeing the looping over the first 100 results was....why? I then discovered the reason and added it as an inline comment so nobody else gets confused in the future. I have now optimised this to only return a single result if we know that there can be only one match. This saves terraform needing to pull in 100 resources everytime it wants to check if a single record exists.
Escaped ASCII characters
AWS supports a lot more special characters apart from
*
and@
which I have now added support for. You can see a full list at the Route53 Domain Name Format docs. I decided to just convert all of them instead of trying to maintain a list of supported characters. If a user does somehow pick a character that doesn't convert correctly they will now correctly receive an error when terraform tries to read back the record after creating it.