Aurora Global cluster timeout and update-in-place all the time #10150

Closed
jamengual opened this issue Sep 18, 2019 · 4 comments
Labels
bug Addresses a defect in current functionality. service/rds Issues and PRs that pertain to the rds service.

Comments


jamengual commented Sep 18, 2019

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

0.12.7

Affected Resource(s)

  • aws_rds_cluster
  • aws_rds_global_cluster

Terraform Configuration Files

provider "aws" {
  alias  = "primary"
  region = "us-east-2"
  # Make it faster by skipping some checks
  skip_get_ec2_platforms      = true
  skip_metadata_api_check     = true
  skip_region_validation      = true
  skip_credentials_validation = true
  skip_requesting_account_id  = true
}

provider "aws" {
  alias  = "secondary"
  region = "us-west-2"

  # Make it faster by skipping some checks
  skip_get_ec2_platforms      = true
  skip_metadata_api_check     = true
  skip_region_validation      = true
  skip_credentials_validation = true
  skip_requesting_account_id  = true
}

resource "aws_rds_global_cluster" "main" {
  provider                  = aws.primary
  global_cluster_identifier = "main-global-cluster"
  engine_version            = "5.6.10a"
  storage_encrypted         = true
}

module "main_primary_cluster" {
  #source = "git::https://github.com/cloudposse/terraform-aws-rds-cluster.git?ref=0.16.0"
  source                              = "../terraform-aws-rds-cluster"
  engine                              = "aurora"
  engine_version                      = "5.6.10a"
  cluster_family                      = "aurora5.6"
  cluster_size                        = var.cluster_size
  namespace                           = var.namespace
  stage                               = var.stage
  name                                = var.main_name
  admin_user                          = var.db_user
  admin_password                      = random_string.db_password.result
  db_name                             = var.main_db_name
  instance_type                       = "db.r5.xlarge"
  vpc_id                              = local.vpc_id
  security_groups                     = [aws_security_group.main_sg.id]
  subnets                             = local.private_subnet_ids
  engine_mode                         = "global"
  global_cluster_identifier           = aws_rds_global_cluster.main.id
  iam_database_authentication_enabled = true
  storage_encrypted                   = true

  # enable enhanced monitoring every 15 seconds
  rds_monitoring_interval = 15

  # reference iam role created above
  rds_monitoring_role_arn      = aws_iam_role.main_enhanced_monitoring.arn
  performance_insights_enabled = false
  # performance_insights_kms_key_id = module.kms_key.key_arn

  cluster_parameters = [
    {
      name         = "binlog_format"
      value        = "row"
      apply_method = "pending-reboot"
    },
    {
      name         = "max_allowed_packet"
      value        = "16777216"
      apply_method = "immediate"
    }
  ]

  providers = {
    aws = aws.primary
  }
}

module "main_secondary_cluster" {
  #source = "git::https://github.com/cloudposse/terraform-aws-rds-cluster.git?ref=0.16.0"
  source                              = "../terraform-aws-rds-cluster"
  engine                              = "aurora"
  engine_version                      = "5.6.10a"
  cluster_family                      = "aurora5.6"
  cluster_size                        = var.cluster_size
  namespace                           = var.namespace
  stage                               = var.stage
  name                                = "${var.main_name}_secondary"
  admin_user                          = ""
  admin_password                      = ""
  db_name                             = ""
  instance_type                       = "db.r5.large"
  vpc_id                              = local.secondary_vpc_id
  security_groups                     = [aws_security_group.secondary_main_sg.id]
  subnets                             = local.secondary_private_subnet_ids
  engine_mode                         = "global"
  global_cluster_identifier           = aws_rds_global_cluster.main.id
  iam_database_authentication_enabled = true
  kms_key_arn                         = data.aws_kms_key.kms_key.arn
  source_region                       = "us-east-2"
  storage_encrypted                   = true

  # enable monitoring every 30 seconds
  rds_monitoring_interval = 30

  # reference iam role created above
  rds_monitoring_role_arn      = aws_iam_role.main_enhanced_monitoring.arn
  performance_insights_enabled = false
  #performance_insights_kms_key_id = module.kms_key.key_arn

  cluster_parameters = [
    {
      name         = "binlog_format"
      value        = "row"
      apply_method = "pending-reboot"
    },
    {
      name         = "max_allowed_packet"
      value        = "16777216"
      apply_method = "immediate"
    }
  ]

  providers = {
    aws = aws.secondary
  }
}

...

Debug Output

https://gist.github.com/jamengual/3b44ec91777090dea73c3957b87aae9f

Expected Behavior

The Aurora clusters that joined the global cluster should not require any modifications.

Actual Behavior

Every time apply is run, Terraform wants to modify replication_source_identifier on the Aurora clusters that are members of the global cluster, so changes to this attribute should be ignored.

Steps to Reproduce

  1. terraform apply -target module.main_primary_cluster -var cluster_size=0
  2. terraform apply -target module.main_primary_cluster -var cluster_size=2
  3. terraform apply -target module.main_secondary_cluster -var cluster_size=0
  4. terraform apply -target module.main_secondary_cluster -var cluster_size=2
  5. terraform apply

Important Factoids

Timeouts

Tested in different regions and with different instance sizes, to no avail: the timeouts happen about 90% of the time when cluster_size=2. I also tried different internet connections, thinking the problem was in my setup.

The replication_source_identifier update-in-place happens 100% of the time.
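
If the timeouts come from Terraform giving up while waiting for the cluster instances to become available (an assumption; the debug output would have to confirm which resource actually times out), one thing that could be tried is raising the resource-level timeouts inside the module. A minimal sketch, where the resource name "default", the var.* inputs, and the durations are all hypothetical and not taken from the module's actual code:

```hcl
# Hypothetical excerpt from the forked module; names and durations are illustrative.
resource "aws_rds_cluster_instance" "default" {
  count              = var.cluster_size
  identifier         = "${var.name}-${count.index + 1}"
  cluster_identifier = aws_rds_cluster.default.id # assumes a cluster resource named "default"
  engine             = "aurora"
  instance_class     = var.instance_type

  # Give each operation more time before Terraform reports a timeout.
  timeouts {
    create = "120m"
    update = "120m"
    delete = "120m"
  }
}
```

This only changes how long Terraform waits; it would not help if the instances never reach the available state.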

I created my own fork of the cloudposse module to add global cluster support. It is basically a two-line change, so there is no hidden magic or loops going on here.

@ghost ghost added the service/rds Issues and PRs that pertain to the rds service. label Sep 18, 2019
@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Sep 18, 2019
@anGie44 anGie44 added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Jul 31, 2020
@anGie44 anGie44 self-assigned this Jul 31, 2020
@anGie44 anGie44 changed the title from "Autora Global cluster timeout and update-in-place all the time" to "Aurora Global cluster timeout and update-in-place all the time" Aug 6, 2020
Contributor

anGie44 commented Aug 6, 2020

Hi @jamengual, thank you for submitting this issue, and apologies that you've run into it! From the logs, it looks like you're referring to the non-empty plan that results after creation of the rds_cluster: the replication_source_identifier attribute is returned from the API even though the module's variable looks to be unconfigured in your example. We've seen similar issues reported for the replication_source_identifier and global_cluster_identifier attributes when an rds_cluster resource refers to an rds_global_cluster. As a workaround in the meantime, I would suggest updating the module source code (if possible) to add a lifecycle configuration block with ignore_changes for this parameter, which avoids the perpetual updates. This has to go in the resource definition; unfortunately, it isn't feasible at the module level yet. For example:

lifecycle {
  ignore_changes = [replication_source_identifier]
}
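
To illustrate where that block would live, here is a minimal sketch of an aws_rds_cluster resource as it might appear inside the forked module. The resource name "default" and the var.* inputs are assumptions for illustration, not the module's actual code; only the lifecycle block comes from the suggestion above.

```hcl
# Hypothetical excerpt from the forked module; names and variables are illustrative.
resource "aws_rds_cluster" "default" {
  cluster_identifier        = var.cluster_identifier
  engine                    = "aurora"
  engine_version            = "5.6.10a"
  engine_mode               = "global"
  global_cluster_identifier = var.global_cluster_identifier

  lifecycle {
    # The API returns this attribute even when it is not configured,
    # so ignore it to avoid an update-in-place on every plan.
    ignore_changes = [replication_source_identifier]
  }
}
```

The lifecycle block only accepts literal attribute names and belongs to the resource itself, so it cannot be passed in through the module call; that is why the module source has to be edited.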

On our end, we'll look into marking this attribute as Computed to address the diff you are seeing on each apply.

@anGie44 anGie44 removed their assignment Sep 29, 2021
Contributor

anGie44 commented Sep 29, 2021

Hi @jamengual, since it's been some time since this issue was opened and a newer Terraform, provider, or module version may address it, I'm going to close this for the time being. Please do reach out if there are any new findings with later versions of the provider.

@anGie44 anGie44 closed this as completed Sep 29, 2021
Author

jamengual commented Sep 29, 2021

Thanks @anGie44. I did not reply before, but I did use the workaround, and I have not used it again with the newer provider versions. It is OK to close the issue.


github-actions bot commented Jun 5, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 5, 2022