Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(aws_eks): New isolated subnets (CDK 1.140) leads to error in cdk's EKS logic #19122

Closed
ThomasSteinbach opened this issue Feb 23, 2022 · 5 comments · Fixed by #19279
Closed
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. p1

Comments

@ThomasSteinbach
Copy link
Contributor

ThomasSteinbach commented Feb 23, 2022

UPDATE 2022-03-04: This is definately a bug and I've got a solution. See the last chapter Workaround / Solution at the very end of my bug report.


What is the problem?

The problem is that the VPC and EKS construct doesn't work properly with the newly introduced aws_ec2.SubnetType.ISOLATED in CDK 1.140 ( see #18638) as well with the aws_ec2.SubnetType.PRIVATE_ISOLATED and aws_ec2.SubnetType.PRIVATE_WITH_NAT.

Problems especially occur on imported VPCs which just have "isolated" (formerly "private") subnets.

CDK v1

With CDK 1.140 former private networks were recognized as isolated networks - see #18638

As a result of this we had to change the parameter for aws_eks.Cluster(vpc_subnets) to [aws_ec2.SubnetSelection(subnet_type=aws_ec2.SubnetType.ISOLATED)].

However the new isolated subnets seem to collide with the aws_eks.Cluster(endpoint_access) value of aws_eks.EndpointAccess.PRIVATE:

File "/usr/local/lib/python3.7/site-packages/aws_cdk/aws_eks/__init__.py", line 11745, in __init__ jsii.create(self.__class__, self, [scope, id, props])
File "/usr/local/lib/python3.7/site-packages/jsii/_kernel/__init__.py", line 297, in create for iface in getattr(klass, "__jsii_ifaces__", [])
File "/usr/local/lib/python3.7/site-packages/jsii/_kernel/providers/process.py", line 344, in create return self._process.send(request, CreateResponse)
File "/usr/local/lib/python3.7/site-packages/jsii/_kernel/providers/process.py", line 326, in send raise JSIIError(resp.error) from JavaScriptError(resp.stack)
jsii.errors.JSIIError: Vpc must contain private subnets when public endpoint access is disabled

CDK v2

After migrating to CDK v2 we have changed the aws_ec2.SubnetType.ISOLATED (v1) to aws_ec2.SubnetType.PRIVATE_ISOLATED (v2). This is because the AWS cdk-cli will store our VPC subnets within the cdk.context.json as "isolated":

    "subnetGroups": [
      {
        "name": "Isolated",
        "type": "Isolated",
        "subnets": [
          {
            "subnetId": "subnet-00xxxxxxxxxxxxxxx",

Synthesizing the app leads to the exact same error as with the previous CDK v1 example:

jsii.errors.JavaScriptError:
  Error: Vpc must contain private subnets when public endpoint access is disabled
      at new Cluster (/private/var/folders/tv/9zml9td56ssfq30x0c39z1240000gn/T/jsii-kernel-OBsjs4/node_modules/aws-cdk-lib/aws-eks/lib/cluster.js:1:9002)

Reproduction Steps

  1. mkdir /tmp/bug && cd /tmp/bug
  2. cdk init --language=python
  3. Add a real environment to your app, so that you app will look like following:
app = cdk.App()
BugStack(app, "BugStack", env=cdk.Environment(region="eu-central-1", account="123456789000"))
  1. Add the imported vpc and the EKS cluster to the Stack ( bug/bug_stack.py):
class BugStack(Stack):

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        vpc = aws_ec2.Vpc.from_lookup(self, "vpc", vpc_id="vpc-xxxxxxxxxxxxxxxx")

        print(f"#####--{id}")
        for sn in vpc.public_subnets:
            print(f"##### PUBLIC {sn.subnet_id}")
        for sn in vpc.private_subnets:
            print(f"##### PRIVATE {sn.subnet_id}")
        for sn in vpc.isolated_subnets:
            print(f"##### ISOLATED {sn.subnet_id}")

        aws_eks.Cluster(
                self,
                "cluster",
                version=aws_eks.KubernetesVersion.V1_20,
                endpoint_access=aws_eks.EndpointAccess.PRIVATE,
                vpc=vpc,
                vpc_subnets=[aws_ec2.SubnetSelection(subnet_type=aws_ec2.SubnetType.PRIVATE_ISOLATED)],
                )
  1. Doing a cdk synth should lead to the following error:
#####--<built-in function id>
##### PUBLIC s-12345
##### PUBLIC s-67890
##### PRIVATE p-12345
##### PRIVATE p-67890
jsii.errors.JavaScriptError:
  Error: Vpc must contain private subnets when public endpoint access is disabled
      at new Cluster (/private/var/folders/tv/9zml9td56ssfq30x0c39z1240000gn/T/jsii-kernel-cN6IVu/node_modules/aws-cdk-lib/aws-eks/lib/cluster.js:1:9002)
...

Oddness / "Workaround"

When changing the subnet type to PRIVATE_WITH_NAT...

vpc_subnets=[aws_ec2.SubnetSelection(subnet_type=aws_ec2.SubnetType.PRIVATE_WITH_NAT)],

... and running cdk synth again, the cdk-cli is doing a real lookup into our AWS account and will create a cdk.context.json with following content:

{
  "vpc-provider:account=xxxxxxxxxxxx:filter.vpc-id=vpc-xxxxxxxxxxxxxxxxx:region=eu-central-1:returnAsymmetricSubnets=true": {
    "vpcId": "vpc-xxxxxxxxxxxxxxxxx",
    "vpcCidrBlock": "10.105.28.0/22",
    "availabilityZones": [],
    "subnetGroups": [
      {
        "name": "Isolated",
        "type": "Isolated",
        "subnets": [
          {
            "subnetId": "subnet-xxxxxxxxxxxxxxxxx",
            "cidr": "10.105.28.0/24",
            "availabilityZone": "eu-central-1a",
            "routeTableId": "rtb-xxxxxxxxxxxxxxxxx"
          },
          {
            "subnetId": "subnet-xxxxxxxxxxxxxxxxx",
            "cidr": "10.105.29.0/24",
            "availabilityZone": "eu-central-1b",
            "routeTableId": "rtb-xxxxxxxxxxxxxxxxx"
          },
          {
            "subnetId": "subnet-xxxxxxxxxxxxxxxxx",
            "cidr": "10.105.30.0/24",
            "availabilityZone": "eu-central-1c",
            "routeTableId": "rtb-xxxxxxxxxxxxxxxxx"
          }
        ]
      }
    ]
  }
}

The synthsization itself will exit with following error:

(#####--<built-in function id>
##### PUBLIC s-12345
##### PUBLIC s-67890
##### PRIVATE p-12345
##### PRIVATE p-67890
#####--<built-in function id>
##### ISOLATED subnet-00e691f3dxxxxxxxx
##### ISOLATED subnet-0b8a1a22fxxxxxxxx
##### ISOLATED subnet-0b5a836b4xxxxxxxx
jsii.errors.JavaScriptError:
  Error: There are no 'Private' subnet groups in this VPC. Available types: Isolated
      at LookedUpVpc.selectSubnetObjectsByType (/private/var/folders/tv/9zml9td56ssfq30x0c39z1240000gn/T/jsii-kernel-0s3DjG/node_modules/aws-cdk-lib/aws-ec2/lib/vpc.js:1:4824)

Now an important behavior to mention: With this cdk.context.json present, when changing the subnet selection back to PRIVATE_ISOLATED the cdk synth will work!

That mean on the one hand CDK will fail to pass the subnet type checks of the EKS construct before doing the lookups, because CDK just passes "fake" public/private (private_with_nat) subnets to the EKS construct. If CDK reads the sunbnets from the cdk.context.json, cdk passes the proper isolated (private_isolated) subnets to EKS. On the other hand CDK fails to accept the isolated (private_isolated) subnets as "private" (private_with_nat) subnets in the VPC construct. That wrong logic definately needs to be fixed.

What did you expect to happen?

I'd expect to happen that the code will synthesize with ISOLATED(v1)/ PRIVATE_ISOLATED(v1) subnets as before with private subnets.
We changed nothing but the subnet selection from private to isolated networks, as the cdk lookup detects our networks as 'isolated' since cdk 1.140.

What actually happened?

cdk synth complains about

jsii.errors.JSIIError: Vpc must contain private subnets when public endpoint access is disabled

CDK CLI Version

3.13.0 (build b0b744d)

Framework Version

1.144 / 2.13.0

Node.js Version

v16.13.0

OS

Linux/Mac/Win

Language

Python

Language Version

Python 3.7, 3.8, 3.9, 3.10


UPDATES after new insights from 2022-03-04

Solution

The bug is, that on unresolved VPC lookups CDK uses a dummy VPC which doesn't contain isolated subnets.
So if you filter your subnets for any construct for only isolated subnets, the list of subnets returned is empy. This causes all the errors shown, as in my example of the EKS construct.
CDK returns this dummy VPC when either the cdk.context.json is not available or the lookup is not possible (e.g. you have no api access to your AWS account).

The solution for CDK developers should be to also add isolated subnets to the dummy VPC here (link for CDK v2.15.0):
https://github.com/aws/aws-cdk/blob/v2.15.0/packages/%40aws-cdk/aws-ec2/lib/vpc.ts#L2172

Please forgive me, that I didn't provide a merge request myself. I mainly use CDK in Python and never got into JS/TS.

Workaround (Python)

With the solution in mind we created a workaround, we created our custom VPC singleton whenever we got the dummy VPC. Our custom VPC includes isolated subnets. Because this dummy vpc is only used on unresolved VPC lookups, this VPC never gets into the real deployed stack, where the actual lookup will succeed.

    @property
    def imported_base_vpc(self) -> aws_ec2.IVpc:
        if self._imported_base_vpc is None:
            self._imported_base_vpc = aws_ec2.Vpc.from_lookup(
                self,
                "imported-base-vpc",
                vpc_id=dbv1_ssm.StringParameter.value_from_lookup(
                    self, (f"/{_account_application_name}/{dbv1_core.Stack.of(self).dbv1_environment}" f"/{_account_info_subsystem}/{_string_vpc_id}")
                ),
            )

        # AWS CDK has a severe bug. It introduced a new SubnetType.PRIVATE_ISOLATED, which is not included into the
        # temporary VPC cdk creates before doing lookups into the account, or when lookups not possible. Because we
        # use PRIVATE_ISOLATED subnets in our CDK constructs, we have to check for the presence/absence of this subnet
        # type and create a custom subnet, with PRIVATE_ISOLATED networks included.
        # see https://github.com/aws/aws-cdk/issues/19122
        if len(self._imported_base_vpc.isolated_subnets) == 0:
            singleton_id = "vpc_on_unresolved_lookup"
            stack = dbv1_core.Stack.of(self)
            retVal = stack.node.try_find_child(singleton_id)
            if not retVal:
                retVal = aws_ec2.Vpc(
                    stack,
                    singleton_id,
                    subnet_configuration=[
                        aws_ec2.SubnetConfiguration(name="public", subnet_type=aws_ec2.SubnetType.PUBLIC, cidr_mask=24),
                        aws_ec2.SubnetConfiguration(name="private_with_nat", subnet_type=aws_ec2.SubnetType.PRIVATE_WITH_NAT, cidr_mask=24),
                        aws_ec2.SubnetConfiguration(name="private_isolated", subnet_type=aws_ec2.SubnetType.PRIVATE_ISOLATED, cidr_mask=24),
                    ],
                )
            return cast(aws_ec2.Vpc, retVal)
        return self._imported_base_vpc
@ThomasSteinbach ThomasSteinbach added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Feb 23, 2022
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Feb 23, 2022
@ryparker ryparker added the p1 label Feb 23, 2022
@ryparker
Copy link
Contributor

Hey @ThomasSteinbach 👋🏻

Thanks for opening the issue with us.

Have you tried using the latest SubnetType API? PRIVATE_ISOLATED instead of ISOLATED (deprecated)

@ThomasSteinbach
Copy link
Contributor Author

ThomasSteinbach commented Mar 2, 2022

@ryparker , @otaviomacedo I have updated the bug description for also CDK v2.

Meanwhile we have migrated from CDK v1 to v2 and got the error in a different form with PRIVATE_ISOLATED subnets. And I clarified in the description that the error may depend on the VPC import/lookup.

I also added a complete, self contained example of how to reproduce the error. The only thing you need is an AWS account with a VPC containing PRIVATE_ISOLATED subnets only.

@ThomasSteinbach
Copy link
Contributor Author

Hi @otaviomacedo ,

as assignee to this issue I'd like to inform you that I have updated the bug description with a new section "Solution". There I've described how to fix this bug as CDK/TS developer.

In short: Simply add the newly introduced isolated subnets to the dummy VPC here:
https://github.com/aws/aws-cdk/blob/v2.15.0/packages/%40aws-cdk/aws-ec2/lib/vpc.ts#L2172

@otaviomacedo
Copy link
Contributor

Thanks, @ThomasSteinbach. I haven't had time to look into this yet, but rest assured that I'll do it this week. In the meantime, if you want to create a PR with your solution, go ahead and I'll be happy to review it.

@otaviomacedo otaviomacedo removed their assignment Mar 8, 2022
@mergify mergify bot closed this as completed in #19279 Mar 8, 2022
mergify bot pushed a commit that referenced this issue Mar 8, 2022
)

The lack of isolated subnet groups in the dummy response causes VPC lookups to fail.

Fixes #19122.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions
Copy link

github-actions bot commented Mar 8, 2022

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

TheRealAmazonKendra pushed a commit to TheRealAmazonKendra/aws-cdk that referenced this issue Mar 11, 2022
…#19279)

The lack of isolated subnet groups in the dummy response causes VPC lookups to fail.

Fixes aws#19122.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. p1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants