ACME (LetsEncrypt) Provider Support #3599

Closed
apparentlymart opened this issue Oct 22, 2015 · 14 comments

Comments

@apparentlymart
Contributor

Let's Encrypt and the ACME protocol are nearing release, so I wanted to think a little about how Terraform might interact with these.

The ACME protocol is interesting in that several of its operations require either manual operator intervention or dynamic management of other resources depending on responses from the server. For example, if the server requires DNS-based or HTTP-based domain verification, then completing that verification requires either creating a DNS record in a zone or adding a new resource to an HTTP server respectively. The verification process is also very dynamic, with the server dictating to the client which verifications are supported in which combinations.

Terraform's design isn't well-suited to that sort of thing, but I think Terraform can work with ACME in some other ways that would still be useful, as described in the following sections.

CA Account Registration

Terraform could manage registration, which is effectively creating an account with a CA.

resource "acme_account" "example" {
    // create/update using https://letsencrypt.github.io/acme-spec/#registration

    // The URL of the ACME server to register with
    // We'll use the directory endpoint https://letsencrypt.github.io/acme-spec/#rfc.section.6.3
    // to find the right URL to use for registration.
    server_url = "https://acme.example.com/"

    // a private key that will be used for authentication on subsequent requests
    account_key_pem = "..."

    // contact identifiers required by the server
    contact = [
        "mailto:[email protected]",
        "tel:+12025551212",
    ]
}

Registration requires a private key. The tls_private_key resource that #2778 will add could be used for this, with the drawback that the key is saved in the Terraform state. Alternatively, the user can generate a key in a separate file and include it via file(...), which allows surrounding orchestration to provide it to Terraform securely.

There is a manual step here in that the server can require the user to agree to a terms-of-service before the account is activated. This could be accommodated in Terraform by having Terraform inspect the response for a terms-of-service link and, if one is present, fail with an error (printing the TOS URL) unless a separate agree_to_tos attribute is set. We can't force the user to actually read the TOS, but I think this complies with the spirit of the thing.
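The terms-of-service check could be as small as the following Go sketch. registrationResult and checkTermsOfService are hypothetical names, and agree_to_tos is the attribute proposed above rather than an existing setting; how the TOS link is extracted from the server response is left out.

```go
package main

import "fmt"

// registrationResult is a hypothetical, trimmed-down view of the server's
// registration response. TermsOfService is empty when no TOS link was sent.
type registrationResult struct {
	TermsOfService string
}

// checkTermsOfService fails with an error naming the TOS URL unless the
// user has opted in via the proposed agree_to_tos attribute.
func checkTermsOfService(res registrationResult, agreeToTOS bool) error {
	if res.TermsOfService == "" || agreeToTOS {
		return nil
	}
	return fmt.Errorf("the ACME server requires agreement to its terms of service at %s; "+
		"review them and set agree_to_tos = true to continue", res.TermsOfService)
}
```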

Creating a certificate for an identifier that was already authorized outside of Terraform

Identifier authorization is the tricky part of ACME, since it's dynamic: the server can request various combinations of verifications that we can't know until runtime.

However, if authorization has already been completed by some other process, then Terraform can be used to issue certificates, since that is a much more predictable process:

resource "acme_certificate" "example" {
    // create using https://letsencrypt.github.io/acme-spec/#rfc.section.6.6
    // delete using https://letsencrypt.github.io/acme-spec/#certificate-revocation

    // The URL of the ACME server to request the certificate from
    // We'll use the directory endpoint https://letsencrypt.github.io/acme-spec/#rfc.section.6.3
    // to find the right URL to use for certificate requests.
    server_url = "https://acme.example.com/"

    // A PEM-formatted CSR payload
    cert_request_pem = "..."

    // A PEM-formatted account private key used to authenticate the request
    // (this is the same key used with the registration resource described above)
    account_key_pem = "..."

    lifecycle {
        // Make sure we don't revoke our old cert before we have a replacement
        create_before_destroy = true
    }
}

This resource requires a PEM-formatted certificate request. One way to create that would be to use the tls_cert_request resource that will be added by #2778.

The ACME protocol allows the server to process such a request asynchronously, so Terraform would need to poll the certificate URL returned from the initial request until a certificate becomes available there.

It can then produce the following attributes, to match the tls_self_signed_cert resource added by #2778:

  • cert_pem - The certificate data in PEM format.
  • validity_start_time - The time after which the certificate is valid, as an RFC3339 timestamp.
  • validity_end_time - The time until which the certificate is valid, as an RFC3339 timestamp.

Ideally this resource would support the same early_renewal_hours parameter I implemented for tls_self_signed_cert so that Terraform can pre-emptively request a new certificate some time before the existing cert expires, for a graceful transition between certificates.

Automating Identifier Authorization with a Provisioner

Although verification can't (easily) be completely handled by Terraform, we can have Terraform orchestrate the process by creating the verification request and then running a provisioner to complete it:

resource "acme_identifier_authorization" "example" {
    server_url = "https://acme.example.com/"

    type = "dns"
    value = "example.org"

    provisioner "local-exec" {
        // This script presumably knows how to unpack the server-sent challenges
        // and act on some or all of them to complete the verification process.
        // If it can't do so then it can exit non-successfully and then terraform apply
        // will fail, tainting this resource so we'll retry next time.
        command = "deal-with-acme-challenges '${self.json}'"
    }
}

resource "tls_cert_request" "example" {
    subject {
        // Create a dependency on the completion of the authorization.
        common_name = "${acme_identifier_authorization.example.value}"
        // ...
    }
    // ...
}

resource "acme_certificate" "example" {
    server_url = "https://acme.example.com/"
    cert_request_pem = "${tls_cert_request.example.cert_request_pem}"
    account_key_pem = "..."
}

In the above example, ${self.json} is assumed to return a compact JSON serialization of the authorization response so it can be easily passed into a tool that understands how to decode it and act on some or all of the challenges described.
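To illustrate what a script like deal-with-acme-challenges might do first, here is a Go sketch that decodes the JSON and picks a challenge it can answer on its own. The JSON shape (status, identifier, challenges with type/uri/token, and index-based combinations) is an assumption modelled loosely on the draft spec's authorization object, and pickChallenge is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// authorization is an assumed, simplified shape for the JSON that the
// hypothetical ${self.json} attribute would expose.
type authorization struct {
	Status     string `json:"status"`
	Identifier struct {
		Type  string `json:"type"`
		Value string `json:"value"`
	} `json:"identifier"`
	Challenges []struct {
		Type  string `json:"type"`
		URI   string `json:"uri"`
		Token string `json:"token"`
	} `json:"challenges"`
	Combinations [][]int `json:"combinations"`
}

// pickChallenge returns the index of a challenge of wantType that forms a
// complete combination by itself, so a script that only handles that type
// can satisfy the authorization.
func pickChallenge(raw []byte, wantType string) (int, error) {
	var a authorization
	if err := json.Unmarshal(raw, &a); err != nil {
		return -1, err
	}
	for _, combo := range a.Combinations {
		if len(combo) != 1 || combo[0] < 0 || combo[0] >= len(a.Challenges) {
			continue // needs more than one challenge, or a bad index
		}
		if a.Challenges[combo[0]].Type == wantType {
			return combo[0], nil
		}
	}
	return -1, fmt.Errorf("no combination for %s can be satisfied by %q alone", a.Identifier.Value, wantType)
}
```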

Separate provider or integrate into TLS?

The TLS provider proposed in #2778 is a logical-only provider. The above examples assume a new provider called "acme" that contains these new ACME-specific resources.

However, the above resources are defined such that each resource is self-contained and the provider itself takes no configuration. Thus these resources could potentially just be folded into the TLS provider, making it a suite of resources for working with TLS artifacts of various kinds.

If the above resources were in the TLS provider they could instead be called tls_registration_account and tls_certificate, with the server_url attribute changing to acme_server_url:

resource "tls_registration_account" "example" {
    acme_server_url = "https://acme.example.com/"
    // ...
}
resource "tls_certificate" "example" {
    acme_server_url = "https://acme.example.com/"
    // ...
}

Since ACME is largely an implementation detail of the Let's Encrypt offering, but will hopefully be implemented by many different CAs in the future, I'm leaning towards folding these resources into the tls provider so that they are more discoverable for users who might not know the details of the underlying protocol. The docs for these resources can then talk a little about how the ACME protocol is used, and give the correct acme_server_url value for the Let's Encrypt ACME server, which is likely to be the most common server people will use.

@mitchellh
Contributor

I'm a big +1 for this. I think having "acme" as a separate provider is cleaner and gives us flexibility to change interfaces for each.

@sethvargo
Contributor

Looks like this is a standalone plugin now: https://github.com/paybyphone/terraform-provider-acme

We should consider moving this into core given the popularity of LE.

@apparentlymart
Contributor Author

@sethvargo there was discussion about that in #7058. See there for more context, but the two key points were:

  • @vancluever's implementation uses a particularly "heavy" Let's Encrypt client that duplicates many of the clients already in Terraform. It felt like we should find a way to make this sit better in Terraform's architecture, since it's confusing to have e.g. a second Route53 client that doesn't respect the settings on Terraform's AWS provider.
  • Let's Encrypt's protocol is currently shifting quite a lot in its IETF standardization process, to the point where the fundamental workflow is changing. It's likely that the existing implementation won't go away for a while, but it felt better to let the IETF-driven reorganization settle a bit first so that we can come up with a design that would work for both.

Possibly the growing popularity of Let's Encrypt warrants a change in strategy here, but at that time I'd figured that having it as an external provider was a good compromise to make the functionality available while giving us an opportunity to find a more Terraform-idiomatic implementation of the workflow.

@vancluever
Contributor

Wow, speak of the devil, I was thinking about this today! Also on and off for the last little bit.

I've been trying to brainstorm how we could make ACME in its current form a true TF-centric workflow. As there's been much said already here and in #7058, I'll get right to it:

The challenge with making ACME play nicely with Terraform in its current state is how to deal with authorizations. They are multi-state resources: whatever handles one of the challenges issued by the CA (challenges are essentially sub-resources of an authorization) has to call back to them before the authorization is validated.

That makes the Terraform workflow tricky without features that are currently only at the concept stage, such as conditionals and partial applies.

The ACME spec as LE currently implements it follows this workflow for certificate generation.

  • An authorization is requested (new-authz). This produces challenges that the client can choose to answer (e.g. HTTP, TLS, DNS).
  • The client does what is needed to fulfil at least one of the challenges.
  • The client then POSTs to the authz resource generated by new-authz, with the challenges they have answered.
  • Once the authz's status has moved to valid, the client can move on to requesting the certificate. This part is straightforward once the authorization has been obtained.

Now, we could possibly do this right now by breaking the authorization resource up into two resources, one for each stage. For example, with Route 53 handling DNS challenges (plausibly the majority of TF scenarios):

acme_authorization -> aws_route53_record -> acme_challenge -> acme_certificate

As a breakdown:

  • acme_authorization would be the real authz resource and store the authorization in its entirety. This would help future-proof things for when TF (or even ACME for that matter) has a workflow that would allow this to be the only resource needed to manage authorizations (if authorizations in their current form are even around at that point).
  • acme_challenge would be a resource that manages a single challenge, obtained from the authorization and chained from the challenge data if Terraform was handling it (so either a soft or hard dependency on something like aws_route53_record for example). This resource would need to duplicate some data from the authorization to be effective, such as the authorization status, so that the certificate resource could depend on it somehow to ensure that the certificate could be generated.
  • acme_certificate would have either a soft (i.e. via depends_on) or a hard dependency on acme_challenge. The certificate generation part is straightforward: it's just a matter of being authorized for the domains that you want in the certificate.
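For the Route 53 case, the record sitting between acme_authorization and acme_challenge would carry the dns-01 answer. Per the ACME dns-01 definition (RFC 8555, section 8.4), the TXT record lives at _acme-challenge.<domain> and holds the unpadded base64url SHA-256 digest of the key authorization, i.e. token + "." + the account key's JWK thumbprint. This Go sketch takes the thumbprint as an input rather than computing it; dns01Record is a hypothetical helper (golang.org/x/crypto/acme offers Client.DNS01ChallengeRecord for the value).

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
)

// dns01Record returns the TXT record name and value answering a dns-01
// challenge. accountThumbprint is the base64url JWK thumbprint of the
// account key, computed elsewhere.
func dns01Record(domain, token, accountThumbprint string) (name, value string) {
	keyAuthorization := token + "." + accountThumbprint
	sum := sha256.Sum256([]byte(keyAuthorization))
	return "_acme-challenge." + domain, base64.RawURLEncoding.EncodeToString(sum[:])
}
```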

If everyone thinks that this is still something worth pursuing, let me know. I can get started on:

  • Writing a granular, API-only ACME SDK to support what we need
  • Writing only the ACME-related parts for TF (no DNS or HTTP challenge plugins).

@vancluever
Contributor

vancluever commented May 18, 2017

I just wanted to document some of the brainstorming that I've been doing on this one for the last few days. It's a bit messy, but I think it gets the job done in as sane of a way as possible.

  • Right now, I am kind of leaning towards golang.org/x/crypto/acme as a low-level library. It's bare bones, but gets the job done in a way that we will be able to use. Not too sure if I missed this last year, but yeah. ;)

The authorization process will be fairly decentralized, relying on waiters in some parts to make sure it is okay to proceed:

  • acme_authorization will flow into any companion DNS resources, acme_challenge, and acme_certificate all on the same level of the graph.
  • acme_challenge will check for DNS challenges and will wait for DNS propagation before POSTing to the challenge to signal that validation can proceed.
  • acme_certificate will accept the authorizations for all names in the certificate's CN/SAN list, and will wait until all of them are valid before fetching the certificate.

This reduces reliance on depends_on (although the user would be more than welcome to do so if there was an issue with how the provider behaves - although a bug report would probably be prudent too).

Finally, we would converge on using resources in the TLS provider to do the crypto. No private key generation or request creation would be done by the ACME provider.

A sample pseudo-config can be found in this gist.

The only real outstanding issue I can really see is cleanup of any DNS records. These tokens are not necessary to keep around after authorization so it kind of feels a bit sketchy to leave them around after the fact. Judging from what I've been reading, it's not a big deal if they are, but you kind of expect automation to handle these kinds of things. Any ideas on this one are welcome!

@apparentlymart
Contributor Author

Thanks for the continued thinking on this, @vancluever!

I like this new model a lot. The idea of a resource that "hangs" until some other action is completed is rather unconventional, so I think we'd need some careful documentation for that and some good examples, but it does seem like the approach that makes the best use of Terraform's existing resources.

I also like the allow_challenges attribute on acme_authorization, which would presumably allow us to fail if the server asks for something else. I wonder if we should instead call it expect_challenges and make it stricter, failing if the set of challenges doesn't exactly match what's expected, so we can avoid situations where e.g. the ACME server asks for both DNS and email verification for some reason, but the Terraform config is expecting only DNS.

How is IETF standardization going? Do you think things have settled enough now that we could commit to this interface and not get broken by future refactoring of the workflow? (I notice the Go library itself explicitly disclaims any API compatibility promises, which doesn't bode well. 😀)

@vancluever
Contributor

Thanks @apparentlymart!

Yeah, the waiter approach is really the only way I can think of to get around the lack of partial applies, and tooling should also assert that DNS propagation has taken place to reduce the risk of the authorization failing due to a race condition. We should definitely make sure the documentation is really clear on what's happening under the hood, and we can also use debug logs to assist with any necessary troubleshooting.

I've updated the gist so that allow_challenges is now expect_challenges. The purpose of this setting is pretty much what you guessed: it ensures that Terraform is configured to handle all the challenges it expects to be able to reasonably answer. The most obvious use case would be requiring one of either a DNS or an HTTP/TLS challenge, but not both. The resource will also check the combinations in the authorization resource to ensure that the challenges referenced in expect_challenges are valid together as a way to satisfy the authorization (example: if dns-01 and oob-01 were both required by a combination, but only dns-01 was supplied to expect_challenges, Terraform should fail).
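The strict expect_challenges rule can be expressed as a set comparison against the server's combinations: succeed only if some combination consists of exactly the expected challenge types. This Go sketch assumes combinations are lists of indexes into the challenge list, as in the draft spec; checkExpectChallenges and normalize are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// checkExpectChallenges succeeds only if some combination offered by the
// server consists of exactly the challenge types listed in expect.
func checkExpectChallenges(challengeTypes []string, combinations [][]int, expect []string) error {
	want := normalize(expect)
	for _, combo := range combinations {
		types := make([]string, 0, len(combo))
		for _, idx := range combo {
			if idx < 0 || idx >= len(challengeTypes) {
				return fmt.Errorf("combination references unknown challenge %d", idx)
			}
			types = append(types, challengeTypes[idx])
		}
		if normalize(types) == want {
			return nil
		}
	}
	return fmt.Errorf("no combination matches expect_challenges %v", expect)
}

// normalize sorts a copy of types and joins it so two sets compare equal
// regardless of order.
func normalize(types []string) string {
	s := append([]string(nil), types...)
	sort.Strings(s)
	return strings.Join(s, ",")
}
```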

As for how the standardization is going, I think we are pretty much in the same boat as we were last year, funnily enough. I have not had to make any major changes to the standalone provider that seriously change how its workflow works, and most of the research I've done for the above draft has not indicated any serious change. In fact, Boulder has somewhat diverged from the ACME standard. Since Boulder is probably the closest thing to a reference implementation of an ACME CA, and I don't think Let's Encrypt will introduce changes that would break the ecosystem that now exists around the product, my assessment is that following LE's implementation of ACME (which is probably what Google is tracking in golang.org/x/crypto/acme too) is probably safe.

@mattes

mattes commented Jun 5, 2017

Would golang.org/x/crypto/acme take care of the cert renewals?

@vancluever
Contributor

Hey @mattes, right now a certificate renewal is actually just requesting a new certificate, which is basically how it's handled in lego (what the external plugin uses). So there's not necessarily a renewal, if you will.

How this translates to Terraform is pretty simple: my plan is to preserve the min_days_remaining option that exists in the external plugin and, rather than calling any renewal behaviour, either taint the certificate or, if we can come to a consensus on #14887 (now probably delayed until after v0.10 amid concerns about the safety of altering the diff right now), force the certificate PEM back to a computed state in the diff so that the next update generates it again.

Either way the plan is to make sure that the provider can handle renewals automatically - as this is crucial to a useful ACME client implementation. The more minimal API over lego just means that Terraform will have to handle a bit more of the logic, and will be a lot more granular - but that has been the plan for a while.

@Nowaker

Nowaker commented Jul 18, 2017

Would this become part of https://github.com/terraform-providers/terraform-provider-tls or be a provider on its own?

@vancluever
Contributor

Hey @Nowaker, unless something changes, this would be its own provider (the acme provider). This is how the plugin has been set up, and I don't see much reason to change it now, especially with configurable settings like the registration URL that don't really make sense at a provider level in tls.

@ozbillwang

Seems this issue can be closed, because its related PR (#7058) has been resolved (closed, with agreement that this remains a standalone plugin).

@apparentlymart
Contributor Author

A good point, @ozbillwang. This issue represented an early proposal/discussion and there's now a concrete implementation and so this issue's usefulness is limited anyway. Furthermore, even if this provider were adopted into the set that HashiCorp distributes that would now be done in its own repository rather than in this one.

Thanks for the nudge! I'm going to close this. For the moment, anyone who wants to use ACME (Let's Encrypt) with Terraform should take a look at the third-party plugin.

@ghost

ghost commented Apr 7, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost locked and limited conversation to collaborators Apr 7, 2020