Pitfalls of Terraform


We’ll highlight several pitfalls, including those related to loops, if statements, and deployment techniques, as well as more general issues that concern Terraform as a whole:

  • The count and for_each parameters have limitations.
  • Zero-downtime deployment has limitations.
  • Even a good plan may fail.
  • Refactoring can be tricky.
  • Eventual consistency is consistent... eventually.

The count and for_each parameters have limitations


In the examples in this chapter, the count parameter and the for_each expression are used extensively for loops and conditional logic. They work well, but they have two important limitations that you need to know about.

  • count and for_each cannot reference resource output variables.
  • count and for_each cannot be used in module configuration.

Let's consider each limitation in turn.

count and for_each cannot reference resource output variables


Imagine you need to deploy multiple EC2 servers and, for some reason, you don't want to use an ASG. Your code might look like this:

resource "aws_instance" "example_1" {
   count             = 3
   ami                = "ami-0c55b159cbfafe1f0"
   instance_type = "t2.micro"
}


Since the count parameter is set to a static value, this code works without problems: when you run the apply command, it creates three EC2 servers. But what if you wanted to deploy one server in each Availability Zone (AZ) of the current AWS region? You can have your code load the list of AZs from the aws_availability_zones data source and then loop over each of them, creating an EC2 server in each one, using the count parameter and an array index lookup:

resource "aws_instance" "example_2" {
   count                   = length(data.aws_availability_zones.all.names)
   availability_zone   = data.aws_availability_zones.all.names[count.index]
   ami                     = "ami-0c55b159cbfafe1f0"
   instance_type       = "t2.micro"
}

data "aws_availability_zones" "all" {}

This code also works just fine, since the count parameter can reference data sources without any problems. But what happens if the number of servers you need to create depends on the output of some resource? The easiest way to demonstrate this is with the random_integer resource, which, as the name suggests, returns a random integer:

resource "random_integer" "num_instances" {
  min = 1
  max = 3
}

This code generates a random number from 1 to 3. Let's see what happens if we try to use the result output of this resource in the count parameter of the aws_instance resource:

resource "aws_instance" "example_3" {
   count             = random_integer.num_instances.result
   ami                = "ami-0c55b159cbfafe1f0"
   instance_type = "t2.micro"
}

If you execute terraform plan for this code, you get the following error:

Error: Invalid count argument

   on main.tf line 30, in resource "aws_instance" "example_3":
   30: count = random_integer.num_instances.result

The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many instances will be created. To work around this, use the -target argument to first apply only the resources that the count depends on.

Terraform requires that count and for_each be computed during the plan phase, before any resources are created or modified. This means count and for_each can reference hard-coded values, variables, data sources, and even lists of resources (as long as the length of the list can be determined during plan), but not computed resource outputs.
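For example, for_each (supported on resources in Terraform 0.12.6 and later) can safely iterate over values that are known at plan time, such as the AZ names returned by the data source above. A minimal sketch (the resource name example_4 is purely illustrative):

resource "aws_instance" "example_4" {
  for_each          = toset(data.aws_availability_zones.all.names)
  availability_zone = each.value
  ami               = "ami-0c55b159cbfafe1f0"
  instance_type     = "t2.micro"
}

Because the list of AZ names comes from a data source, its length is known during plan, so this works just like the count-based version.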

count and for_each cannot be used in module configuration


Sometimes you may be tempted to add the count parameter to the module configuration:

module "count_example" {
     source = "../../../../modules/services/webserver-cluster"

     count = 3

     cluster_name = "terraform-up-and-running-example"
     server_port = 8080
     instance_type = "t2.micro"
}

This code tries to use count on the module to create three copies of the webserver-cluster resources. Alternatively, you might want to make the module optional by setting its count parameter to 0 based on some Boolean condition. The code looks perfectly reasonable, but running terraform plan produces the following error:

Error: Reserved argument name in module block

   on main.tf line 13, in module "count_example":
   13: count = 3

The name "count" is reserved for use in a future version of Terraform.

Unfortunately, as of Terraform 0.12.6, using count or for_each on a module was not supported. According to the Terraform 0.12 release notes (http://bit.ly/3257bv4), HashiCorp plans to add this feature in the future, so depending on when you read this book, it may already be available. To find out for sure, check the Terraform changelog.
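Until such support is available, a common workaround is to move the count inside the module: expose an input variable and apply count to the resources within the module. Here is a minimal generic sketch, assuming a hypothetical num_instances variable (this is not part of the actual webserver-cluster module):

# Inside the module
variable "num_instances" {
  description = "How many copies of the resources to create (0 disables them)"
  type        = number
  default     = 1
}

resource "aws_instance" "example" {
  count         = var.num_instances
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
}

The caller then sets num_instances in the module block instead of count, which achieves roughly the same effect, including the "optional module" case via num_instances = 0.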


Zero-downtime deployment has limitations


Using the create_before_destroy block in combination with an ASG is an excellent way to organize zero-downtime deployments, except for one nuance: auto scaling policies are not supported. Or, to be more precise, every deployment resets the ASG size back to min_size, which can be a problem if you were using auto scaling policies to increase the number of running servers.

For example, the webserver-cluster module contains a pair of aws_autoscaling_schedule resources that increase the number of servers in the cluster from two to ten at 9 a.m. If you deploy at, say, 11 a.m., the new ASG will boot with only two servers rather than ten, and will stay that way until 9 a.m. the next day.
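The scale-out schedule in question looks roughly like this (a sketch only; the resource name and the ASG reference are illustrative, not the module's exact code):

resource "aws_autoscaling_schedule" "scale_out_during_business_hours" {
  scheduled_action_name  = "scale-out-during-business-hours"
  min_size               = 2
  max_size               = 10
  desired_capacity       = 10
  recurrence             = "0 9 * * *"
  autoscaling_group_name = aws_autoscaling_group.example.name
}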

This limitation can be circumvented in several ways.

  • Change the recurrence parameter in aws_autoscaling_schedule from 0 9 * * * (“run at 9 a.m.”) to something like 0-59 9-17 * * * (“run every minute from 9 a.m. to 5 p.m.”). If the ASG already has ten servers, rerunning this autoscaling rule changes nothing, which is exactly what we need. But if the ASG has just been deployed, the rule guarantees that within a minute the number of servers reaches ten. This is not a particularly elegant approach, and the large jumps from ten servers down to two and back up again can also cause problems for users.
  • Create a custom script that uses the AWS API to determine the number of active servers in the ASG, call it via an external data source (see "External data source" on page 249), and set the ASG's desired_capacity parameter to the value returned by this script (a rough sketch of this approach follows this list). That way, each new ASG will always launch with the same capacity as the old one. The downside is that the extra script and data source make the Terraform code more complicated and harder to maintain.
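Here is a rough sketch of the second approach, assuming a hypothetical get-asg-capacity.sh script that prints a JSON object of strings, such as {"desired_capacity": "10"}:

data "external" "asg_capacity" {
  program = ["bash", "${path.module}/get-asg-capacity.sh"]
}

resource "aws_autoscaling_group" "example" {
  # (...)

  desired_capacity = data.external.asg_capacity.result["desired_capacity"]
}

The external data source is read at plan time, so each deployment picks up the capacity of the ASG that is currently running.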

Ideally, Terraform would have built-in support for zero-downtime deployments, but as of May 2019, the HashiCorp team had no plans to add this functionality.

Even a good plan may fail


Sometimes the plan command produces a perfectly correct-looking deployment plan, but the apply command then returns an error. For example, try adding an aws_iam_user resource with the same name as the IAM user you created manually back in Chapter 2:

resource "aws_iam_user" "existing_user" {
   #       IAM,
   #      terraform import
   name = "yevgeniy.brikman"
}

Now, if you execute the plan command, Terraform will display what, at first glance, looks like a perfectly reasonable deployment plan:

Terraform will perform the following actions:

  # aws_iam_user.existing_user will be created
  + resource "aws_iam_user" "existing_user" {
      + arn           = (known after apply)
      + force_destroy = false
      + id            = (known after apply)
      + name          = "yevgeniy.brikman"
      + path          = "/"
      + unique_id     = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

If you execute the apply command, you get the following error:

Error: Error creating IAM User yevgeniy.brikman: EntityAlreadyExists:
User with name yevgeniy.brikman already exists.

   on main.tf line 10, in resource "aws_iam_user" "existing_user":
   10: resource "aws_iam_user" "existing_user" {

The problem, of course, is that an IAM user with that name already exists. And this can happen not only with IAM users but with almost any resource. Perhaps someone created the resource manually or via the command line; either way, matching identifiers lead to conflicts. There are many variations of this error, and they often take Terraform beginners by surprise.

The key point is that the terraform plan command only takes into account the resources recorded in the Terraform state file. If resources were created in some other way (for example, manually, by clicking around the AWS console), they won't be in the state file, and therefore Terraform will not take them into account when running the plan command. As a result, a plan that looks correct at first glance will fail.
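A quick way to see exactly which resources Terraform is tracking is to list the addresses recorded in the state file; anything that does not appear in this output is invisible to the plan command:

$ terraform state list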

Two lessons can be learned from this.

  • If you have already started working with Terraform, use only Terraform. Once part of your infrastructure is managed by Terraform, you should no longer modify it manually. Otherwise you not only risk strange Terraform errors, but also negate many of the benefits of IaC, since the code will no longer be an accurate representation of your infrastructure.
  • If you already have infrastructure, use the import command. If you have infrastructure that was created before you started using Terraform, you can add it to Terraform's state file with the terraform import command, so that Terraform knows about it and can manage it. The import command takes two arguments. The first is the resource "address" in your Terraform configuration, which uses the same syntax as resource references: <PROVIDER>_<TYPE>.<NAME> (for example, aws_iam_user.existing_user). The second is a resource-specific ID that identifies the existing resource to import. For example, the ID for an aws_iam_user resource is the user name (e.g., yevgeniy.brikman), whereas the ID for an aws_instance is the EC2 instance ID (e.g., i-190e22e5). The documentation for each resource usually explains how to import it.

    For example, here is how you can use the import command to link the aws_iam_user resource to the IAM user you created back in Chapter 2 (substitute your own user name for yevgeniy.brikman):

    $ terraform import aws_iam_user.existing_user yevgeniy.brikman

    Terraform will use the AWS API to find your IAM user and create an association in its state file between that user and the aws_iam_user.existing_user resource. From then on, when you run the plan command, Terraform will know that this IAM user already exists and will not try to create it again.

    Note that if you have a lot of existing resources that you want to import into Terraform, writing the Terraform code for them from scratch and importing them one at a time can be painful, so you may want to look at a tool such as Terraforming (http://terraforming.dtan4.net/), which can automatically import both code and state from an AWS account.


Refactoring can be tricky


Refactoring is a common programming practice in which you restructure the internal details of existing code without changing its external behavior. The goal is to improve the readability, maintainability, and general hygiene of the code. Refactoring is an essential practice that you should apply regularly. However, with Terraform, or any IaC tool, you have to be careful about what counts as the "external behavior" of a piece of code, or you will run into unexpected problems.

For example, a common kind of refactoring is renaming a variable or a function to make its purpose clearer. Many IDEs even have built-in refactoring support and can rename a variable or function for you automatically across the entire codebase. In a general-purpose programming language you might do such a rename without a second thought, but in Terraform you have to be very careful with it, or you can end up with an outage.

For example, the webserver-cluster module has an input variable named cluster_name:

    variable "cluster_name" {
       description = "The name to use for all the cluster resources"
       type          = string
    }

Imagine that you started using this module to deploy a microservice named foo, and later you want to rename that service to bar. This may seem like a trivial change, but it can actually cause an outage.
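In code, the rename is nothing more than changing a single value in the module block (a sketch based on the module usage shown earlier; the module label webserver_cluster is illustrative):

module "webserver_cluster" {
  source = "../../../../modules/services/webserver-cluster"

  # Renamed from "foo" to "bar"
  cluster_name  = "bar"
  server_port   = 8080
  instance_type = "t2.micro"
}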

The reason is that the webserver-cluster module uses the cluster_name variable in a number of resources, including the name parameter of the ALB:

    resource "aws_lb" "example" {
       name                    = var.cluster_name
       load_balancer_type = "application"
       subnets = data.aws_subnet_ids.default.ids
       security_groups      = [aws_security_group.alb.id]
    }

If you change the name parameter of certain resources, Terraform deletes the old version of the resource and creates a new one to replace it. If the resource being deleted happens to be an ALB, there will be nothing to route traffic to your web servers until the new ALB boots up. Likewise, if the resource being deleted is a security group, your servers will reject all network traffic until the new group is created.

Another kind of refactoring you might be tempted to do is changing a Terraform identifier. Consider, for example, the aws_security_group resource in the webserver-cluster module:

    resource "aws_security_group" "instance" {
      # (...)
    }

The identifier of this resource is instance. Suppose that during a refactoring you decide to change it to the clearer (in your opinion) name cluster_instance:

    resource "aws_security_group" "cluster_instance" {
       # (...)
    }

What happens in the end? You guessed it: downtime.

Terraform associates each resource identifier with an ID from the cloud provider. For example, an iam_user resource is associated with an AWS IAM user ID, and an aws_instance resource with the ID of an AWS EC2 server. If you change a resource identifier (say, from instance to cluster_instance in the case of aws_security_group), then as far as Terraform is concerned, you deleted the old resource and added a completely new one. As a result, when you apply these changes, Terraform deletes the old security group and creates a new one, and in the time in between your servers reject all network traffic.

Several lessons can be drawn from this discussion.

  • Always use the plan command. It can reveal all of these pitfalls. Look over the plan output carefully and pay attention to situations where Terraform plans to delete resources that you probably do not want deleted.
  • Create before you destroy. If you do want to replace a resource, think carefully about whether the replacement should be created before the original is deleted. If so, create_before_destroy may help (a minimal sketch of this setting appears after this list). Alternatively, you can achieve the same result manually in two steps: first add the new resource to the configuration and run apply, then remove the old resource from the configuration and run apply again.
  • Refactoring may require changing state. If you want to refactor your code without accidentally causing downtime (for example, renaming an aws_security_group identifier from instance to cluster_instance), you need to update the Terraform state accordingly. Never edit the state file by hand; instead, use the terraform state commands. In particular, the terraform state mv command, which has the following syntax:

      terraform state mv <ORIGINAL_REFERENCE> <NEW_REFERENCE>

    Here ORIGINAL_REFERENCE is the expression that refers to the resource as it is now, and NEW_REFERENCE is the new location you want to move it to. For example, when renaming the aws_security_group group from instance to cluster_instance, you would run:

      $ terraform state mv \
         aws_security_group.instance \
         aws_security_group.cluster_instance

    This tells Terraform that the state previously associated with aws_security_group.instance should now be associated with aws_security_group.cluster_instance. If, after renaming the identifier and running this command, terraform plan shows no changes, you did everything right.

  • Some parameters are immutable. The parameters of many resources cannot be changed in place, so if you modify them, Terraform will delete the old resource and create a new one in its place. The documentation for each resource usually notes what happens when you change a parameter, so read it before making changes. And, once again, always use the plan command and consider whether create_before_destroy is appropriate.
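For reference, the lifecycle setting mentioned in the second lesson is enabled per resource. A minimal sketch (the surrounding resource is illustrative):

resource "aws_security_group" "instance" {
  # (...)

  lifecycle {
    # When a change forces replacement, create the new resource
    # before destroying the old one
    create_before_destroy = true
  }
}

With this setting, Terraform creates the replacement first and only then destroys the original, which avoids a window in which the resource does not exist at all.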


Eventual consistency is consistent... eventually


The APIs of some cloud providers, such as AWS, are asynchronous and eventually consistent. Asynchronous means that the API may return a response before the requested action has completed. Eventually consistent means that changes take time to propagate through the whole system, so for some period of time you may receive inconsistent responses depending on which replica of the data store happens to answer your API call.

Suppose, for example, that you make an API call to AWS asking it to create an EC2 server. The API returns a "success" response (201 Created) almost immediately, before the server has actually been created. If you try to connect to it right away, you will most likely fail, because AWS is still provisioning the server or it has not booted yet. Moreover, if you make another API call to fetch information about this server, you may get an error (404 Not Found): the information about the EC2 server may still be propagating through AWS and will become available everywhere only after a few seconds.

When working with an asynchronous and eventually consistent API, you are supposed to wait until the action has completed and propagated, retrying for a while if needed. Unfortunately, the AWS SDK does not provide good tools for this, and Terraform used to suffer from a host of bugs similar to issue 6813 (https://github.com/hashicorp/terraform/issues/6813):

$ terraform apply
aws_subnet.private-persistence.2: InvalidSubnetID.NotFound:
The subnet ID 'subnet-xxxxxxx' does not exist

In other words, you create a resource (here, a subnet) and then try to read some data about it (the ID of the newly created subnet), and Terraform cannot find it. Most of these bugs (including 6813) have already been fixed, but they still surface from time to time, especially when Terraform adds support for a new type of resource. They are annoying but, fortunately, mostly harmless: if you simply rerun terraform apply, everything will work, because by then the information will have propagated through the system.

For more details, see the book «Terraform: Up & Running».
