The book "Terraform: infrastructure at the code level"

Hello, Habr readers! This book is for everyone responsible for code after it has been written. That includes system administrators, operations specialists, release engineers, SREs, DevOps engineers, infrastructure developers, full-stack developers, engineering team leads, and CTOs. Whatever your title, if you manage infrastructure, deploy code, configure servers, scale clusters, back up data, monitor applications, and answer calls at three in the morning, this book is for you.

Together, these responsibilities are commonly called operations (or system administration). In the past, it was common to meet developers who could write code but did not understand system administration, and system administrators who could not write code. That split was once acceptable, but in the modern world, which is hard to imagine without cloud computing and the DevOps movement, almost every developer needs administration skills and every system administrator needs to be able to program.

You will not only learn how to manage infrastructure as code with Terraform, but also see how this fits into the overall DevOps picture. Here are some of the questions you will be able to answer after reading this book.

  • Why use IaC at all?
  • , , ?
  • Terraform, Chef, Ansible, Puppet, Salt, CloudFormation, Docker, Packer Kubernetes?
  • Terraform ?
  • Terraform, ?
  • Terraform, ?
  • Terraform?
  • Terraform ?
  • Terraform ?

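The first edition of this book came out in 2017. It is now 2019, and a lot has changed! The second edition is considerably larger: it adds new chapters and substantially updates the original chapters and code examples.

If you read the first edition and want to know what is new, or are simply curious how Terraform has evolved, here are the highlights.

  • Four major Terraform releases. When the first edition came out, Terraform was at version 0.8. It is now at version 0.12. These releases brought impressive new functionality, described below, along with a fair amount of upgrade work for users!
  • Automated testing improvements. The tools and practices for writing automated tests for Terraform code have matured considerably. Testing gets a new chapter of its own, covering unit tests, integration tests, end-to-end tests, dependency injection, test parallelism, static analysis, and more.
  • Module improvements. The tools and practices for building Terraform modules have also matured. A new chapter is a guide to writing reusable, battle-tested, production-grade modules: the kind you could bet your company on.
  • Workflow improvements. Chapter 8 has been completely rewritten to reflect how teams now integrate Terraform into their workflow. It includes a detailed walkthrough of how to take application and infrastructure code through the main stages: from development, through testing, to production.
  • HCL2. Terraform 0.12 moved the underlying language from HCL to HCL2. This brought first-class expressions (no more wrapping everything in ${…}!), rich type constraints, lazily evaluated conditional expressions, support for null, for_each and for expressions, dynamic inline blocks, and more. All code examples in the book have been updated to HCL2, and the new language features are covered in detail in chapters 5 and 6.
  • State rework. Terraform 0.9 introduced backends as a first-class way to store and share Terraform state, including built-in locking. Terraform 0.9 also introduced state environments for managing deployments across multiple environments; in version 0.10 they were replaced by workspaces. All of this is covered in chapter 3.
  • Terraform provider split. In Terraform 0.10 the core was separated from the provider code (for AWS, GCP, Azure, and so on). Providers can now be developed in their own repositories, at their own pace, with their own versioning. The flip side is that you must run terraform init to download provider code each time you start working with a new module, as discussed in chapters 2 and 7.
  • Massive provider growth. Since 2016, Terraform has grown from a handful of major cloud providers (AWS, GCP, and Azure) to more than 100 official providers and many more community ones. You can now manage not only many other clouds (for example, Alicloud, Oracle Cloud Infrastructure, VMware vSphere, and others) but also version control systems (GitHub, GitLab, or BitBucket), data stores (MySQL, PostgreSQL, or InfluxDB), monitoring and alerting systems (including DataDog, New Relic, and Grafana), platforms such as Kubernetes, Helm, Heroku, Rundeck, and Rightscale, and much more. In addition, each provider now has much better coverage: the AWS provider, for example, covers most important AWS services and often adds support for new features before CloudFormation does!
  • Terraform Registry. In 2017 HashiCorp launched the Terraform Registry (registry.terraform.io), a UI that makes it easy to browse and use open source, reusable Terraform modules contributed by the community. In 2018 the ability to run a private registry inside your organization was added. Terraform 0.11 added first-class syntax support for consuming modules from a registry.
  • Better error handling. Terraform 0.9 improved state error handling: if writing state to a remote backend fails, the state is saved locally in an errored.tfstate file. Terraform 0.12 completely reworked error handling, so errors are caught earlier and the messages are clearer.
  • Many other small changes. These include local values, new "escape hatches" for interacting with the outside world from Terraform via scripts, running plan as part of the apply command, fixes for create_before_destroy cycle issues, major improvements to the count parameter, dozens of new built-in functions, reworked provider inheritance, and more.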

Excerpt. Testing Terraform code


(automated testing is covered in much more detail later in the book)

The DevOps world is full of fear: everyone is afraid of causing an outage, losing data, or being hacked. Whenever you make a change, you ask yourself what consequences it will have. Will it behave the same in all environments? Will it cause another outage? And if it does, how late will you have to stay at work this time to fix it? As a company grows, more is at stake, which makes deployments scarier and increases the risk of errors. Many companies try to reduce this risk by deploying less often, but as a result each individual deployment becomes larger and more error-prone.

If you manage your infrastructure as code, you have a better way to reduce risk: tests. Their goal is to give you enough confidence to make changes. The key word here is confidence: no test can guarantee the absence of errors, so you are dealing in probabilities rather than certainties. If you can capture all of your infrastructure and deployment processes as code, you can test that code in a test environment. If it passes there, there is a good chance the same code will work in production. In a world of fear and uncertainty, high probability and confidence are worth a lot.

In this chapter, we will go through the process of testing infrastructure code, both manual and automatic, with an emphasis on the latter.

Manual tests:

  • basics of manual testing;
  • cleaning up resources after tests.

Automated tests:

  • unit tests;
  • integration tests;
  • end-to-end tests;
  • other testing approaches.

Manual tests


When thinking about how to test Terraform code, it is useful to draw parallels with testing code written in general-purpose programming languages such as Ruby. Imagine you are writing a simple Ruby web server in the file web-server.rb:

require 'webrick'

class WebServer < WEBrick::HTTPServlet::AbstractServlet
  def do_GET(request, response)
    case request.path
    when "/"
      response.status = 200
      response['Content-Type'] = 'text/plain'
      response.body = 'Hello, World'
    else
      response.status = 404
      response['Content-Type'] = 'text/plain'
      response.body = 'Not Found'
    end
  end
end

This code will return a 200 OK response with the body Hello, World for the URL /; for any other address the answer will be 404. How would you test this code manually? Typically, some more code is added to run the web server locally:

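# This block runs only when the script is executed directly from the
# command line, not when it is required from another file
if __FILE__ == $0
  # Run the server on localhost, port 8000
  server = WEBrick::HTTPServer.new :Port => 8000
  server.mount '/', WebServer

  # Shut the server down on Ctrl+C
  trap 'INT' do server.shutdown end

  # Start the server
  server.start
end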

If you run this file in the terminal, it will start the web server on port 8000:

$ ruby web-server.rb
[2019-05-25 14:11:52] INFO WEBrick 1.3.1
[2019-05-25 14:11:52] INFO ruby 2.3.7 (2018-03-28) [universal.x86_64-darwin17]
[2019-05-25 14:11:52] INFO WEBrick::HTTPServer#start: pid=19767 port=8000

To check the operation of this server, you can use the browser or curl:

$ curl localhost:8000/
Hello, World

$ curl localhost:8000/invalid-path
Not Found

Now imagine that we changed this code by adding an /api endpoint that returns 201 Created and a JSON body:

class WebServer < WEBrick::HTTPServlet::AbstractServlet
  def do_GET(request, response)
    case request.path
    when "/"
      response.status = 200
      response['Content-Type'] = 'text/plain'
      response.body = 'Hello, World'
    when "/api"
      response.status = 201
      response['Content-Type'] = 'application/json'
      response.body = '{"foo":"bar"}'
    else
      response.status = 404
      response['Content-Type'] = 'text/plain'
      response.body = 'Not Found'
    end
  end
end

To test the updated code manually, press Ctrl+C and restart the web server by running the script again:

$ ruby web-server.rb
[2019-05-25 14:11:52] INFO WEBrick 1.3.1
[2019-05-25 14:11:52] INFO ruby 2.3.7 (2018-03-28) [universal.x86_64-darwin17]
[2019-05-25 14:11:52] INFO WEBrick::HTTPServer#start: pid=19767 port=8000
^C
[2019-05-25 14:15:54] INFO going to shutdown ...
[2019-05-25 14:15:54] INFO WEBrick::HTTPServer#start done.

$ ruby web-server.rb
[2019-05-25 14:11:52] INFO WEBrick 1.3.1
[2019-05-25 14:11:52] INFO ruby 2.3.7 (2018-03-28) [universal.x86_64-darwin17]
[2019-05-25 14:11:52] INFO WEBrick::HTTPServer#start: pid=19767 port=8000

To check the new version, you can again use the curl command:

$ curl localhost:8000/api
{"foo":"bar"}

Manual Testing Basics


What does this kind of manual testing look like in Terraform? For example, from the previous chapters you still have the code for deploying an ALB. Here is a snippet from the modules/networking/alb/main.tf file:

resource "aws_lb" "example" {
  name               = var.alb_name
  load_balancer_type = "application"
  subnets            = var.subnet_ids
  security_groups    = [aws_security_group.alb.id]
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.example.arn
  port              = local.http_port
  protocol          = "HTTP"

  # By default, return a simple 404 page
  default_action {
    type = "fixed-response"

    fixed_response {
      content_type = "text/plain"
      message_body = "404: page not found"
      status_code  = 404
    }
  }
}

resource "aws_security_group" "alb" {
  name = var.alb_name
}

# (...)

If you compare this listing with Ruby code, you can see one pretty obvious difference: AWS ALB, target groups, listeners, security groups, and any other resources cannot be deployed on your own computer.

This leads to key testing takeaway No. 1: Terraform code cannot be tested locally.

This applies not only to Terraform but to most IaC tools. The only practical way to test Terraform code manually is to deploy it to a real environment (that is, to AWS). In other words, running terraform apply and terraform destroy by hand, as you have been doing throughout the book, is manual testing in Terraform.

This is one of the reasons it is so important to have easy-to-deploy examples in the examples folder of each module (see chapter 6). The easiest way to test the alb module is to use the example code you created in examples/alb:

provider "aws" {
  region = "us-east-2"

  # Allow any 2.x version of the AWS provider
  version = "~> 2.0"
}

module "alb" {
  source = "../../modules/networking/alb"

  alb_name   = "terraform-up-and-running"
  subnet_ids = data.aws_subnet_ids.default.ids
}

To deploy this example, you need to run the terraform apply command, as you have done repeatedly:

$ terraform apply

(...)

Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

Outputs:

alb_dns_name = hello-world-stage-477699288.us-east-2.elb.amazonaws.com

At the end of the deployment, you can use a tool such as curl to, for example, make sure that ALB returns 404 by default:

$ curl \
   -s \
   -o /dev/null \
   -w "%{http_code}" \
   hello-world-stage-477699288.us-east-2.elb.amazonaws.com

404
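
The same check can be scripted instead of typed by hand. The following is a minimal Ruby sketch, not code from the book: the helper names http_status and check_status are hypothetical, and the fetch parameter is an assumption added so that the comparison logic can be tried without a deployed load balancer.

```ruby
require 'net/http'
require 'uri'

# Returns the numeric HTTP status code of a GET request to the given URL.
def http_status(url)
  Net::HTTP.get_response(URI(url)).code.to_i
end

# Compares the actual status with the expected one and returns a small report.
# `fetch` is injectable (an assumption of this sketch) so the check can be
# exercised without any network access.
def check_status(url, expected, fetch: method(:http_status))
  actual = fetch.call(url)
  { url: url, expected: expected, actual: actual, ok: actual == expected }
end
```

Against a real deployment you would pass the value of the alb_dns_name output, prefixed with http://, together with the expected code 404, and then inspect the :ok field of the result.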

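Infrastructure check

The examples in this chapter check the infrastructure by making HTTP requests with curl, because the infrastructure being deployed is a load balancer that answers HTTP requests. Other kinds of infrastructure need other checks. For example, if your code deploys a MySQL database, you will need a MySQL client to check it. If it deploys a VPN server, you will need a VPN client. If it deploys a server that does not listen for requests at all, you may have to connect over SSH and run some commands locally. And so on. So although the basic structure of testing described in this chapter applies to any infrastructure, the checking steps depend on what exactly you are testing.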

As a reminder, the ALB returns 404 because the configuration contains no other listener rules, and the default action in the alb module is a 404 response:

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.example.arn
  port              = local.http_port
  protocol          = "HTTP"

  # By default, return a simple 404 page
  default_action {
    type = "fixed-response"

    fixed_response {
      content_type = "text/plain"
      message_body = "404: page not found"
      status_code  = 404
    }
  }
}

So, you already know how to run and test your code. Now you can start making changes. Every time you change something (so that, for example, the default action returns 401), you need to use the terraform apply command to deploy the new code:

$ terraform apply

(...)

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

Outputs:

alb_dns_name = hello-world-stage-477699288.us-east-2.elb.amazonaws.com

To check the new version, run curl again:

$ curl \
   -s \
   -o /dev/null \
   -w "%{http_code}" \
   hello-world-stage-477699288.us-east-2.elb.amazonaws.com
401

When done, run the terraform destroy command to remove the resources:

$ terraform destroy

(...)

Destroy complete! Resources: 5 destroyed.

In other words, when working with Terraform, every developer needs good example code to test with and a real development environment (such as an AWS account) that plays the role of the local computer and is used to run tests. During manual testing you will most likely create and destroy a lot of infrastructure and make plenty of mistakes along the way, so this environment should be completely isolated from your more stable environments, such as staging and especially production.

For this reason, I strongly recommend that every development team set up an isolated sandbox environment in which infrastructure can be created and destroyed without consequences. To minimize conflicts between developers (imagine two developers trying to create a load balancer with the same name), the ideal solution is to give each team member a separate, completely isolated environment. For example, if you use Terraform with AWS, each developer should ideally have their own AWS account in which to test whatever they want.
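
The apply, check, destroy cycle from this section can be wrapped in a small helper so that cleanup is never forgotten. This is a minimal Ruby sketch under stated assumptions, not the book's approach: with_deployment is a hypothetical name, and the injectable run parameter exists only so the control flow can be tried without Terraform installed. The only Terraform features it relies on are the init, apply, and destroy commands with the -input=false and -auto-approve flags.

```ruby
# Deploys the Terraform code in `dir`, yields so the caller can check the
# result, and always destroys the deployment afterwards.
# `run` (hypothetical, injectable) executes one command and returns true on success.
def with_deployment(dir, run: ->(*cmd) { system(*cmd, chdir: dir) })
  run.call('terraform', 'init', '-input=false') or raise 'terraform init failed'
  begin
    run.call('terraform', 'apply', '-auto-approve', '-input=false') or
      raise 'terraform apply failed'
    yield
  ensure
    # Runs even if apply or the check raised, so nothing is left behind.
    run.call('terraform', 'destroy', '-auto-approve')
  end
end
```

A call such as with_deployment('examples/alb') { ... } would place the curl-style check inside the block. Because -auto-approve skips the confirmation prompt, a helper like this belongs only in an isolated sandbox account.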

Resource cleanup after tests


Having many sandbox environments is essential for developer productivity, but if you are not careful, you can accumulate a lot of leftover resources that clutter your environments and cost you a tidy sum.
To keep costs under control, clean up your sandbox environments regularly. This is key testing takeaway No. 2.

At a minimum, you should build a team culture in which developers remove everything they deployed with terraform destroy once they finish testing. Depending on your deployment environment, you may also find tools that clean up excess or old resources and can be run on a schedule (say, with cron). Here are some examples.

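  • cloud-nuke (http://bit.ly/2OIgM9r). An open source tool that can delete all the resources in your cloud environment. It currently supports a number of AWS resources (for example, Amazon EC2 Instances, ASGs, ELBs, and so on). Support for other resources and other clouds (Google Cloud, Azure) is planned. Its key feature is the ability to delete all resources older than a given age. For example, cloud-nuke is commonly run from cron once a day in each sandbox environment to delete everything older than two days, on the assumption that infrastructure a developer created for manual testing is no longer needed after a couple of days:

    $ cloud-nuke aws --older-than 48h
  • Janitor Monkey (http://bit.ly/2M4GoLB). An open source tool that cleans up AWS resources on a configurable schedule (by default, once a week). It supports configurable rules that decide whether a resource should be cleaned up, and it can notify the owner of a resource a few days before deletion. It is part of the Netflix Simian Army project, which also includes Chaos Monkey for testing application resilience. Note that the Simian Army project is no longer maintained, but parts of it live on in new projects: for example, Janitor Monkey has been replaced by Swabbie (http://bit.ly/2OLrOLb).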
  • aws-nuke (http://bit.ly/2ZB8lOe). An open source tool that deletes the entire contents of an AWS account. The accounts and resources to delete are specified in a YAML configuration file:

    # Regions to delete resources from
    regions:
    - us-east-2

    # Accounts to delete resources from
    accounts:
      "111111111111": {}

    # Resource types to delete
    resource-types:
      targets:
      - S3Object
      - S3Bucket
      - IAMRole

    aws-nuke is then run as follows:

    $ aws-nuke -c config.yml
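
The idea behind age-based cleanup, as in the --older-than 48h option above, can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not how cloud-nuke is implemented: the Resource record, the parse_age helper, and the small set of duration units it accepts are all hypothetical.

```ruby
# Hypothetical record for a deployed resource: an identifier and a creation time.
Resource = Struct.new(:id, :created_at)

# Seconds per supported duration unit (a deliberately small, assumed set).
UNIT_SECONDS = { 's' => 1, 'm' => 60, 'h' => 3600 }.freeze

# Converts a duration such as "48h" or "30m" into seconds.
def parse_age(text)
  match = /\A(\d+)([smh])\z/.match(text)
  raise ArgumentError, "unsupported duration: #{text.inspect}" unless match

  match[1].to_i * UNIT_SECONDS.fetch(match[2])
end

# Returns the resources created more than `older_than` before `now`.
def expired(resources, older_than:, now: Time.now)
  cutoff = now - parse_age(older_than)
  resources.select { |resource| resource.created_at < cutoff }
end
```

Passing now explicitly keeps the selection deterministic, which matters for a rule that decides what gets deleted: it can be checked against fixed timestamps before it is ever pointed at a real account.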

Automated Tests


The idea behind automated testing is that you write test code that verifies the behavior of your real code. As you will see in chapter 8, a CI server can run these tests after every single commit. If they fail, the commit can be reverted or fixed immediately, so your code is always in a working state.

There are three types of automated tests.

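  • Unit tests. These verify the functionality of a single small piece of code: a unit. The definition of a unit varies, but in general-purpose programming languages it is usually a single function or class. External dependencies (for example, databases, web services, or even the file system) are usually replaced with test doubles or mocks. These let you control the behavior of the dependencies precisely (for example, a database mock can return a hard-coded response) and check that your code handles a variety of scenarios.
  • Integration tests. These verify that several units work together correctly. In general-purpose programming languages, an integration test is code that checks that several functions or classes cooperate correctly. Real dependencies are mixed with mocks: for example, if you are testing the part of your application that talks to the database, you may want to use a real database but replace other dependencies, such as the authentication system, with mocks.
  • End-to-end tests. These run your entire architecture (for example, applications, data stores, and load balancers) and check that the system works as a whole. They are usually written from the end user's point of view: for example, Selenium can automate interaction with your product through a browser. End-to-end tests use real systems everywhere, with no mocks, in an architecture that mirrors production (though with fewer or smaller servers to save money).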

Each type of test serves a different purpose and catches different kinds of errors, so they should be used together. Unit tests are fast: they give you immediate feedback on your changes and let you check many combinations. This gives you confidence that the basic building blocks of your code (the individual units) behave as you expect. However, the fact that units work correctly in isolation does not mean they work together, so you need integration tests to make sure the building blocks are compatible. In turn, correct behavior of the different parts of the system does not guarantee that the system will work properly once deployed to production, so you need end-to-end tests to check your code under conditions close to real ones.
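
To make the unit-test idea concrete, here is a minimal, framework-free Ruby sketch built on the routing logic of the web server shown earlier. It rests on assumptions of its own: route, EXPECTED, and failing_paths are hypothetical names, and pulling the routing out into a pure function is one possible refactoring rather than the book's own code. In a real project, checks like these would normally live in a test framework.

```ruby
# Hypothetical pure routing function: maps a request path to
# [status, content type, body] without touching WEBrick or the network.
def route(path)
  case path
  when '/'
    [200, 'text/plain', 'Hello, World']
  when '/api'
    [201, 'application/json', '{"foo":"bar"}']
  else
    [404, 'text/plain', 'Not Found']
  end
end

# Table of inputs and the responses expected for them.
EXPECTED = {
  '/'             => [200, 'text/plain', 'Hello, World'],
  '/api'          => [201, 'application/json', '{"foo":"bar"}'],
  '/invalid-path' => [404, 'text/plain', 'Not Found']
}.freeze

# Returns the paths whose actual response differs from the expected one.
def failing_paths(expected = EXPECTED)
  expected.reject { |path, want| route(path) == want }.keys
end
```

Because route needs no running server, every case is checked in memory in well under a second, which is the speed advantage of unit tests described above. Terraform resources have no such in-memory equivalent.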

More information about the book can be found on the publisher's website.

Habr readers get a 25% discount with the coupon code Terraform.

When you pay for the paper edition of the book, the e-book is sent to you by e-mail.
