PuppetConf 2016. Kubernetes for system administrators. Part 1

I am a system administrator, I deal with computers, and today we are going to talk about Kubernetes. I will try to go deeper into the topic: what problems a system administrator can solve with this system, and how Puppet fits into this world through a new set of abstractions for running applications.
Five or six years ago, Luiz André Barroso and Urs Hölzle, in "The Datacenter as a Computer," suggested that we should think of the data center as one massive machine: abstract away the fact that it consists of separate machines and treat it as a single logical entity. As soon as you take this idea seriously, you can apply the principles of distributed systems and distributed computing to data centers.



To treat the data center as a computer, you need an operating system. It looks very much like the one on an individual machine, but it has to have a different interface, because you do not need access to a particular machine and you do not need access to the kernel. So let's think of the data center as one big computer. Today I will tell you what to do when you are deprived of the ability to manage any machine over SSH. You cannot log in, and although some people believe it is impossible to manage a system without that, I will show you how much can be done with Kubernetes. First of all, think of Kubernetes as a framework for building distributed platforms.



This does not mean that after downloading Kubernetes you get a user interface that provides everything you could possibly want to do with the system. No, it is just the foundation for building the tools you need to run your infrastructure. I will show you how to build an integration with the Let's Encrypt certificate authority that automates certificate management for my application, using Kubernetes as a framework.



Many people ask what exactly Kubernetes is good for. I worked with Puppet Labs for many years and saw that Puppet was installed on machines to give the system an API it never had before. Instead of Bash, YAML scripts and the like, Puppet gave users a DSL that let them interact with machines programmatically, without shell scripts. The difference with Kubernetes is that it sits above the level of the hardware. Let's focus not so much on automating and abstracting individual machines as on the relationship, the contract, between our infrastructure and the applications we are going to run on any node. In Kubernetes we do not pin applications to machines, there is no such thing as a "node manifest"; the scheduler treats individual nodes simply as data center resources that together make up one big computer.

Questions like "Does Kubernetes run on OpenStack, on VMware, on bare metal, in the cloud?" do not really make sense. The right question is: "Can I run the Kubernetes agent there to collect those resources?" And the answer is yes, because its operation is completely independent of the platform you choose.



Kubernetes is platform independent. Kubernetes is declarative in the same way Puppet is: you declare which application you want to run, in this example nginx. The contract between you, the developer, and Kubernetes is the container image.

I like the analogy with FedEx: you cannot drag them a whole truckload of loose stuff and wait for them to sort it all out and ship it where it needs to go. They have a rule: pack your things into a box. Once you do, they will ship your box and can tell you when it will arrive. "No box, no dealing with our system."

So when working with containers it is pointless to argue about what you have, Python, Java or something else, it does not matter: just take all your dependencies and put them into the container. Many people say that containers seem to solve the problems of the whole infrastructure. The problem is that people do not see containers as the two different things they really are. The first is the idea of a packaging format; the second is the idea of a container runtime. They are two different things and do not necessarily require the same tools.

Who here has built containers? And who uses Puppet to build containers? Right, I do not agree with that either! You might say: "How so? You are at a Puppet conference, you have to agree!" The reason I disagree with building containers with Puppet is this: I am not sure we need it at all, because the things we need to create an image are different from the things we need to run the process in production.

Let's think of this as building a software pipeline and take a look at this Dockerfile. If you have never seen one of these files before, don't worry if it looks intimidating. This file shows how to build a Ruby on Rails application, that is, a Ruby application using the Rails framework.



It says "FROM ruby:2.3.1", which pulls what is effectively an entire operating system into the container, in this case the Alpine Linux-based Ruby base image. Does anyone know why we embed Ubuntu or Red Hat images in these containers? Most people do not know what their dependencies actually are and take a haphazard approach, simply stuffing the entire OS into the container so the dependencies end up in there somewhere. Once you have built this thing, you only need to run this build once. This is where the misunderstanding comes from: if it doesn't work, change a line and rebuild until it does. Just check! You do not need to be too clever with these files; the goal is to build a self-contained representation of your application with all its dependencies. Using Ubuntu as a starting point is just a crutch.
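The file on the slide is not reproduced here, but a Dockerfile of this kind typically looks roughly like the sketch below; the paths, port and commands are illustrative assumptions, not the exact slide contents.

```dockerfile
# Base image: Ruby 2.3.1 on top of Alpine Linux (an entire userland comes with it)
FROM ruby:2.3.1-alpine

WORKDIR /app

# Install the application's gem dependencies inside the image
COPY Gemfile Gemfile.lock ./
RUN bundle install --without development test

# Copy the application code itself
COPY . .

EXPOSE 3000
CMD ["bundle", "exec", "rails", "server", "-b", "0.0.0.0"]
```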

If you were just creating and launching the application, you would use something like static linking: no dependencies on the host, just a single binary inside the container, a one-line Dockerfile, and no base image. In fact, this is just carrying over things we are already familiar with. A sketch of that kind of Dockerfile is shown below; after it, let's see what the real Rails build looks like.
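A minimal sketch, assuming `myapp` is a statically linked binary built beforehand (the name is purely illustrative):

```dockerfile
# No base image at all: only the statically linked binary goes into the image
FROM scratch
COPY myapp /myapp
ENTRYPOINT ["/myapp"]
```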



I built this image earlier, because the build usually looks like an attempt to download the entire Internet, and I am a little afraid to run it live: I am not sure the local Internet connection can handle Ruby. Look at what came out.



How do you like that, a whole gigabyte? And the source, your application itself, might take only, say, 100 kB. So how did we end up with a gigabyte? Because we use inefficient tools for building self-contained applications: they were all designed to run on a machine and to load dynamic libraries from the surrounding environment.

Now we will try to do what we do on a mobile phone: build portable versions of applications whose use is governed by a contract between you and the infrastructure. Once we have that contract, we can tell the system exactly what to run, and it will not care what the application actually is.

You do not have any special, one-of-a-kind application. I have come across companies that say: "We have a special application!" I say: "Let me guess what it does: it starts, binds to a port, receives traffic, does something with the data," and they go: "Wow, how do you know that?" I know because there is nothing special there!

So we take the spec describing what we put in the container and send it to the API server. Next, we need to turn that into something running on one of our machines. To collect resources from a machine, we need to install a few things on it: a container runtime to run the image, and an agent that understands how to talk to the master in order to launch everything needed and report back what happened so the application can start working. This agent simply watches: there are no 30-second intervals, no repeated checks, nothing like that. It just watches and says, "if there is work for me, let me know, I will start it and keep reporting the status of the process," so you know it is running.



Once we have this running on a machine, we need a scheduler. How many people here use a scheduler? You should all be raising your hands! It is the same as asking, "Who here has a laptop with more than one processor core?" Yet when I ask about schedulers, most people do not raise their hands.

When you start a process on a machine, something has to choose which processor core it runs on. Who does that work? That's right, the kernel. Now I am going to explain to you what a scheduler is, and the fastest way to do that is to play Tetris. The first thing we will look at is fully automatic deployment.



How many of you have used fully automatic deployment? Right, I think that's why we are all here. So I press the button, the blocks start falling from above, and off you go for a beer. But look at what is happening on the left and on the right: your CPU and your memory are turning into garbage.



This happens because most people use no more than 5% of their machines' resources. You automate processes, but you lose a ton of money. I work for a cloud provider, we have huge reserves of capacity, but it is still awful to watch people spend money this way.

When you use a scheduler, it is like playing Tetris properly: you watch the playing field and steer each block into the right spot, using the machine's resources as efficiently as possible. Kubernetes uses a couple of algorithms for this. The main one is bin packing, and Tetris is exactly the right way to understand it: Kubernetes receives workloads of various shapes and sizes, and our task is to pack them onto machines optimally.

Our goal is to reuse all the resources that become available as work completes. Not all workloads are the same, so it is hard to put them in the same box. But in Kubernetes, when a "piece" of workload appears (a Tetris block, to continue the analogy), there is always a right place in the cluster to put it and run it. And as with any batch processing, once a task completes we get back all the resources it occupied and can use them for future tasks.

Since we live in the real world and not in a game, you already have solutions that were built many years ago. You get hired as a system administrator, the employer shows you their production environment, and you notice that their deployments are not exactly tidy.



You can install a cluster manager on part of your machines and let it manage certain resources. In this particular case you can use Kubernetes, which will start filling in the empty spaces of your Tetris board as you go.
Raise your hand if you work in the enterprise. Yes, that classic thing called a paycheck. Suppose you have several problems in your enterprise. The first is that everything is written in Java, or even in COBOL, and usually nobody is prepared for that.

The second problem often found in enterprises is the Oracle DBMS. It is that thing sitting at the back of your stack saying: "Don't you try to automate anything!" If you automate around it, your costs go up. So: no automation, we are busy promoting our consulting ecosystem!

In these circumstances people usually ask whether Kubernetes alone can be used to solve such problems. I answer "no," because if your Tetris board is already a losing one, nothing will save it. You need to do something different, namely start using a scheduler.
Now that we have a scheduler that genuinely understands workloads, you can simply put everything into boxes and let the scheduler do its work.



Let's talk about the key Kubernetes entities. The first is the pod, which is a collection of containers. In most cases an application consists of more than one component. You might have an application, and you want nginx in front of it to terminate TLS and simply proxy traffic to the application behind it; those pieces should be deployed together because they are tightly coupled dependencies. A loosely coupled dependency is something like a database, which you scale independently.
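A sketch of such a pod, with an application container and an nginx sidecar packaged together (image names and ports are assumptions for illustration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-proxy
  labels:
    app: foo
spec:
  containers:
  # The application itself, listening only inside the pod
  - name: app
    image: example.com/foo:1.0
    ports:
    - containerPort: 8080
  # nginx sidecar that terminates TLS and proxies traffic to the app
  - name: nginx
    image: nginx:1.11
    ports:
    - containerPort: 443
```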



The second important thing is the Replication Controller, which manages the processes running in the Kubernetes cluster. It lets you create multiple instances of pods and keeps watching their state.

When you say you want some process running, it means it will keep running somewhere in the cluster at all times.
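A minimal replication controller manifest might look roughly like this (the image name is a placeholder):

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: foo
spec:
  replicas: 1              # desired state: one copy running somewhere in the cluster
  selector:
    app: foo
  template:                # pod template the controller stamps out copies from
    metadata:
      labels:
        app: foo
    spec:
      containers:
      - name: foo
        image: example.com/foo:1.0
        ports:
        - containerPort: 8080
```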

The third important element is the Service, a set of cooperating pods. Your deployment is based on dynamically reconciling the desired state, which determines where the applications run, with what IP addresses, and so on, so you need some form of service in front of them.
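A service that puts those pods behind one stable address might look like this sketch:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: foo
spec:
  selector:
    app: foo               # any running pod with this label becomes a backend
  ports:
  - port: 80               # stable port on the service's cluster IP
    targetPort: 8080       # port the application containers actually listen on
```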

The fourth element is Volumes, which you can think of as directories available to the containers in a pod. Kubernetes has different types of volumes, and the type determines how the storage is created and what it contains. The concept of a volume also existed in Docker, but there the storage was tied to a specific container: as soon as the container ceased to exist, the volume disappeared with it.

The storage Kubernetes creates is not tied to any single container: it is available to any or all of the containers deployed inside the pod. A key advantage of Kubernetes volumes is that a pod can use several different types of storage at the same time.
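A sketch of a pod-level volume shared by two containers; emptyDir here is one of the simplest volume types and lives exactly as long as the pod does:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-demo
spec:
  volumes:
  - name: shared-data
    emptyDir: {}            # belongs to the pod, not to any single container
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "while true; do date > /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: shared-data
      mountPath: /data
  - name: reader
    image: busybox
    command: ["sh", "-c", "while true; do cat /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: shared-data
      mountPath: /data
```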

Let's look at what a container is. It is the image format in which our application is packed along with all its dependencies, plus the basic runtime configuration describing how the application should run. These are two different elements, although you can pack whatever you like into it, in particular a root filesystem as a compressed tarball, which carries a lot of configuration files for a particular system.



Then we can do distribution, a process familiar to all of you, just like RPM or any other package repository. You take these artifacts and push them to a registry. It is very similar to what we do with OS packages, only the packages are container images.



A pod lets you compose everything our logical application needs. A logical application is a way of managing several applications under one system profile. A pod is a resource envelope that includes one or more containers and volumes, a shared namespace, and one IP address per pod. Volumes can be shared between containers.



Overall, the design of a pod resembles a virtual machine: it ensures the application starts and stops as an atomic unit. The next slide shows how the replication controller works. If I send this declaration to the server and say, "Hey, I want one replica of the Foo application running!", the controller creates it from the template and hands it to the scheduler, which places the application on Node 1. We do not specify which machine it should run on, although we could. Now let's increase the number of replicas to 3.



What do you expect the system to do if one of the machines fails? In that case the replication controller brings the current state of the system back to the desired state, moving the pod from the failed third machine onto the second.



You do not need to get into this process and steer it. By entrusting the work to the controller, you can be sure the application stays up: the controller constantly watches for changes in the current state of the infrastructure and makes the decisions needed to keep the system working.

These things are ahead of their time: you just tell the system, "I want three of these running!", and that is where your involvement ends. This is very different from scripting and plain automation, where you have to manage what is happening right now in order to influence the next decision. You cannot codify all of that without the ability to take in new information and respond to the situation properly. The approach described above gives you exactly that ability.
What about configuration, the configuration files for services? A lot of people go quiet on this subject when it comes to containers, but we still need configuration; it does not go away!

Kubernetes also has the concept of Secrets, which is used to store sensitive data and pass it from the masters to the nodes.

We never run Puppet inside a container, because there is no reason to. You can use Puppet to generate a config file, but you still want to store it in Kubernetes, because Kubernetes can distribute it at runtime. Let's see how that looks.



In this example we create a secret from a file and store it on the Kubernetes API server. You can imagine replacing this step with something like Puppet, using an ERB template and Hiera data to fill in the contents of the secret; it does not matter who produces it, either way works.
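Expressed declaratively, such a secret might look roughly like this (the name and contents are hypothetical; the data values are base64-encoded):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-config
type: Opaque
data:
  # base64-encoded contents of the config file; "aGVsbG8=" is simply "hello"
  config.yml: aGVsbG8=
```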

Once the secret is in place, it can be referenced by a deployment that says, "I want to use this secret!" In that case Kubernetes does the following.



It creates a pod, takes the data from the secret, puts it into a temporary file system and presents it to the container, just as Puppet would place a copy of the file on a machine. The configuration follows the life cycle of the application: when the application dies, the configuration disappears with it. If you need 10,000 instances of the application, Kubernetes will create 10,000 temporary file systems and inject them into those 10,000 pods.
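Referencing that secret as a volume in a pod spec might look like this sketch; Kubernetes then materializes it as files on a tmpfs inside the container:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: example.com/foo:1.0
    volumeMounts:
    - name: config
      mountPath: /etc/app      # the secret shows up here as ordinary files
      readOnly: true
  volumes:
  - name: config
    secret:
      secretName: app-config   # the secret created in the previous step
```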

A Service ties all of these pieces running in the cluster to a single endpoint. Essentially, a service is a group of pods that act as one. It has a stable IP address and port, integrates with DNS, load-balances traffic, and is updated as the underlying pods change.



Now let's look at how these conceptual pieces of Kubernetes work together. We have Lego bricks and configurations that have to interact with each other. If you intend to run a database in a cluster, you really can do it on Kubernetes. Many people say you cannot run stateful applications in containers, but that is simply wrong.

If you think about how hypervisors work, you will see they do almost the same thing: you create a virtual machine, the scheduler places it on a hypervisor and attaches storage. Sometimes you work with local storage provided by the hypervisor, and there is no reason containers could not do the same.

The problem with containers is that most people are not used to keeping an explicit list of the file paths they must provide to the application. Most people cannot tell you exactly which devices and files their data store needs. They stuff everything into containers, and nothing good comes of it. So do not believe that containers cannot run stateful applications; you certainly can.



On the slide you see a Ruby on Rails example, and before we can use the application, we need to run a database migration. Let's get to the live demo. For this deployment I use MySQL, and you can see quite a lot of data on the screen.



I am showing you all this because, as a system administrator, you have to understand a lot of it. In this deployment I define some metadata for my application, but the main part is what I have highlighted in gray: I want to run one copy of the mysql application, using the mysql container image, version 5.6.32.
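This is a sketch of what such a deployment manifest might look like, not the exact file from the demo (on a cluster from the era of this talk the apiVersion would have been extensions/v1beta1 rather than apps/v1):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  replicas: 1                  # one copy of the mysql application
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:5.6.32    # the container image version mentioned in the talk
        ports:
        - containerPort: 3306
```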



Notice that here I reference secrets stored in Kubernetes, which in this case I am going to inject as environment variables. Later I will show another case where we mount them into the file system. That way I never have to "bake" secrets into my configuration. The next important part is the resources block.
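The relevant fragment of the container spec from the sketch above might look like this; the secret name and key are assumptions for illustration:

```yaml
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secrets    # the secret stored in Kubernetes
              key: root-password     # this key's value is injected as an env var
```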



You cannot play Tetris without knowing the size of the blocks. Many people deploy processes without setting any resource limits. As a result RAM fills up completely and you take down the entire server.
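And the resources block itself, which tells the scheduler the size of this "Tetris piece" (the numbers are purely illustrative):

```yaml
        resources:
          requests:
            cpu: 500m        # what the scheduler sets aside for this container
            memory: 512Mi
          limits:
            cpu: "1"         # hard ceiling the container cannot exceed
            memory: 1Gi
```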

22:09 min

To be continued very soon ...


