Seven DevOps Transformation Archetypes

The question of "how do we implement DevOps" has been around for years, yet there is still little good material on it. Sometimes you fall victim to the advertising of not very bright consultants who need to sell their time by any means necessary. Sometimes all you get is murky, extremely general talk about how the ships of megacorporations sail the expanses of the universe. The question arises: what about us? Dear author, can you state your ideas clearly, as a list?

All this stems from the fact that there is little accumulated real-world practice and little understanding of how transformations of company culture turn out. Cultural change is a long game whose results show up neither in a week nor in a month. We need someone who has been around long enough to have watched companies rise and collapse over the years.



John Willis is one of the fathers of DevOps. He has decades of experience working with a huge number of companies. Recently, John began to notice specific patterns that recur in his work with each of them. Using these archetypes, John guides companies onto the right path of DevOps transformation. Read more about these archetypes in this translation of his talk from the DevOops 2018 conference.



About the speaker:

Over 35 years in IT management; took part in creating the predecessor of OpenCloud at Canonical; took part in 10 startups, two of which were sold to Dell and Docker. He is currently Vice President of DevOps and Digital Practices at SJ Technologies.

What follows is told in John's own words.

My name is John Willis, and the easiest way to find me is on Twitter at @botchagalupe. I use the same alias on Gmail and GitHub. At this link you can find videos of my talks and the slides for them.

I have many meetings with CIOs of large companies. They very often complain that they do not understand what DevOps is, and that everyone who tries to explain it to them describes something different. Another common complaint is that DevOps does not work, even though the executives seem to be doing everything as it was explained to them. We are talking about large companies that are over a hundred years old. After talking with them, I came to the conclusion that many problems are best solved not with high technology but with relatively low-tech solutions. For weeks, I simply talked with people from different departments. What you see in the very first picture of this post is my latest project: the room looked like this after three days of work.

What is DevOps?


Indeed, if you ask 10 different people, they will give you 10 different answers. But here is what is interesting: all ten answers will be correct. There is no wrong answer here. I have studied DevOps quite deeply for about 10 years, and I was the first American at the first DevOpsDays. I will not claim that I am smarter than everyone else involved in DevOps, but there is hardly anyone who has put as much effort into it. I believe that DevOps arises when human capital and technology are combined. We often forget about the human dimension, even though we talk a lot about culture of all kinds.



We now have a lot of data: five years of academic research, with theories being tested at industrial scale. These studies tell us the following: if you combine certain behavioral patterns in an organizational culture, you can get a 2000-fold speedup, and that speedup comes with a matching improvement in stability. This is a quantitative measure of the benefit DevOps can bring to any company. A couple of years ago I talked about DevOps to the CEO of a Fortune 5000 company. When I was preparing for the presentation, I was very nervous, because I had to condense my many years of experience into 5 minutes.

I ended up giving the following definition of DevOps: it is a set of practices and patterns that make it possible to turn human capital into high-performing organizational capital. An example is the way Toyota has worked for the past 50 or 60 years.



(Here and below, such diagrams are shown not as reference material but as illustrations. Their contents will be different for every company. Nevertheless, the picture can be viewed separately and enlarged at this link.)

One of the most successful of these practices is value stream mapping. Several good books have been written about it; the author of the most successful of them is Karen Martin. But over the past year I have come to the conclusion that even this approach is too high-tech. It certainly has many advantages, and I have used it a lot. But when a CEO asks you why his company cannot get onto a new track, it is too early to talk about value stream mapping. There are many far more fundamental questions that need to be answered first.

It seems to me that the mistake many of my colleagues make is that they simply hand the company a five-point guide, then come back six months later to see what happened. Even a good scheme like value stream mapping has, let's say, blind spots. After hundreds of interviews with executives of various companies, I worked out a pattern that lets me decompose the problem into components, and we will now discuss each of these components in order. I use this pattern before applying any technological solutions, and as a result I end up with walls covered in diagrams. I recently worked with a mutual fund and ended up with 100-150 of these diagrams.

Bad culture eats good approaches for breakfast


The main idea is this: no amount of Lean, Agile, SAFe, or DevOps will help if the organizational culture is bad. It is like diving deep without scuba gear or operating without an X-ray. In other words, to paraphrase Drucker and Deming: a bad organizational culture will swallow any good system without choking.

To solve this main problem, you must take the following steps:

  1. Make All Work Visible: not in the sense that it must be displayed on a screen, but in the sense that it must be observable.
  2. Consolidate Work Management Systems
  3. Theory of Constraints Methodology
  4. Collaboration hacks
  5. Toyota Kata (Coaching Kata)
  6. Market Oriented Organization
  7. Shift-left auditors




I start working with an organization very simply: I go to the company and talk with the employees. As you can see, no high technology. All you need is something to write on. I gather several teams in one room and analyze what they tell me in terms of my 7 archetypes. Then I hand them the marker and ask them to write on the board everything they have so far only said out loud. Usually at such meetings there is one person who takes notes, and at best he can capture 10% of the discussion. With my method, this figure can be raised to about 40%.




My approach is based on William Schneider's book The Reengineering Alternative. It rests on the idea that any organization can be described in terms of four quadrants. For me, this scheme is usually the outcome of working through the hundreds of other diagrams that come up when analyzing an organization. Suppose we have an organization with a high level of control but low competence. This is an extremely undesirable option: everyone toes the line, but no one knows what to do.

A slightly better option is a high level of both control and competence. If such a company is profitable, then perhaps it does not need DevOps. The most interesting companies to work with have a high level of control, low competence and collaboration, but at the same time a high level of cultivation. This means that the company has many people who like working there and that staff turnover is low.




It seems to me that methods with rigidly prescribed rules ultimately get in the way of finding the truth. Value stream mapping in particular has many rules about how to structure information. In the early stages of the work I am describing, nobody needs these rules. A person with a marker in hand describing the real state of the company on the board is the best way to understand what is going on. This kind of information normally never reaches the executives. At that moment it is foolish to interrupt the person and say that he drew an arrow wrong. At this stage it is better to use simple rules. For example, multiple levels of abstraction can be created simply by using markers of different colors.

I repeat, no high technology. The black marker depicts objective reality, how everything works. With the red marker people mark what exactly they do not like about the current state of things. It is important that they write it, not me. When I go to the CIO after the meeting, I do not present a list of 10 things that need to be fixed. I try to find a connection between what people in the company say and existing, proven patterns. Finally, the blue marker is for possible solutions to the problem.




An example of this approach is shown above. At the beginning of this year I worked with a bank. The people in its security department were convinced that they were not supposed to attend design and requirements reviews.




Then we talked to people from other departments, and it turned out that about 8 years earlier the software developers had pushed the security people out because they were slowing things down. Over time this turned into a ban that everyone took for granted, although in fact there was no ban at all.

The meeting itself went in an extremely confusing way: for about three hours, five different teams could not explain to me what happens between the code and the build. And this, you would think, is the simplest thing. Most DevOps consultants assume in advance that everyone already knows this.

Then the person in charge of IT governance, who had been silent for four hours, suddenly came to life when we got to his subject and kept us busy for a very long time. At the end I asked him what he thought of the meeting, and I will never forget his answer. He said: "I used to think that there were only two ways of delivering software in our bank. Now I know that there are five, and three of them I did not even suspect existed."




The last meeting at this bank was with the investment software team. It was with this team that we discovered that drawing diagrams with a marker on sheets of paper works better than writing on a whiteboard, and even better than using a smartboard.



The photographs you see show what the hotel conference room looked like on the fourth day of our meetings. We used these diagrams to search for patterns, that is, archetypes.

So, I ask employees questions, and they write down the answers with markers of three colors (black, red, and blue). I analyze their answers for archetypes. Now let's go through all the archetypes in order.

1. Make All Work Visible


Most companies I work with have a very high percentage of unknown work, for example when one employee walks up to another and simply asks for something to be done. In large organizations, unplanned work can reach 60%, and up to 40% of the work is not documented in any way. If Boeing worked like that, I would never board one of their planes again. If only half of the work is documented, there is no way to know whether that work is being done correctly. All other methods become useless: there is no point in trying to automate anything, because the known 50% may well be the most orderly and clear-cut part of the work, where automation will not yield much, while all the worst problems sit in the invisible half. Without documentation it is impossible to find the various hacks and hidden work, or to find the bottlenecks, the "Brents" of The Phoenix Project. There is a wonderful book by Dominica DeGrandis, "Making Work Visible". It identifies five different "thieves of time":

  • Too Much Work in Process (WIP)
  • Unknown Dependencies
  • Unplanned Work
  • Conflicting priorities
  • Neglected Work


This is a very valuable analysis, and the book is wonderful, but all of this advice is useless if only 50% of the data is visible. The methods Dominica proposes can be applied once the accuracy of the data is above 90%. I am talking about situations where a boss gives a subordinate a 15-minute task and it takes three days, and the boss has no idea that this subordinate depends on four or five other people.



The Phoenix Project is a great story about a project that is three years late. One of the characters is threatened with dismissal because of it, and he meets another character who is presented as a kind of Socrates and who helps him figure out what exactly went wrong. It turns out that the company has one system administrator, named Brent, and all the work somehow passes through him. At one of the meetings, someone asks: why does every half-hour task take a week? The answer is a very simplified account of queuing theory and Little's law, from which it follows that at 90% utilization each hour of work takes 9 hours. Each task has to pass through seven other people, so that hour turns into 63 hours, 7 times 9. My point is this: to use Little's law or any sophisticated queuing theory, you need to at least have data.
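
The arithmetic in this retelling can be checked with a short sketch. It assumes the simplified rule of thumb from the book, where the waiting multiplier is the busy fraction divided by the idle fraction. The function names are hypothetical, and real queues behave far less neatly.

```python
def wait_multiplier(utilization: float) -> float:
    """Hours of waiting per hour of work, using the simplified busy/idle ratio."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1 - utilization)


def total_wait_hours(task_hours: float, utilization: float, handoffs: int) -> float:
    """Waiting time when the task queues at each of `handoffs` equally busy people."""
    return task_hours * wait_multiplier(utilization) * handoffs


if __name__ == "__main__":
    for busy in (0.5, 0.8, 0.9, 0.95):
        print(f"{busy:.0%} busy -> {wait_multiplier(busy):.1f} h of waiting per hour of work")
    print(f"7 handoffs at 90%: {total_wait_hours(1, 0.9, 7):.0f} h")
```

The multiplier grows sharply as utilization approaches 100%, which is why a fully loaded Brent hurts so much. None of this can be estimated, though, without data on how busy people actually are and how many handoffs a task really has.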

So when I talk about visibility, I do not mean that everything has to be on a screen, but that you must at least have the data. Once you have it, it often turns out that there is a very large amount of unplanned work that for some reason gets sent to Brent, although it does not need to be. And Brent is a great guy who will never say no, but he does not tell anyone how he does his job.



When the work is visible, you can classify the data accurately (that is what Dominica is doing in the photo), apply the abstraction of the five thieves of time, and automate.

2. Consolidate Work Management Systems


The archetypes I am talking about form a kind of pyramid. If the first is done right, the second builds on top of it. Many of them do not apply to startups; they matter for large companies, such as those on the Fortune 5000 list. The last company I worked with had 10 ticketing systems. One team used Remedy, another had written a system of its own, a third used Jira, and some got by with email. The same problem arises if a company has 30 different pipelines, but I do not have time to discuss all such cases.

I discuss with people exactly how tickets are created, what happens to them next, and how they are bypassed. The most interesting thing is that people speak quite candidly at our meetings. I asked how many of them set "minor / no impact" on tickets that really should have been marked "major impact". It turned out that almost everyone does it. I do not point fingers, and I do my best not to identify people. When someone candidly admits something to me, I do not give that person away. But when almost everyone bypasses the system, it means that all the security is, in essence, window dressing. So no conclusions can be drawn from the data in such a system.

To solve the ticket problem, you need to choose one primary system. If you use Jira, let it be Jira only. If you use an alternative, let it be that alone. The bottom line is that tickets should be treated as one more step in the development process. Every action must have a ticket, and that ticket must go through the development workflow. Tickets are sent to the team, which puts them on its storyboard and is then responsible for them.

This applies to all departments, including infrastructure and operations. Only then can you form at least a somewhat plausible picture of the state of affairs. Once this process is in place, it suddenly turns out that you can easily establish who is responsible for each application, because now you see not 50% but 98% of new services. If this basic process works, accuracy improves throughout the system.

Pipeline Services


This again applies only to large corporations. If you are a new company in a new field, roll up your sleeves and work with your Travis CI or CircleCI. As for Fortune 5000 companies, a case from a bank I worked with is telling. People from Google came to visit, and they were shown diagrams of old IBM systems. The Google people asked, puzzled: where is the source code for this? But there is no source code, not even a GUI. This is the reality large organizations have to work with: 40-year-old banking records on an ancient mainframe. One of my clients uses containers on Kubernetes with Circuit Breaker patterns, plus Chaos Monkey, all for KeyBank. But those containers ultimately connect to a COBOL application.

The Google people were fully confident that they would solve all of my client's problems, and then they started asking questions: what is the IBM datapipe? The answer: it is a connector. What does it connect to? To the Sperry system. And what is that? And so on. At first glance you might think: what kind of DevOps is possible here? But in fact it is possible. There are delivery systems that let you hand the workflow over to delivery teams.

3. Theory of Constraints


Let's move on to the third archetype: institutional, or "tribal", knowledge. As a rule, every organization has a few people who know everything and run everything. They are the ones who have been in the organization the longest and who know all the workarounds.



When this shows up on a diagram, I deliberately circle such people with the marker: for example, it turns out that a certain Lou is present at every meeting. For me it is clear: this is the local Brent. When a CIO chooses between me in a T-shirt and sneakers and a guy from IBM in a suit, I get chosen because I can tell the CIO things that the other guy will not tell him and that he may not like to hear. I tell them that their company has a bottleneck, and that it is someone named Fred and someone named Lou. This bottleneck needs to be cleared, and their knowledge has to be extracted from them one way or another.
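
Once the workflows are written down, spotting a local Brent can be as simple as counting names. The sketch below assumes each wall diagram has been transcribed into a list of the people a piece of work passes through. The workflow names, the people, and the 75% threshold are all made up for illustration.

```python
from collections import Counter


def find_bottlenecks(workflows: dict[str, list[str]], threshold: float = 0.75) -> list[str]:
    """Return people who appear in at least `threshold` of all workflows."""
    if not workflows:
        return []
    # Count each person once per workflow, however many steps they own in it.
    appearances = Counter(
        person for steps in workflows.values() for person in set(steps)
    )
    cutoff = threshold * len(workflows)
    return sorted(p for p, n in appearances.items() if n >= cutoff)


# Hypothetical data transcribed from wall diagrams.
workflows = {
    "release": ["Ann", "Lou", "Fred"],
    "db-change": ["Lou", "Raj"],
    "firewall-rule": ["Fred", "Lou"],
    "new-server": ["Lou", "Susie"],
}

print(find_bottlenecks(workflows))
```

A count like this only confirms what the red circles on the paper already show; its value is that it gives the CIO a number to look at instead of an impression.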

To solve this kind of problem I might, for example, suggest using Slack. A smart executive asks: why? Typically DevOps consultants respond: because everyone does. If the executive is really smart, he will say: so what? And that is where the dialogue ends. My answer is different: because the company has four bottlenecks, Fred, Lou, Susie, and Jane. To institutionalize their knowledge, you first need to introduce Slack. All your wikis are complete nonsense because no one knows they exist. Suppose the engineering organization does both external and internal development, and everyone needs to know that they can turn to the external development team or the infrastructure team with questions. Then Lou or Fred will probably find the time to contribute to the wiki. Then someone on Slack might ask why, say, step 5 does not work, and Lou or Fred will correct the instructions on the wiki. If you establish this process, a lot will fall into place.

This is my main idea: before recommending any high technology, you must first put its foundation in order, and you can do that with the low-tech solutions just described. If you start with high technology and do not explain why it is needed, it usually does not end well. One of our customers uses Azure ML, a very cheap and simple solution. About 30% of the questions were answered by the self-learning machine itself. And this thing was written by operators who had never dealt with data science, statistics, or mathematics. That is telling. The cost of such a solution is minimal.

4. Collaboration hacks


The fourth archetype is the need to fight isolation. Most people already know this: isolation breeds hostility. If every department sits on its own floor and people never cross paths except in the elevator, hostility between them arises very easily. If, on the contrary, people are in the same room with each other, it goes away immediately. When someone makes a sweeping accusation, for example that such-and-such an interface never works, nothing is easier than taking that accusation apart. It is enough for the programmers who wrote the interface to start asking specific questions, and it soon becomes clear that, for example, the user was simply using the tool incorrectly.

There are many ways to overcome isolation. I was once asked to consult for a bank in Australia, and I declined because I have two children and a wife. All I could do to help was recommend graphical storytelling to them. This is something that demonstrably works. Another interesting technique is meetings in the lean coffee format. In a large organization, this is a great way to spread knowledge. In addition, you can hold internal DevOpsDays, hackathons, and so on.

5. Coaching Kata


As I warned at the very beginning, I will not talk about this today. If you are interested, you can look at some of my presentations.

There is also a good talk on this topic by Mike Rother:



6. Market Oriented Organization


There are various issues here. For example, "I"-shaped, "T"-shaped, and "E"-shaped people. "I" people do only one thing; they usually exist in organizations with isolated units. "T" is when a person knows one thing well but is also good at some other things. "E", or even a "comb", is when a person has many skills.

Conway's law applies here. In its most simplified form it can be stated as follows: if three teams work on a compiler, the result will be a three-pass compiler. So if an organization has a high level of isolation, then even Kubernetes, circuit breakers, API extensibility, and other fashionable things will end up structured the same way as the organization itself. Strictly according to Conway, and in spite of all you young geeks.

The solution to this problem has been described many times. There are, for example, the organizational archetypes described by Fernando Fernandez. The problematic, isolated architecture I just talked about is the function-oriented one. The second type, the worst, is the matrix architecture, which is a muddle of the other two. The third is what you see in most startups, and large companies are also trying to fit this type. It is the market-oriented organization, optimized to respond to customer requests as quickly as possible. It is sometimes called a flat organization.

Different people describe this structure in different ways. I like the term build/run teams; at Amazon they call them two-pizza teams. In this structure, all "I" people are grouped around a single service, they gradually move toward "T", and with the right management they can even become "E". The first counterargument is that such a structure contains redundancy. Why have a tester in every team if you can have a dedicated testing department? My answer is that the extra cost is the price of the whole organization eventually becoming "E"-shaped. In this structure, the tester gradually learns about networks, architecture, design, and so on. As a result, every member of the organization is fully aware of everything that happens in it. If you want to know how this scheme works in industry, read Mike Rother's Toyota Kata.

7. Shift-left auditors: auditing early in the cycle, security compliance


This is when your actions do not pass, so to speak, the smell test. The people who work for you are not stupid. If, as in the example above, they set minor / no impact everywhere, this went on for three years, and no one noticed anything, then everyone knows the system does not work. Another example is the change advisory board, to which every change to, say, an environment has to be reported. A group of people works there (not too well paid, by the way) who in theory are supposed to know how the system works as a whole. And over the past five years you have probably noticed that our systems are insanely complex. So five or six people must decide on a change that they did not make and know nothing about.

Of course, this approach does not work. Things like this have to go, because these people do not protect the system. The decision must be made by the team itself, because the team must be responsible for it. Otherwise you get the paradoxical situation where a manager who has never written code in his life tells a programmer how long it takes to write the code. In one company I worked with, there were 7 different boards that reviewed every change, including an architecture board, a product board, and so on. There was even a mandatory waiting period, although one employee told me that in ten years of work, none of his changes had ever been rejected during that period.

Auditors should be brought in, not gotten rid of. Tell them that you build immutable binary containers which, once all the tests pass, stay unchanged forever. Tell them you have pipeline as code and explain what that means. Show them the following picture: an immutable, read-only binary in a container that passes all vulnerability tests; and not only does nobody touch it afterwards, nobody even touches the system that creates the pipeline, because that too is created dynamically. I have clients, Capital One among them, who use Vault to create something like a blockchain. You do not need to show the auditor Chef "recipes"; just show the blockchain, which makes it clear what happened to a Jira ticket in production and who is responsible for it.
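
The idea of showing an auditor a tamper-evident history instead of Chef recipes can be illustrated with a hash chain. This is a minimal sketch of the general technique, not a description of how Capital One or Vault actually implement it. The record fields and ticket IDs are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(chain: list[dict], event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)})


def verify(chain: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical history of one ticket moving through a pipeline.
log: list[dict] = []
append_event(log, {"ticket": "OPS-101", "step": "build", "actor": "ci"})
append_event(log, {"ticket": "OPS-101", "step": "scan", "actor": "ci"})
append_event(log, {"ticket": "OPS-101", "step": "deploy", "actor": "team-payments"})
print(verify(log))
```

Because each entry's hash covers the one before it, changing any past record invalidates everything after it. That property is what lets an auditor trust the history without reading the tooling that produced it.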



According to a report published by Sonatype in 2018, there were 87 billion OSS download requests in 2017.



The costs incurred because of vulnerabilities are prohibitive. Moreover, the figures you see above do not include opportunity costs. Now, in a nutshell, what DevSecOps is. I want to say right away that I am not interested in debating how good the name is. The point is that since DevOps has been so successful, we should try to add security to the same pipeline.

An example of such a sequence:


This is not a recommendation of particular products, although I like all of them. I cite them as examples to show that DevOps, originally modeled on how industrial production is organized, lets you automate every stage of work on a product.



And there is no reason why we could not take the same approach to security.

Summary


In conclusion, here are some tips for DevSecOps. Include auditors in the process of building your systems and spend time educating them. You have to work together with auditors. Next, wage an absolutely ruthless fight against false positives. Even with the most expensive vulnerability scanning tool, you can end up instilling extremely bad habits in your developers if you do not understand the signal-to-noise ratio. Developers will be flooded with events and will simply delete them. If you have heard the Equifax story, that is what happened there: an alert of the highest severity was ignored. In addition, vulnerabilities need to be explained in a way that makes clear how they affect the business. For example, you can say that this is the same vulnerability as in the Equifax story. Security vulnerabilities should be treated the same way as any other software problem, that is, they should be part of the overall DevOps process. Work on them through Jira, Kanban, and so on. Developers should not think that someone else will take care of this; on the contrary, everyone should. Finally, invest effort in educating people.
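
One way to make the signal-to-noise point concrete is to track what share of a scanner's findings survive triage. The sketch assumes each finding has already been marked by a person as confirmed or as a false positive. The tool names and numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    tool: str
    severity: str
    confirmed: bool  # True if triage found a real vulnerability


def precision_by_tool(findings: list[Finding]) -> dict[str, float]:
    """Share of each tool's findings that turned out to be real."""
    totals: dict[str, int] = {}
    real: dict[str, int] = {}
    for f in findings:
        totals[f.tool] = totals.get(f.tool, 0) + 1
        real[f.tool] = real.get(f.tool, 0) + int(f.confirmed)
    return {tool: real[tool] / totals[tool] for tool in totals}


# Hypothetical triage results for two scanners.
findings = (
    [Finding("scanner-a", "high", True)] * 2
    + [Finding("scanner-a", "high", False)] * 18
    + [Finding("scanner-b", "high", True)] * 6
    + [Finding("scanner-b", "low", False)] * 2
)

for tool, share in precision_by_tool(findings).items():
    print(f"{tool}: {share:.0%} of alerts were real")
```

A tool whose alerts are mostly noise trains developers to dismiss everything it says, including the one alert that matters.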

