The pain and suffering of debugging microservices in web development

In IT you will rarely meet a person who has not heard of microservices. There are plenty of articles on the Internet and on specialized sites that explain reasonably well how microservices differ from a monolith. An inexperienced Java developer, having read a few articles of the "what microservices in web applications are and what they are eaten with" variety, is full of joy and confidence that now everything will be wonderful. After all, the main goal is to carve the monstrous monolith (the final artifact, as a rule a war/ear file that does a bit of everything) into a number of independently living services, each of which performs one strictly defined function and does it well. On top of that comes horizontal scalability: just scale the corresponding nodes and everything will be great. More users have arrived or more capacity is required? Just add 5-10 new service instances. Roughly speaking, this is how it works, but, as you know, the devil is in the details, and what initially seemed rather simple can, on closer examination, turn into problems that nobody took into account at the start.

In this post, colleagues from Rexoft's Java practice share their experience of debugging microservices for the web.



How to achieve transactional data integrity


When trying to move from a monolithic architecture to microservices, teams without prior experience often start by splitting services along the top-level objects of the domain model, for example User / Client / Employee, etc. Later, with a more detailed study, the understanding comes that it is more convenient to cut the system into larger blocks that aggregate several domain objects inside themselves. This helps avoid unnecessary calls to third-party services.

The second important point is support for transactional data integrity. In the monolith this problem is solved by the Application Server where the war/ear runs: the container, in effect, draws the boundaries of transactions. In the case of microservices the transaction boundaries are blurred, and besides writing the business logic code you have to manage data integrity and keep it consistent across different parts of the system. This is a fairly non-trivial task. Recommendations for solving this kind of architectural problem can be found on the Internet and in the relevant technical communities.

In this article we will try to describe the specific technical difficulties that arise when teams start working with microservices, and ways to solve them. I note right away that the proposed options are not the only true ones; perhaps there are more elegant solutions, but the recommendations given here have been tested in practice and do solve the difficulties described, and whether or not to use them is up to you.

The main catch with microservices is that they are extremely easy to run locally (for example, using spring.io and IntelliJ IDEA this can be done in about five minutes, or even less). However, when you try to do the same in a Kubernetes cluster (if you had little experience with it before), even a simple launch of a controller that prints "Hello World" when a specific endpoint is called can take half a day. With a monolith the situation is simpler: each developer has a local Application Server, and the deployment process is also quite simple - you copy the final war/ear artifact to the right place in the Application Server manually or with the IDE. This is usually not a problem.
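
For reference, a minimal sketch of such a "Hello World" controller: a plain Spring Boot application, the kind generated from spring.io; the class and endpoint names here are purely illustrative.

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Runs locally in minutes; getting the same thing into a Kubernetes cluster
// for the first time is where the half-day mentioned above tends to go.
@SpringBootApplication
@RestController
public class HelloApplication {

    public static void main(String[] args) {
        SpringApplication.run(HelloApplication.class, args);
    }

    @GetMapping("/hello")
    public String hello() {
        return "Hello World";
    }
}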

Debugging Nuances


The second important point is debugging. With a monolith it is assumed that the developer has an Application Server on his machine with the war/ear deployed on it. You can always debug, because everything you need is at hand. With microservices everything is a little more complicated: a service is usually a thing in itself. As a rule, it has its own database schema holding its own data, it performs the specific functions assigned to it, and all communication with other services is organized through synchronous HTTP calls (e.g. via RestTemplate or Feign) or asynchronous ones (e.g. Kafka or RabbitMQ). Therefore the essentially simple task of saving or validating some object, which used to be implemented in one place inside a single war/ear file, with the microservice approach generally turns into: go to one or N adjacent services, whether to fetch data (for example, some reference values) or to save adjacent entities whose data is needed to execute the business logic in our service. Writing business logic in this case becomes much more difficult.
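
As an illustration, here is a hedged sketch of what such a synchronous call to an adjacent service for reference values might look like with Spring Cloud OpenFeign (assuming @EnableFeignClients is set up); the service name dictionary-service, the endpoint and the DTO are made up for the example.

import java.util.List;
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

// Hypothetical client of an adjacent "dictionary-service"; the actual address
// is resolved through Service Discovery (Eureka/Consul) at runtime.
@FeignClient(name = "dictionary-service")
public interface DictionaryClient {

    @GetMapping("/api/dictionaries/{type}")
    List<DictionaryValue> findByType(@PathVariable("type") String type);
}

// Minimal DTO mirroring the contract we assume the adjacent service exposes.
class DictionaryValue {
    public Long id;
    public String code;
    public String name;
}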

Accordingly, the solution options are as follows:

  1. Write your business logic code with all external calls mocked: external contracts are emulated, tests are written under the assumption that the external contracts are exactly like that, and after that the service is deployed to the shared environment for verification (a sketch of such a contract mock follows this list). Sometimes you get lucky and the integration works right away; sometimes you don't, and the business logic code has to be redone an nth number of times, because while we were implementing the functionality the code in the adjacent service was updated, its API signatures changed, and part of the task has to be redone on our side.
  2. Write the business logic code, deploy the service to Kubernetes and check it directly there, i.e. whenever something goes wrong, attach to the pod via remote debug and look at what happens in runtime right in the cluster. Firstly, this is slow: every change means a rebuild, a redeploy and reconnecting the debugger, so one iteration easily takes 2-5 minutes, and during a long pause on a breakpoint Kubernetes may decide that the pod is stuck and restart it. Secondly, the debugger suspends execution per thread (Per thread), while the instance keeps receiving requests from the rest of the cluster, which makes the picture hard to reason about.
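
To make option 1 more concrete, here is a hedged sketch of a unit test where the contract of the adjacent service is emulated with a mock. It reuses the hypothetical DictionaryClient and DictionaryValue from the sketch above; the service under test is equally made up.

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.List;
import org.junit.jupiter.api.Test;

// Hypothetical service under test: it only needs the code of the first reference value.
class UserEnrichmentService {

    private final DictionaryClient dictionaryClient;

    UserEnrichmentService(DictionaryClient dictionaryClient) {
        this.dictionaryClient = dictionaryClient;
    }

    String resolveStatusCode(String type) {
        return dictionaryClient.findByType(type).get(0).code;
    }
}

class UserEnrichmentServiceTest {

    @Test
    void enrichesUserWithDictionaryValue() {
        // Our assumption about the neighbour's contract lives only here; if its
        // real API changes while we work, this test will keep passing anyway.
        DictionaryClient dictionaryClient = mock(DictionaryClient.class);
        DictionaryValue active = new DictionaryValue();
        active.code = "ACTIVE";
        when(dictionaryClient.findByType("user-status")).thenReturn(List.of(active));

        UserEnrichmentService service = new UserEnrichmentService(dictionaryClient);

        assertEquals("ACTIVE", service.resolveStatusCode("user-status"));
    }
}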

Debugging in Kubernetes


A solution to this problem is, in fact, telepresence. There are probably other tools of this kind, but personal experience has only been with this one, and it has proven itself well. In general, the principle of operation is as follows:

On the local machine the developer installs telepresence and configures kubectl to access the corresponding Kubernetes cluster (adds the cluster configuration to ~/.kube/config). After that telepresence is started and, in effect, acts as a proxy between the developer's local computer and Kubernetes. There are different launch options; the details are best looked up in the official guide, but in the most basic case it comes down to two steps:

  1. Run sudo telepresence (yes, on Linux machines it has to be run under sudo, since the utility needs root rights to work with the network). In Kubernetes this creates a deployment with a telepresence proxy image; when the session ends, this deployment is removed from Kubernetes.
  2. Start your service instance as usual on the developer's local computer. However, in this case it will have access to the entire infrastructure of the Kubernetes cluster, be it Service Discovery (Eureka, Consul), API Gateway (Zuul), Kafka and its queues, if any, and so on. That is, in fact, we get access to all the cluster environment we need, but locally. The bonus is the possibility of local debugging, yet in a cluster environment, and it is already much faster, because we are, in fact, inside Kubernetes (through the tunnel) rather than accessing it from the outside via a port for remote debug.

This solution has several disadvantages:

  1. Telepresence works fully only on Linux and Mac; on Windows there are problems with mounting the remote file system (VFS), and there is a corresponding issue on GitHub. So this option will not suit developers on Windows; if the team works on Linux/Mac, it is not a problem.
  2. If Service Discovery (Eureka, Consul) is used and requests are balanced between instances via Round Robin, a call to an endpoint may land either on the instance running in the cluster or on the local one, which leads to several problems at once:

  • a request from Kubernetes to the local machine: since telepresence swaps the deployment, the local instance is "registered" in Eureka not as ip-address:port/service-name but as dns-name:port/service-name, which other services inside Kubernetes cannot resolve, so calls to it end in a timeout;
  • if a deployment of the same service is still running in Kubernetes, requests are distributed between the cluster instance and the local one (Round Robin), and it is impossible to predict which of them will actually handle a particular call;
  • when calling an endpoint added in a feature branch you may get HTTP 404, because the Gateway looks the service up in Service Discovery and, due to Round Robin, may send the request to an instance that does not yet have this endpoint; if Service Discovery returns the instance without the endpoint, the result is HTTP 404.
  • all of this makes such a setup unpredictable; hence the need for so-called dynamic routing of requests, discussed below.


By dynamic routing of a request we mean the ability of the API Gateway (Zuul) to choose, among several instances of the same service, the one we actually need. In the general case this problem can be solved by adding a predicate that selects the desired instance from the pool of services registered under the same name at the request-processing stage. Naturally, every service that we want to be able to route to dynamically will have to carry some meta-information with data by which it can be decided whether this particular instance is needed or not. Spring Cloud (in the case of Eureka), for example, allows you to do this by specifying a special metadata block in application.yml:

eureka:
  instance:
    preferIpAddress: true
    metadata-map:
      service.label: develop

After such a service is registered in Service Discovery, its com.netflix.appinfo.InstanceInfo#getMetadata will contain a label with the key service.label and the value develop, which can be read at runtime. An important point at the service startup stage is to check whether an instance with the same meta-information is already present in Service Discovery, in order to avoid potential collisions.
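
A possible sketch of such a startup check, using Spring Cloud's generic DiscoveryClient; the class name and logging policy are assumptions, and excluding the service's own registration (e.g. by instance id) is omitted for brevity.

import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.client.discovery.DiscoveryClient;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class ServiceLabelCollisionCheck {

    private static final Logger log = LoggerFactory.getLogger(ServiceLabelCollisionCheck.class);

    private final DiscoveryClient discoveryClient;
    private final String serviceName;
    private final String serviceLabel;

    public ServiceLabelCollisionCheck(DiscoveryClient discoveryClient,
                                      @Value("${spring.application.name}") String serviceName,
                                      @Value("${eureka.instance.metadata-map.service.label}") String serviceLabel) {
        this.discoveryClient = discoveryClient;
        this.serviceName = serviceName;
        this.serviceLabel = serviceLabel;
    }

    @EventListener(ApplicationReadyEvent.class)
    public void checkForCollisions() {
        for (ServiceInstance instance : discoveryClient.getInstances(serviceName)) {
            Map<String, String> metadata = instance.getMetadata();
            if (serviceLabel.equals(metadata.get("service.label"))) {
                // Another instance already carries the same label: a potential
                // routing collision, so at the very least we complain loudly.
                log.warn("An instance of {} with service.label={} is already registered in Service Discovery",
                        serviceName, serviceLabel);
            }
        }
    }
}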

Routing Options


After that, solving the problem comes down to two options:

  1. Routing on the API Gateway side driven by a request header. The client (for example, the frontend) itself decides which service instance it needs and passes this in the headers, e.g. DestinationService: feature/PRJ-001; the API Gateway reads this header and routes the request to the instance with the corresponding label (a sketch of such a filter is given right after this list). The disadvantage is that the clients have to know about the routing internals, and part of the logic that logically belongs to the API Gateway leaks outside it.
  2. Several API Gateway instances, each with its own routing rules. For example, Zuul 1 routes requests to the /api/users/… endpoints to the user instance labeled feature/PRJ-001, while Zuul 2 routes the same /api/users/… endpoints to the user instance labeled feature/PRJ-002. The disadvantage is that for N features in development you need N API Gateway instances, which have to be created and removed as the features appear and get merged, since a feature is by definition a temporary thing. The advantage is that all routing logic stays inside the API Gateway itself, and the clients do not need to know anything about it.
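
A hedged sketch of option 1: a Zuul pre-filter that reads the hypothetical DestinationService header and stores the requested label in the request context; the actual instance selection against that label would live in a custom load-balancing rule, which is not shown here.

import com.netflix.zuul.ZuulFilter;
import com.netflix.zuul.context.RequestContext;
import javax.servlet.http.HttpServletRequest;
import org.springframework.stereotype.Component;

@Component
public class DestinationServiceFilter extends ZuulFilter {

    public static final String LABEL_KEY = "destinationServiceLabel";

    @Override
    public String filterType() {
        return "pre";
    }

    @Override
    public int filterOrder() {
        return 5;
    }

    @Override
    public boolean shouldFilter() {
        HttpServletRequest request = RequestContext.getCurrentContext().getRequest();
        return request.getHeader("DestinationService") != null;
    }

    @Override
    public Object run() {
        RequestContext context = RequestContext.getCurrentContext();
        String label = context.getRequest().getHeader("DestinationService");
        // The label (e.g. feature/PRJ-001) is kept in the request context;
        // choosing the instance with that label happens in the load balancer rule.
        context.set(LABEL_KEY, label);
        return null;
    }
}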






As part of the API Gateway it is also worth providing a mechanism for changing the routing rules at runtime. It is best to keep these settings in a config-map. In that case it is enough to write down the new routes and then either restart the API Gateway in Kubernetes so that it picks up the routing, or, provided the corresponding Spring Boot Actuator dependency is present in the API Gateway, call the /refresh endpoint, which essentially re-reads the data from the config-map and updates the routes.
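
A sketch of how these rules might be bound on the API Gateway side; the class name and prefix are assumptions. @ConfigurationProperties beans are re-bound by Spring Cloud when /refresh is called, so the map below picks up new routes without a restart.

import java.util.HashMap;
import java.util.Map;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

// Bound from properties such as rules.meta.user=develop or
// rules.meta.notification=feature/PRJ-010 coming from the config-map.
@Component
@ConfigurationProperties(prefix = "rules")
public class RoutingRules {

    private Map<String, String> meta = new HashMap<>();

    public Map<String, String> getMeta() {
        return meta;
    }

    public void setMeta(Map<String, String> meta) {
        this.meta = meta;
    }

    // The label of the instance that requests to the given service should go to;
    // "develop" is assumed here as the default reference label.
    public String labelFor(String serviceName) {
        return meta.getOrDefault(serviceName, "develop");
    }
}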

An important point: there should also be, relatively speaking, a reference instance of each service (for example, labeled develop and built from the main development branch) and a separate main API Gateway whose settings always point to these reference instances. In essence, this gives us an independent staging environment that remains operational regardless of dynamic routing.

An example of a config-map block for the API Gateway that contains routing settings (this is just an example of how it might look; for proper operation it requires corresponding support in the code on the API Gateway side):

{
  "kind": "ConfigMap",
  "apiVersion": "v1",
  "metadata": {
    ...
  },
  "data": {
    ...
    "rules.meta.user": "develop",
    "rules.meta.client": "develop",
    "rules.meta.notification": "feature/PRJ-010",
    ...
  }
}

rules.meta is a map containing routing rules for services.
user / client / notification - the names of the services under which they are registered in Eureka.

develop / feature/PRJ-010 - the service label from the application.yml of the corresponding service, by which the desired instance will be selected among all instances registered in Service Discovery under the same name, if there is more than one.
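
One possible sketch of the "support in the code" mentioned above: picking, among all instances of a service registered in Eureka, the one whose service.label matches the rule from rules.meta. It reuses the hypothetical RoutingRules class from the previous sketch; wiring the result into Zuul's load balancer (e.g. via a custom Ribbon rule) is left out.

import java.util.List;
import java.util.Optional;
import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.client.discovery.DiscoveryClient;
import org.springframework.stereotype.Component;

@Component
public class LabelAwareInstanceSelector {

    private final DiscoveryClient discoveryClient;
    private final RoutingRules routingRules;

    public LabelAwareInstanceSelector(DiscoveryClient discoveryClient, RoutingRules routingRules) {
        this.discoveryClient = discoveryClient;
        this.routingRules = routingRules;
    }

    // Finds an instance of the given service whose service.label matches the
    // configured rule, falling back to any available instance if none matches.
    public Optional<ServiceInstance> select(String serviceName) {
        String wantedLabel = routingRules.labelFor(serviceName);
        List<ServiceInstance> instances = discoveryClient.getInstances(serviceName);
        return instances.stream()
                .filter(instance -> wantedLabel.equals(instance.getMetadata().get("service.label")))
                .findFirst()
                .or(() -> instances.stream().findFirst());
    }
}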

Conclusion


Like everything in this world, the tools and solutions in IT are not perfect. Do not think that if you change the architecture, all problems will end at once. Only a detailed immersion in the technologies used and your own experience will give you a real picture of what is happening.

I hope this material helps you solve your problems. Interesting tasks to you, and a production without bugs!
