The book "Java Concurrency in Practice"

Hello, Habr readers! Threads are a fundamental part of the Java platform. Multi-core processors are commonplace, and effective use of concurrency has become necessary for building any high-performance application. An improved Java virtual machine, support for high-performance classes, and a rich set of building blocks for parallelizing tasks were, in their day, a breakthrough in concurrent application development. In “Java Concurrency in Practice”, the creators of that technology explain not only how these mechanisms work but also the design patterns for using them. It is easy to write a concurrent program that appears to work. Developing, testing, and debugging multithreaded programs, however, poses many problems: the code tends to fail just when it matters most, under heavy load. In this book you will find both the theory and concrete techniques for building reliable, scalable, and maintainable concurrent applications. The authors do not offer a catalog of APIs and concurrency mechanisms; they present design rules, patterns, and mental models that do not depend on the Java version and stay relevant for many years.

Excerpt. Thread safety


You may be surprised to learn that concurrent programming is no more about threads or locks than civil engineering is about rivets and I-beams. Of course, building bridges requires the correct use of a great many rivets and I-beams, and building concurrent programs likewise requires the correct use of threads and locks. But these are just mechanisms, means to an end. Writing thread-safe code is, at its core, about managing access to state, and in particular to shared, mutable state.

Informally, an object's state is its data, stored in state variables such as instance or static fields, possibly including fields of other, dependent objects. A HashMap's state is stored partly in the HashMap object itself, but also in many Map.Entry objects. An object's state includes any data that can affect its externally visible behavior.

Shared means that several threads can access the variable; mutable means that its value can change during its lifetime. What we are really trying to protect from uncontrolled concurrent access is data, not code.

Making an object thread-safe requires synchronization to coordinate access to its mutable state; failing to do so can lead to data corruption and other undesirable consequences.

Whenever more than one thread accesses a given state variable and at least one of them might write to it, all of them must coordinate their access to it using synchronization. The primary synchronization mechanism in Java is the synchronized keyword, which provides exclusive locking, but the term also covers volatile variables, explicit locks, and atomic variables.
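As a rough illustration of the mechanisms just named, the sketch below protects the same kind of state in three ways (a synchronized method, an atomic variable, and an explicit lock) and uses volatile only for a flag, since volatile gives visibility but does not make an increment atomic. All class names here are hypothetical and chosen for the example.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example classes; each one guards its state with a different mechanism.
class SynchronizedCounter {
    private long value;                          // guarded by the intrinsic lock on "this"
    synchronized void increment() { value++; }
    synchronized long get() { return value; }
}

class AtomicCounter {
    private final AtomicLong value = new AtomicLong();
    void increment() { value.incrementAndGet(); } // atomic read-modify-write
    long get() { return value.get(); }
}

class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private long value;                          // guarded by "lock"
    void increment() {
        lock.lock();
        try { value++; } finally { lock.unlock(); }
    }
    long get() {
        lock.lock();
        try { return value; } finally { lock.unlock(); }
    }
}

class StopFlag {
    // volatile makes the latest write visible to other threads;
    // it would NOT make an operation such as ++ atomic.
    private volatile boolean stopped;
    void stop() { stopped = true; }
    boolean isStopped() { return stopped; }
}
```

Which mechanism fits depends on the state being protected: a single independent counter is a natural match for an atomic variable, while state made of several related fields usually needs a lock.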

Resist the temptation to think that there are special situations in which synchronization is unnecessary. A program may appear to work and pass its tests, yet still be broken and fail at any moment.

If multiple threads access the same mutable state variable without appropriate synchronization, your program is broken. There are three ways to fix it:

  • do not share the state variable across threads;
  • make the state variable immutable; or
  • use synchronization whenever you access the state variable.
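The three options in the list above can be sketched roughly as follows. This is only an illustration: PerThreadBuffer, ImmutablePoint, and SynchronizedRange are hypothetical names, and ThreadLocal is just one of several ways to avoid sharing.

```java
// Option 1: do not share. Each thread gets its own StringBuilder.
class PerThreadBuffer {
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(StringBuilder::new);
    static StringBuilder get() { return BUFFER.get(); }
}

// Option 2: make the state immutable. "Changing" a point creates a new object.
final class ImmutablePoint {
    private final int x;
    private final int y;
    ImmutablePoint(int x, int y) { this.x = x; this.y = y; }
    int x() { return x; }
    int y() { return y; }
    ImmutablePoint withX(int newX) { return new ImmutablePoint(newX, y); }
}

// Option 3: synchronize every access, reads as well as writes.
class SynchronizedRange {
    private int lower;   // guarded by "this"
    private int upper;   // guarded by "this"
    synchronized void set(int newLower, int newUpper) {
        if (newLower > newUpper) {
            throw new IllegalArgumentException("lower must not exceed upper");
        }
        lower = newLower;
        upper = newUpper;
    }
    synchronized int width() { return upper - lower; }
}
```

Note that in the third option the getter is synchronized too; guarding only the writer would still leave readers free to observe a half-updated range.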

Such fixes may require significant design changes, so it is far easier to design a class to be thread-safe from the start than to retrofit it later.

It is hard to know in advance whether multiple threads will access a given variable. Fortunately, the same object-oriented techniques that help you write well-organized, maintainable classes, such as encapsulation and data hiding, also help you build thread-safe classes. The less code that has access to a particular variable, the easier it is to ensure that all of it uses proper synchronization, and the easier it is to reason about the conditions under which the variable may be accessed. The Java language does not force you to encapsulate state: it is perfectly permissible to store state in public fields (even public static fields) or to publish a reference to an otherwise internal object. But the better encapsulated your program's state is, the easier it is to make the program thread-safe and to help maintainers keep it that way.

When designing thread-safe classes, good object-oriented techniques (encapsulation, immutability, and a clear specification of invariants) are your best allies.

There will be times when good object-oriented design techniques are at odds with real-world requirements; in such cases you may have to compromise the rules of good design for the sake of performance or backward compatibility with legacy code. Sometimes abstraction and encapsulation conflict with performance (although not nearly as often as many developers believe), but it is always good practice to make the code right first and then make it fast. Even then, optimize only if your performance measurements and requirements tell you that you must (2).

(2) In concurrent code, you should follow this practice even more strictly than usual. Because concurrency bugs are so hard to reproduce and debug, a small performance gain on some rarely used code path can be negligible compared with the risk that the program will fail in production.

If you decide that you must break encapsulation, all is not lost. Your program can still be made thread-safe, but the process will be harder and more expensive, and the resulting thread safety will be more fragile. Chapter 4 describes the conditions under which encapsulation of state variables can safely be relaxed.

So far, we have used the terms “thread-safe class” and “thread-safe program” almost interchangeably. Is a thread-safe program one that is built entirely from thread-safe classes? Not necessarily: a program that consists entirely of thread-safe classes may not be thread-safe, and a thread-safe program may contain classes that are not thread-safe. Issues surrounding the composition of thread-safe classes are also taken up in Chapter 4. In any case, the concept of a thread-safe class makes sense only if the class encapsulates its own state. Thread safety may be a term applied to code, but it is about state, and it can be applied only to the entire body of code that encapsulates that state, which may be an object or a whole program.

2.1. What is thread safety?


Defining thread safety is not easy. A quick Google search gives you numerous options like these:

... can be called from multiple program threads without unwanted interactions between threads.

... can be called by two or more threads at the same time, without requiring any other action from the caller.

Given definitions like these, it is no wonder we find thread safety confusing! How do we tell a thread-safe class from an unsafe one? What do we even mean by “safe”?

At the heart of any reasonable definition of thread safety is the notion of correctness.

Correctness means that a class conforms to its specification. A good specification defines invariants that constrain an object's state and postconditions that describe the effects of its operations. Since we often do not write adequate specifications for our classes, how can we know they are correct? We cannot, but that does not stop us from using them once we have convinced ourselves that the code works. So let us assume that single-threaded correctness is something we recognize when we see it. We can then define a thread-safe class as one that continues to behave correctly when accessed from multiple threads.

A class is thread-safe if it behaves correctly when accessed from multiple threads, regardless of how the runtime environment schedules or interleaves the execution of those threads, and with no additional synchronization or other coordination on the part of the calling code.

Since any single-threaded program is also a valid multithreaded program, a class cannot be thread-safe if it is not even correct in a single-threaded environment (3). If an object is correctly implemented, no sequence of operations (calls to public methods and reads or writes of public fields) should be able to violate any of its invariants or postconditions. No set of operations performed sequentially or concurrently on instances of a thread-safe class can cause an instance to be in an invalid state.

(3) If the loose use of the term “correctness” bothers you here, you can think of a thread-safe class as one that is no more broken in a concurrent environment than in a single-threaded one.

Thread-safe classes encapsulate any needed synchronization so that clients do not have to provide their own.

2.1.1. Example: a stateless servlet


In Chapter 1, we listed frameworks that create threads and call your components from those threads, leaving you responsible for making the components thread-safe. We will now develop a servlet-based factorization service and gradually extend its functionality while preserving thread safety.

Listing 2.1 shows a simple servlet that unpacks the number to be factored from the request, factors it, and packages the result into the response.

Listing 2.1. A stateless servlet

@ThreadSafe
public class StatelessFactorizer implements Servlet {
      // No fields: everything a request needs lives in local variables.
      public void service(ServletRequest req, ServletResponse resp) {
            BigInteger i = extractFromRequest(req);   // number to factor
            BigInteger[] factors = factor(i);
            encodeIntoResponse(resp, factors);
      }
}

StatelessFactorizer, like most servlets, is stateless: it has no fields and references no fields from other classes. The transient state for a particular computation exists only in local variables, which are stored on the thread's stack and are accessible only to the executing thread. One thread accessing StatelessFactorizer cannot influence the result of another thread doing the same, because the two threads do not share state.

Stateless objects are always thread-safe.

The fact that most servlets can be implemented with no state greatly reduces the burden of making servlets thread-safe. Only when servlets need to remember something from one request to the next does thread safety become an issue.
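To make the point concrete outside a servlet container, here is a minimal plain-Java sketch of a stateless factorizer. StatelessFactors is a hypothetical class, trial division is used only because it is short, and the input is assumed to be at least 1. The class has no fields, so any number of threads can share one instance.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stateless helper: no fields, so there is nothing to share between threads.
class StatelessFactors {
    // Returns the prime factors of n in ascending order (assumes n >= 1).
    List<BigInteger> factor(BigInteger n) {
        List<BigInteger> factors = new ArrayList<>();  // local: confined to the calling thread
        BigInteger divisor = BigInteger.valueOf(2);
        while (divisor.multiply(divisor).compareTo(n) <= 0) {
            if (n.mod(divisor).signum() == 0) {
                factors.add(divisor);
                n = n.divide(divisor);
            } else {
                divisor = divisor.add(BigInteger.ONE);
            }
        }
        if (n.compareTo(BigInteger.ONE) > 0) {
            factors.add(n);                            // whatever remains is prime
        }
        return factors;
    }
}
```

Each call builds its result in a fresh local list, so two threads calling factor on the same StatelessFactors instance never see each other's intermediate data.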

2.2. Atomicity


What happens when we add one element of state to a stateless object? Suppose we want to add a hit counter that measures the number of requests processed. The obvious approach is to add a field of type long to the servlet and increment it on each request, as shown in UnsafeCountingFactorizer in Listing 2.2.

Listing 2.2. A servlet that counts requests without the necessary synchronization. Don't do this.

@NotThreadSafe
public class UnsafeCountingFactorizer implements Servlet {
      private long count = 0;   // shared mutable state, accessed without synchronization

      public long getCount() { return count; }

      public void service(ServletRequest req, ServletResponse resp) {
            BigInteger i = extractFromRequest(req);
            BigInteger[] factors = factor(i);
            ++count;   // not atomic: read, add one, write back
            encodeIntoResponse(resp, factors);
      }
}

Unfortunately, UnsafeCountingFactorizer is not thread-safe, even though it works fine in a single-threaded environment. Like UnsafeSequence, it is susceptible to lost updates. Although the increment operation ++count has compact syntax, it is not atomic, that is, it does not execute as a single indivisible operation. It is shorthand for a sequence of three discrete operations: fetch the current value, add one to it, and write the new value back. This is an example of a read-modify-write operation, in which the resulting state is derived from the previous state.

Figure 1.1 shows what can happen if two threads try to increment the counter at the same time without synchronization. If the counter is initially 9, then with some unlucky timing both threads can see the value 9, add one to it, and each set the counter to 10. An increment has been lost, and the hit counter is now permanently off by one.
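The unlucky interleaving can be replayed deterministically in a single thread by writing out the three steps of each increment by hand. This is only a simulation of the schedule described above (LostUpdateReplay and its variable names are hypothetical), not a real concurrent run.

```java
// Hypothetical single-threaded replay of the interleaving that loses an update.
class LostUpdateReplay {
    static long replay() {
        long count = 9;              // shared counter before the two increments

        long readByA = count;        // thread A fetches 9
        long readByB = count;        // thread B fetches 9 before A has written
        count = readByA + 1;         // A writes 10
        count = readByB + 1;         // B also writes 10, overwriting A's update

        return count;                // 10, although two increments were attempted
    }
}
```

Two increments started from 9, yet the result is 10 rather than 11, which is exactly the lost update.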

You might think that a slightly inaccurate hit counter in a web service is an acceptable loss, and sometimes it is. But if the counter is used to generate sequences or unique object identifiers, returning the same value from multiple invocations can cause serious data integrity problems. The possibility of incorrect results caused by unlucky timing has a name: a race condition.

2.2.1. Race conditions


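UnsafeCountingFactorizer has several race conditions (4). The most common type of race condition is check-then-act, where a potentially stale observation is used to decide what to do next.

(4) The term race condition is often confused with the related term data race, which arises when synchronization is not used to coordinate all access to a shared non-final field. Whenever one thread writes a variable that another thread may read next, or reads a variable that another thread may have written last, and the two threads do not synchronize, there is a risk of a data race; code with data races has no useful defined semantics under the Java Memory Model. Not all race conditions are data races, and not all data races are race conditions, but both can make concurrent programs fail in unpredictable ways. UnsafeCountingFactorizer has both. Data races are discussed further in Chapter 16.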

We often run into race conditions in real life. Suppose you plan to meet a friend at noon at the Starbucks on University Avenue. When you get there, you discover that there are two Starbucks on University Avenue. At 12:10 you do not see your friend at cafe A, so you walk to cafe B, but he is not there either. Either your friend is late, or he arrived at cafe A right after you left, or he was at cafe B, went looking for you, and is now on his way to cafe A. Let us assume the last, worst-case scenario. It is now 12:15, and each of you is wondering whether you have been stood up. Do you go back to the other cafe? How many times will you go back and forth? Unless you have agreed on a protocol, you could both spend the whole day walking up and down University Avenue in a caffeinated haze.

The problem with the “I'll just walk over and see if he is there” approach is that the walk between the two cafes takes a few minutes, and during that time the state of the system may change.

The Starbucks example illustrates a result that depends on the relative timing of events (how long each of you waits before leaving a cafe, and so on). The observation that he is not at cafe A becomes potentially invalid the moment you walk out the front door: he could have come in through the back door. This invalidation of observations is what characterizes most race conditions: a decision is based on a potentially stale observation, which can cause problems such as an unexpected exception, overwritten data, or file corruption.
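In code, check-then-act often looks like “if the key is absent, insert it”. The sketch below contrasts a racy version with one that delegates the whole check-and-insert step to ConcurrentHashMap.putIfAbsent, which performs it atomically. NameRegistry and its methods are hypothetical names used only for this illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry in which each name may be claimed by only one owner.
class NameRegistry {
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    // Racy check-then-act: another thread may claim the name between
    // containsKey (the check) and put (the act), and one claim is overwritten.
    boolean claimRacy(String name, String owner) {
        if (!owners.containsKey(name)) {
            owners.put(name, owner);
            return true;
        }
        return false;
    }

    // The check and the act happen as one atomic operation inside the map.
    boolean claim(String name, String owner) {
        return owners.putIfAbsent(name, owner) == null;
    }

    String ownerOf(String name) {
        return owners.get(name);
    }
}
```

Even though every individual map operation in claimRacy is thread-safe, the method as a whole is not, because the observation made by containsKey can be stale by the time put runs.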

2.2.2. Example: race conditions in lazy initialization


A common idiom that uses check-then-act is lazy initialization. Its goal is to defer initializing an object until it is actually needed while ensuring that it is initialized only once. In Listing 2.3, the getInstance method first checks whether the ExpensiveObject has already been initialized, in which case it returns the existing instance; otherwise it creates a new instance and returns it after storing a reference to it.

Listing 2.3. Race condition in lazy initialization. Don't do this.

@NotThreadSafe
public class LazyInitRace {
      private ExpensiveObject instance = null;

      public ExpensiveObject getInstance() {
            if (instance == null)                  // check
                instance = new ExpensiveObject();  // act: two threads may both get here
            return instance;
      }
}

LazyInitRace has race conditions. Suppose that threads A and B execute getInstance at the same time. A sees that the instance field is null and creates a new ExpensiveObject. B also checks whether the instance field is null. Whether the field is null at that moment depends unpredictably on timing, including the vagaries of scheduling and how long A takes to create the ExpensiveObject and set the instance field. If the field is null when B checks it, the two callers of getInstance may receive two different results, even though getInstance is supposed to always return the same instance.
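One straightforward way to close this race, sketched here under the assumption that contention on getInstance is acceptable, is to make the whole check-then-act sequence synchronized. This is an illustration rather than the fix the excerpt itself goes on to develop; ExpensiveResource is a hypothetical stand-in for ExpensiveObject, and LazyInitCheck is a small helper that calls getInstance from several threads and counts the distinct instances they observe.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for ExpensiveObject.
class ExpensiveResource { }

class SafeLazyInit {
    private ExpensiveResource instance;   // guarded by "this"

    // The check and the assignment now execute as one atomic unit.
    synchronized ExpensiveResource getInstance() {
        if (instance == null) {
            instance = new ExpensiveResource();
        }
        return instance;
    }
}

class LazyInitCheck {
    // Calls getInstance from several threads and returns how many distinct objects were seen.
    static int distinctInstances(int threadCount) {
        SafeLazyInit holder = new SafeLazyInit();
        Set<ExpensiveResource> seen = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> seen.add(holder.getInstance()));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }
}
```

Because every caller must hold the same lock to run getInstance, no thread can observe a null field while another thread is between the check and the assignment, so all callers receive the same instance.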

The hit counter in UnsafeCountingFactorizer has another kind of race condition. A read-modify-write operation, such as incrementing a counter, defines the new state in terms of the previous state: to increment the counter, a thread must know its previous value and be sure that no one else changes or uses that value while the update is in progress.

Like most concurrency errors, race conditions do not always result in failure: some unlucky timing is also required. But if LazyInitRace is used to instantiate an application-wide registry, then having it return different instances from multiple invocations could cause registrations to be lost, or different activities to end up with inconsistent views of the set of registered objects. And if UnsafeSequence is used to generate entity identifiers in a persistence framework, two distinct objects could end up with the same identifier, violating identity constraints.

2.2.3. Compound actions


Both LazyInitRace and UnsafeCountingFactorizer contain a sequence of operations that needs to be atomic, or indivisible, relative to other operations on the same state. To avoid race conditions, there must be a way to prevent other threads from using a variable while one thread is in the middle of modifying it.

Operations A and B are atomic with respect to each other if, from the perspective of a thread executing A, when another thread executes B, either all of B has executed or none of it has. An atomic operation is one that is atomic with respect to all operations, including itself, that operate on the same state.

If the increment operation in UnsafeSequence were atomic, the race condition shown in Figure 1.1 could not occur. Check-then-act and read-modify-write sequences should always be atomic. We refer to them collectively as compound actions: sequences of operations that must be executed atomically in order to remain thread-safe. In the next section, we will consider locking, Java's built-in mechanism for ensuring atomicity. For now, we will fix the problem another way, by using an existing thread-safe class, as shown in CountingFactorizer in Listing 2.4.

Listing 2.4. Servlet counting requests using AtomicLong

@ThreadSafe
public class CountingFactorizer implements Servlet {
      private final AtomicLong count = new AtomicLong(0);

      public long getCount() { return count.get(); }

      public void service(ServletRequest req, ServletResponse resp) {
            BigInteger i = extractFromRequest(req);
            BigInteger[] factors = factor(i);
            count.incrementAndGet();   // atomic read-modify-write; return value ignored
            encodeIntoResponse(resp, factors);
      }
}

The java.util.concurrent.atomic package contains atomic variable classes for performing atomic state transitions on numbers and object references. By replacing the long counter with an AtomicLong, we ensure that all actions that access the counter state are atomic. Because the state of the servlet is the state of the counter, and the counter is thread-safe, our servlet is once again thread-safe.
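The effect can be checked outside a servlet container with a small sketch in which several threads hammer one AtomicLong. AtomicCountDemo and its parameters are hypothetical names for this illustration; because every incrementAndGet is atomic, the final value equals the total number of increments regardless of how the threads interleave.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo: several threads increment one shared AtomicLong.
class AtomicCountDemo {
    static long countWithThreads(int threadCount, int incrementsPerThread) {
        AtomicLong count = new AtomicLong(0);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    count.incrementAndGet();   // atomic read-modify-write
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();                      // wait so that all increments are counted
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return count.get();
    }
}
```

With a plain long field and ++ in place of the AtomicLong, the same harness could return less than the expected total on some runs, because concurrent increments may overwrite one another.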

When a single element of state is added to a stateless class, the resulting class is thread-safe if that state is entirely managed by a thread-safe object. But, as we will see in the next section, going from one state variable to more than one is not necessarily as simple as going from zero to one.

Where practical, use existing thread-safe objects, such as AtomicLong, to manage your class's state. It is simpler to reason about the possible states and state transitions of existing thread-safe objects than of arbitrary state variables, and this makes thread safety easier to maintain and verify.

More information about the book is available on the publisher's website.

Habr readers get a 25% discount with the coupon code Java.

After you pay for the paper edition, the e-book is sent to you by e-mail.