⭐️ 💇🏾 🐝 Investigation: what is higher than thread priorities in Windows? 🥃 👩🏼‍🤝‍👨🏾 🐥

This investigation, like many others, began with the fact that I was doing my own business, not trying to look for problems for myself. This time, all I did was open the lid of the laptop and tried to log into the system.

For the first few times, when this resulted in a delay of twenty seconds, I ignored the problem, hoping that it would resolve itself. The next few times I thought about the investigation, but performance problems that arise even before you logged in are more difficult to solve, and I was lazy.

When I noticed that I was avoiding closing the laptop because I was afraid of these too frequent delays, I realized that it was time to do this seriously.

Fortunately, I recently fixed the UIforETW ring buffer tracemaking it reliable, so I started it and started to wait for the next delay event. I did not have to wait long.

It took me several times to get the ETW trace completely fine with me . And since this territory was unfamiliar to me, it took some time to figure out what was happening. I still did not fully understand the problem, but 90% understood the reasons for its occurrence. I managed to learn a lot, including some new details about the Windows scheduler, and I also found an absolutely effective solution.

The ideal trace that I eventually recorded when loading into Microsoft Windows Performance Analyzer (WPA) looks like this:

Standard events, windows in focus and CPU usage.

This table and two graphs contain a ton of information. The top table ( Generic Events ) shows the recorded keystrokes for UIforETW. I tried to press a key (virtual key code 162) once per second until a password input field appears. Since these 17 keystrokes are selected, in the graph below they are shown with vertical blue lines for simplified visualization of the execution time of critical events. The x-axis represents the time in seconds.

The horizontal bars in the upper graph ( Window in Focus ) show which process has focus during this time. There are six different processes in total. The Tracing off period is the short time during which the laptop was closed.

The bottom graph shows CPU usage . Information is obtained from context switching data, therefore it must be completely accurate and complete. In this trace, a value of 100% indicates the moment when all eight logical processors of my four-core eight-thread notebook were used.

Having received the trace data, I had to figure out what my laptop secretly does when the cover is closed and until the moment I return to the system.

Storm before the lull

As we can see, the laptop at the beginning of the trace of the laptop is relatively simple, as it should be. Then I closed its lid. This seems to have caused a spike in CPU activity and a change in the focus of windows. Window in Focus changed from UIforETW to Idle, then to csrss, back to Idle, to LogonUI, and then back to Idle. Who would have thought?

During this interval, the laptop performed approximately 17 seconds of CPU processing of various types. Part of it is the work needed to shut off. Part - these are programs (including internal Google tools) that are registered in the Task Scheduler for the execution of “When a user locks a workstation” - it makes sense. I even noticed that work is being done to create UI elements for logging in when the user continues to work - you need to be prepared in advance, right?

17 seconds of CPU - quite a long time for the laptop to go to sleep. Even on my laptop with four cores and eight threads, the process takes more than four seconds. On my home laptop, it takes more than 13 seconds of CPU time to fall asleep, and almost all of them go to Windows code. Does the diagnostic policy service really need to run a couple of SruDbTableSearches before the laptop can rest?

I think this excessive work when going to sleep is also a problem, but this is not the very problem I am looking for. So I just decided to turn my back on her.

And only much later I realized that it was during this time that the grains of destruction of my bug were thrown ...

Sleep

After blocking the laptop, there is no CPU activity. In this particular test, the laptop was locked for about 16 seconds.

Convulsive awakening

The activity of the CPU upon transition to sleep is incomparable with when it began to awaken. During this time, my overloaded laptop took about 172 seconds of CPU time (!!!) for 22.6 seconds. This is a lot of work.

One of the mysteries of this process is the drop in CPU usage to almost zero about a second after the initial outburst of activity. This short period of downtime seems rather abnormal, given the chaos surrounding it. But I think that this feature is not related to the problem, so I did not pay attention to it.

Another mystery is why so manyprograms come to life after this brief pause. It's funny that the most serious intruder responsible for 31.6 out of 172 seconds of the CPU was Windows Performance Analyzer (WPA) - the very program that I use to analyze traces. The three copies I left running are working hard on rendering my UI, even though it is not yet visible.

In addition, dark patterns occur when trying to initialize laptop devices. KeStallExecutionProcessor is a waiting loop, and it was strange to see that this is the most executable function of the entire system. Is a second-odd wait cycle the only way to start equipment? Is it really necessary to spend 700 ms CPU time initializing the mouse and keyboard ? Should Microsoft and Intel ignore Microsoft's recommendation ona maximum of 50 microseconds ?

Drivers of a waiting cycle. i8042prt.sys is written by Microsoft. The following two are created by Intel.

Ultimately, many programs are actively running during this time . Most of them seem to be facing the same problem as WPA - they are desperate to draw pixels on a hidden screen, and this alludes to a Windows bug. But even without this bug explorer.exe and other programs actively seek something to do. But in the end, although this excessive CPU usage is a necessary part of the problem, it is not the problem itself . So again I stopped paying attention to her.

Focus

When analyzing traces, it is important to find out when important actions occur. The main evidence was input events, because I stopped clicking on control after the password input form appeared. Here are the last three keystrokes of the control key in an approximate form on the Window in Focus chart :

It seems that the critical events are getting the focus of LockApp.exe, after which the focus gets LogonUI.exe almost instantly. Presumably, I entered the password in LogonUI.exe (it is convenient that the trace did not intercept keyboard events), after which the focus briefly switched to explorer, and then to UIforETW, from which I started.

It also looks like LogonUI.exe cannot get focus before LockApp.exe - this pattern repeats itself in all the traces I studied.

So, after more than a thousand words dedicated to solving this riddle, we finally have a clear question that we can investigate: why does LockApp.exe get focus after exiting the downtime, it takes twenty seconds?

We have a question? Great, let's answer it

Using CPU Usage (Precise) data obtained from content switching, I quickly found that within twenty seconds after waking LockApp.exe received less than one millisecond of CPU time, and for more than 14 seconds (from 35.158 s to 49.827 s) did not work generally:

LockApp does not work at all for a long time

The documentation on the meaning of the columns in the CPU Usage (Precise) tables is here .

If a process or thread has not been running for some time, and you want to find out why, usually important clues can be found in the first context switch after a long lull, namely switching to 49.827 seconds of tracing. I reordered the columns to show more data from this context switch:

LockApp is prepared but not executed. Strange ...

Count, equal to 1 means that we look at the data for a single context switch.

Time Since Last, equal to 38.2 million microseconds, means that this thread will not execute within 38.2 seconds. This in itself is neither good nor bad. Idle flows save energy, and in the end the laptop was in a dream for some time.

Switch-In Time simply tells us when exactly the thread fits in the CPU — when the context switches to that thread.

And now we go to the Ready column. He tells us how long the thread was ready to execute, but not executed. In other words, this thread was waiting for something (lock, handle) and this is somethingwas freed or initiated, but the thread still did not run for 19.493 seconds.

To better understand the Ready (us) column , you can take a look at the Ready Time (s) column . He tells us when the stream is prepared. We see that for 30.333 seconds of tracing, this thread was prepared for execution, but did not execute until 49.827 seconds. This seems to be important.

This arrangement of columns otherwise shows us the same context switch:

New Thread Stack and Ready Thread Stack

So, this thread (which the New Thread Stack expected NtWaitForWorkViaWorkerFactory to show) was ordered to wake up (the system process calling KeSetEvent) shortly after I opened the notebook lid for 30.333 seconds of tracing. But it started not then (which would be “good”), but after 19.494 s, and this is bad.

Typically, when conducting such an analysis of expectations, I spend a lot of time figuring out why the stream is waiting and what caused it to not be ready. But this was the first time I was doing an analysis of expectations, in which it was not important, and the question was why this ready-made thread is not executed.

Cases ...

Most people don’t spend so much time studying ETW traces, so an explanation is needed here. This is very strange. If the thread is ready, then it usually starts instantly, or after a few milliseconds. The readiness of the stream, as the name implies , means that the stream is ready for execution and almost nothing can interfere with it. But let's figure out what can prevent the execution of a finished thread.

Thread priority

At first I suggested that this is a simple case of CPU “hunger”. Dozens of processes require CPU time, and because of this, LockApp does not get the right one until the load decreases. However, this theory does not quite correspond to the symptoms, because LockApp process could take about 18 seconds even without getting CPU time.

The CPU hunger theory is good because it is verifiable. I managed to increase the priority of the LockApp process using the Task Manager (during one of the brief periods when it was not suspended by the UWP system), therefore, in the final trace that I used for this post, LockApp was executed with high priority. A regular Windows thread runs with a priority of about 8-10. The highest priority with which a regular (non-real-time) Windows thread can run is 15. My ETW traces showed that LockApp always worked with priority 13 or higher.

Here's a CPU timeline for critical 19.494 seconds, grouped and colored by priority of thread ( New In Pri, the current priority that was assigned to the thread). We see that threads with priorities 4, 8, 9, and 10 consume the vast majority of CPU time, especially at the end:

Using CPU by Priority

Here is another image with hidden threads with priorities 0-12. Each time the graph drops below 12.5% (which means one logical processor of the CPU time of my eight-thread notebook), LockApp must be launched, and it becomes absolutely unbelievable that priority prevents it from being executed so often when many threads with lower or equal priority get a ton of time.

Priority CPU usage, high priority threads only

Eliminate priority inversion

There is speculation that Windows priority inversion algorithms are so conducive to other threads that LockApp.exe is blocked. But since the graphs shown above demonstrate that true priorities are used in planning decisions, this assumption (always unconvincing) will have to be abandoned.

Stack core unloading

When I talked about this puzzle on Twitter, one of the commentators suggested that the thread core stack was unloaded . I was not familiar with this situation, but after John Werth 's explanations (he understands in his field) I turned off the swapping of the kernel stack and rebooted the computer. Nothing changed. In fact, I did not think that this would help, given that I have 32 GB of memory, and the problem occurs repeatedly and often; but it was better to be sure of this.

Pause process

Since LockApp is a modern UWP application, it is subject to restrictions similar to those of smartphone apps. Among other things, this means that it can be suspended when not in the foreground, and then “unfreeze” when returned to the foreground. James Forshaw proposed recording Microsoft-Windows-Kernel-Process ETW to get data on this.

Events are designed to cause maximum confusion. The name of the task Process Freeze is used both for “defrosting” and for “freezing”, and the version of the win: Stop event means that the process is starting (stopped freezing), and the version of win: Startmeans that the process stops (starts to freeze). All this is extremely logical, but very confusing. If the event names were divided into Freeze and Thaw, then there would be less confusion.

There is no documentation for these events, but thanks to the analysis, I determined that these events are always created by the Background Tasks / Broker Infrastructure Service . The name and process ID of the corresponding process are indicated in the FrozenProcessID field.

ProcessFreeze Events (also used for defrosting) It

was interesting to investigate this provider - it has many promising events - but in the end it turned out that LockApp did not pause or defrost during tracing. However, this provider seemed quite useful, so I modified UIforETW so that future versions always wrote it down.

We have already ruled out everything

None of the theories described above seemed to me very likely, and now we have all excluded them. I began to look for help, and asked me to give me ideas from a friend from Microsoft. And at that moment I discovered that the flow priority 0-31 so well known in Windows is actually just five low priority bits of a full priority system.

Use of official position

It turned out that my ignorance was my own fault. If I carefully read all 108 pages of the Threads section of Windows Internals, 7th Edition, Part 1 , I would understand what was happening. If you want to jump ahead, then this topic is revealed on pages 287 to 295 .

This super-priority field that I did not know about is called Rank . It appears in WPA as a default hidden column (to find it, you have to open the View Editor) called NewThreadRank . When planning threads, Thread Rank has priority over priority. Almost all streams have Rank 0, and a stream with Rank 0 always has a higher priority than a stream with Rank 2. By including a columnNewThreadRank and looking at the left side of the table, we can immediately see the problem:

Rank is more important than priority

. LockApp.exe streams have Rank 2, which means that, despite priority 14, they have the lowest priority in the system.

An almost complete explanation

Since it turned out that LockApp.exe threads have Rank 2, they can only be executed when none of the threads with Rank 0 “want” to run. Since many applications (for unknown reasons) actively render their invisible screens, they fight for every crumb of CPU time, leaving nothing for higher ranks. Once LockApp.exe receives a tiny fraction of the CPU time, it quickly moves to Rank 0 (and the CPU load drops), after which the login process is performed in the usual way.

Having learned this information, I began to study how LockApp’s rank changes over time. In the last few seconds before going to sleep, LockApp suddenly moved from rank 0 to 2. The rank is designed to prevent the CPU from taking up too much time, such as when Windows Photos is too keen on unwanted background processing and makes the transition from rank 2 to 19:

Microsoft.Photos goes down the rank

From the documentation you can understand that the main purpose of the stream rank is the fair sharing of CPU time between sessions on the machine so that the processes of one user do not harm others. Both of these options for using the rank make it clear that the rank of the stream should only increase if it uses a lot of CPU time, and when the laptop went to sleep, LockApp.exe used only 79.3 ms of CPU time, and the rest of the system - 17 from the CPU time . Nevertheless, the OS for some reason decided to downgrade LockApp to 2 in the process of going to sleep.

The OS changes the rank of the stream only if it belongs to the “planning group” ( KSCHEDULING_GROUP), and most threads in a typical Windows installation are not members. Consequently, most threads are not subject to a change in rank, so they can spend CPU time the way they want.

Remaining puzzles

Unfortunately, it is still unclear why LockApp.exe drops to Rank 2 before turning on sleep. I will assume that LockApp is in the planning group, and probably one of the algorithms behaves incorrectly. But I could not find an API to investigate this, and time was running out. If you know any details, then write in the comments to the original article. The very principle of using rank as the most important component in planning decisions should, it seems to me, inevitably break down if most of the processes in the system are not involved in it - threads in planning groups always run the risk of being left without the necessary resources. Dynamic Resource Allocation Planning ( DFSS ) is doomed to fail if most threads are not involved.

Also, I do not know why so many applications remain active after going to sleep. This is usually explained by the fact that “many timers end when the laptop is in sleep mode for several hours,” but this explanation is not suitable if the laptop was in a dream for only a few seconds, and the WPA rendering behavior indicates that something happens in the window system something wrong. Add to that bad-behavior applications and wait-cycle drivers, and everything is stacked over time by the CPU.

The fact that CPU storms die down and LockApp start up at the same time leads to an obvious explanation: LockApp can only work when the CPU demand drops. But there is an equally convincing explanation: as soon as LockApp gets the ability to run (or, possibly, LogonUI gets it), the CPU demand drops. Both explanations work, but I think the second is more plausible, because otherwise we cannot explain why the seemingly endless rendering of WPA suddenly stops.

Solution to the problem

As soon as I realized that LockApp.exe is a separate application that has problems with launching, and that raising its priority does not help, I disabled it. The file DisableLockScreen.reg helped me with this:

Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\Personalization]
“NoLockScreen”=dword:00000001

By turning off the lock screen, the laptop wakes up immediately after opening the cover. I did not notice either braking or storms of the CPU, and now it takes one step less to enter.

The first twitter post I posted when I first encountered the problem contains a timeline for an investigation that may be useful to someone. In addition, a lot of smart people from twitter came to the post, thanks to them.

When I returned to the article, I found out that after turning the lock screen back on, the problem disappeared. A simple reboot did not fix it - in February I rebooted many times, but we probably won’t know why it was lost.

Investigation: what is higher than thread priorities in Windows?