⏬ 🐺 🧘🏽 Making SQL Server and PerfMon Friends 📁 📉 ⚰️

In the photo there is a scene from the good old comedy “Airplane” - the boomers remember it.

She is not here by chance. But useful later.

Dba don't like perfmon

It’s not at all because the PerfMon interface has not changed with Windows NT 3.1 (?) And gives back the warmth of two thousandths. By the way, maybe someone knows the explanation for this strange fact. The control panel is rewritten several times a year. Even the calculator was rewritten. But not perfMon.

DBA thinks with cover words. When a good DBA doctor arrives at a sick SQL server, he opens his suitcase, and in the suitcase there is a huge number of quarters that serve him faithfully. He takes the shortest of them from memory. Sometimes a DBA can look into PerfMon's metrics - Physical disk, or Disk Queue Length. However, a real DBA will get its own SQL server metrics, not through PerfMon, but with a patch, for example, like this:

select * from sys.dm_os_performance_counters 
  where object_name='SQLServer:Buffer Manager' 
    and counter_name like 'Page life expectancy%' 
	and instance_name=''

When IT representatives, divorced from the DBA sphere, ask him, “what metrics do you want us to collect?” (well, there are all sorts of splunk, squared-up, etc.), then the DBA begins to think about scripts, and the rest about perfmon metrics. At the same time, as a rule, they try to dump all the metrics on the DBA. When this happens, the picture is useless in its congestion. And then a short video clip from the movie "Airplane" will help me .

As a result, DBA writes keri, which write statistics every N minutes to some plate in the DBAtasks database. Do you recognize yourself?

Advocate Devil (perfmon)

Nevertheless, perfmon counters are a standard tool. They can be logged, set alerts, but most importantly, there are a large number of systems that "merge" these metrics from many machines into a centralized repository, allow you to analyze them and build beautiful (and not like perfmon) charts.

Turns out we can make friends with the SQL world and PerfMon world!

Below I will show how.

Now the specifics will go

As an example, we will show how to export metrics that SQL itself does not publish to PerfMon. For example, the query below:

select
  convert(numeric(10,2),round(((sum(version_store_reserved_page_count)*1.0)/128.00),2)),
  convert(numeric(10,2),round(((sum(user_object_reserved_page_count)*1.0)/128.00),2)),
  convert(numeric(10,2),round(((sum(internal_object_reserved_page_count)*1.0)/128.00),2))
from
  tempdb.sys.dm_db_file_space_usage;

returns the amount of tempdb space used by the version store (for snapshot), user objects (#tab and ## tab) and sort / temporary storage space. Let's create metrics for them (18+, teams are undocumented)

dbcc addinstance ('SQLServer:User Settable', 'TempDB version store KB')
dbcc addinstance ('SQLServer:User Settable', 'TempDB user store KB')
dbcc addinstance ('SQLServer:User Settable', 'TempDB sort store KB')

This will create three metrics in SQLServer -> UserSettable -> Query. The metrics are integers - int (not bigint). In addition, their meaning is interpreted as is. That is, if you need a delta from the previous value, then this is your concern. But in this case, we just need to assign values to the metrics:

declare @TEMPDBver int, @TEMPDBuser int, @TEMPDBsort int
select

@TEMPDBver = convert(numeric(10,2),round(((sum(version_store_reserved_page_count) *1.0)/128.00),2)),
@TEMPDBuser = convert(numeric(10,2),round(((sum(user_object_reserved_page_count) *1.0)/128.00),2)),
@TEMPDBsort = convert(numeric(10,2),round(((sum(internal_object_reserved_page_count) *1.0)/128.00),2))
from tempdb.sys.dm_db_file_space_usage;

if @TEMPDBver is not null    
  dbcc setinstance ('SQLServer:User Settable', 'Query', 
    'TempDB version store KB', @TEMPDBver)
if @TEMPDBuser is not null   
  dbcc setinstance ('SQLServer:User Settable', 'Query', 
   'TempDB user store KB', @TEMPDBuser)
if @TEMPDBsort is not null   
  dbcc setinstance ('SQLServer:User Settable', 'Query', 
    'TempDB sort store KB', @TEMPDBsort)

Actually, that's all. Now just call this code regularly and everything will work.

Delta Counters

Now let's make a delta counter. This will be the lock timeout in ms.

dbcc addinstance ('SQLServer:User Settable', 'LOCK ms per s')

Here you can object to me that such a metric already exists:

select * from sys.dm_os_performance_counters 
  where counter_name like 'Lock Wait Time (ms)%' 
    and instance_name='_Total'
    and object_name='SQLServer:Locks'

But this metric is only eventually consistent. As you remember, if protons are unstable, then any database in the universe is eventually consistent. That is, if you create a lock and look at the value of this metric, then it will not grow! And only at the end of the lock its value will jump sharply for the full waiting time. On the graph, instead of the expected 'plateau', you will get a sharp peak, which also spoils your scale along the Y axis.

The same problem is with kvery:

select sum(wait_time_ms) 
  from sys.dm_os_wait_stats 
  where wait_type like 'LCK_%'

Therefore, we must add the time of the current ones to the time of the 'completed' locks:

declare @NEWlock bigint
select @NEWlock=sum(wait_time_ms) 
  from sys.dm_os_wait_stats 
  where wait_type like 'LCK_%' -- finished waits
select @NEWlock=@NEWlock+isnull(sum(waittime),0) 
  from master.dbo.sysprocesses 
  where blocked>0 and lastwaittype like 'LCK_%' -- waits in progress

This querie already gives the correct plateau on the chart.

Now we need to consider the delta from the previous value. There are two ways:

Design a process as a job that is called frequently (say, once per minute). Create a label to store previous values. In fact, it is often bad to call job (overhead + competition for the execution history of Jobs), and you can do without an additional plate:
Create a process that runs in an endless loop with waitfor. The process remembers the previous value in the variable. We will process the process as a job with two schedule: at startup and once an hour (in case job falls, then it will restart)

In the second case, it is quite possible to do a cycle every five seconds or even more often. So, we get:

declare @OLDlock bigint, @NEWlock bigint
declare @lock int, @seconds int = 5 -- must match WAITFOR
loop:
  set @OLDlock=@NEWlock -- shift new to old
  select @NEWlock=sum(wait_time_ms) 
    from sys.dm_os_wait_stats 
	where wait_type like 'LCK_%' -- finished waits
  select @NEWlock=@NEWlock+isnull(sum(waittime),0) 
    from master.dbo.sysprocesses 
	where blocked>0 and lastwaittype like 'LCK_%' -- waits in progress
  set @lock=(@NEWlock-@OLDlock)/@seconds -- this is delta
  if @lock is not null    
    dbcc setinstance ('SQLServer:User Settable', 'Query', 
	  'LOCK ms per s', @lock)
  waitfor delay '00:00:05' 
goto loop

Please note that the first time you run a DBCC loop, it will be skipped (which is correct), since @OLDlock (and lock) will be null.

Making SQL Server and PerfMon Friends

Dba don't like perfmon

Advocate Devil (perfmon)

Now the specifics will go

Delta Counters

More articles: