varchar(max), varchar(max), and into production

I recently took part in a discussion about whether the length you specify for nvarchar columns affects performance. The arguments on both sides were reasonable, and since I had some free time, I decided to run a few tests. The result is this post.

Spoiler: it's not that simple.

All tests were run on SQL Server 2014 Developer Edition; roughly the same results were obtained on SQL Server 2016 (with minor differences). Everything below should be relevant for SQL Server 2005-2016 (2017/2019 need separate testing, since Adaptive Memory Grants appeared there and can partially correct the situation).

We will need Erik Darling's stored procedure sp_pressure_detector, which reports a lot of information about the current state of the system, and SQL Query Stress, a very handy open-source utility by Adam Machanic / Erik Ejlskov Jensen for load testing MS SQL Server.

What are we talking about


The question I'm trying to answer: does the declared length of an (n)varchar column affect performance (below I simply write varchar everywhere, but everything applies to nvarchar as well), or can you just use varchar(max) and not worry, since as long as the string is shorter than 8000 characters (4000 for nvarchar), both varchar(max) and varchar(N) are stored IN-ROW?

Setting up the test environment


create table ##v10  (i int, d datetime, v varchar(10));
create table ##v100 (i int, d datetime, v varchar(100));
create table ##vmax (i int, d datetime, v varchar(max));

We create three tables with three columns each; the only difference is the length of the varchar column: 10/100/max. Then we fill them with the same data:

;with x as (select 1 x union all select 1)
, xx as (select 1 x from x x1, x x2)
, xxx as (select 1 x from xx x1, xx x2, xx x3)
, xxxx as (
	select row_number() over(order by (select null)) i
		, dateadd(second, row_number() over(order by (select null)), '20200101') d
		, cast (row_number() over(order by (select null)) as varchar(10))  v 		
	from xxx x1, xxx x2, xxx x3
) --262144 
insert into ##v10			--varchar(10)
select i, d, v from xxxx;	

insert into ##v100			--varchar(100)
select i, d, v from ##v10;

insert into ##vmax			--varchar(max)
select i, d, v from ##v10;

As a result, each table contains 262144 rows. Column i (integer) holds non-repeating numbers from 1 to 262144; d (datetime) holds unique dates; and v (varchar) is cast(i as varchar(10)). To make it a bit more like real life, we create a unique clustered index on i:

create unique clustered index #cidx10 on ##v10(i);
create unique clustered index #cidx100 on ##v100(i);
create unique clustered index #cidxmax on ##vmax(i);

Let's go


First, let's look at the execution plans of different queries.

First, let's check that filtering on the varchar column does not depend on its declared length (as long as fewer than 8000 characters are stored there). Enable the actual execution plan and look:

select * from ##v10 where v = '123';
select * from ##v100 where v = '123';
select * from ##vmax where v = '123';


Oddly enough, there is a difference, albeit a small one. The plan for the varchar(max) query first fetches all the rows and then filters them, while varchar(10) and varchar(100) check the predicate while scanning the clustered index. Because of this, the scan takes almost 3 times longer: 0.068 seconds versus 0.022 for varchar(10).
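If you want to reproduce these timings without staring at the graphical plan, statement-level CPU and elapsed time can be captured like this (a minimal sketch; the exact numbers will of course differ on your hardware):

set statistics time on;

select * from ##v10  where v = '123';
select * from ##vmax where v = '123';

set statistics time off;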

Now let's see what happens if we simply return the varchar column and select rows by the clustered index key:

select * from ##v10  where i between 200000 and 201000;
select * from ##v100 where i between 200000 and 201000;
select * from ##vmax where i between 200000 and 201000; 


Everything is clear here: for such queries there is no difference at all.

Now for the interesting part. The previous query returned only 1001 rows, and now we want to sort them by a non-indexed column. Let's try:

select * from ##v10  where i between 200000 and 201000 order by d;
select * from ##v100 where i between 200000 and 201000 order by d;
select * from ##vmax where i between 200000 and 201000 order by d;


Oh, and what's that yellow warning there?


Funny: the query requested and was granted 6.5 megabytes of RAM for the sort, but used only 96 kilobytes. And how much worse does it get with more rows? Let's take not 1,000 rows, but 100,000:
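The queries themselves are not visible in the screenshot; presumably they differ from the previous ones only in the width of the range, something like:

-- hypothetical reproduction: ~100000 rows instead of 1000
select * from ##v10  where i between 100000 and 200000 order by d;
select * from ##v100 where i between 100000 and 200000 order by d;
select * from ##vmax where i between 100000 and 200000 order by d;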


This is more serious. Moreover, the first query, the one working with the smallest varchar(10), is also unhappy about something:


On the left is the warning from the last query: 500 megabytes were requested, but only 9.5 megabytes were used. And on the right is the sort warning from the varchar(10) query: 8840 kilobytes were requested, but they were not enough, and another 360 pages (8 KB each) were written to and read from tempdb.

Which begs the question: WTF?

The answer lies in how the SQL Server Query Optimizer works. To sort something, you first have to put it into memory. How do you know how much memory is needed? In general, we know how much space each data type occupies. But what about variable-length strings? With them it gets more interesting. When allocating memory for sorts / hash joins, SQL Server assumes that variable-length columns are, on average, half full, and grants memory as (declared size / 2) * expected number of rows. But varchar(max) can store up to 2 GB - how much should be granted for it? SQL Server assumes it will hold half of varchar(8000), i.e. approximately 4 KB per row.
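While such a query is running, the requested vs. actually used grant can be seen directly (a minimal sketch using the standard DMV; sp_pressure_detector shows much the same information):

-- memory requested, granted and actually used by currently running queries
select session_id,
       requested_memory_kb,
       granted_memory_kb,
       used_memory_kb,
       max_used_memory_kb
from sys.dm_exec_query_memory_grants;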

What is interesting is that this way of granting memory causes problems not only with varchar(max): if your varchar sizes are lovingly chosen so that most values are half-full or more, that also causes problems - problems of a different kind, but no less serious. The figure above shows it: SQL Server could not grant enough memory for sorting the small varchar and used tempdb to store intermediate results. If tempdb sits on slow disks, or is actively used by other queries, this can become a real bottleneck.

SQL Query Stress


Now let's see what happens when these queries are executed en masse. We run SQL Query Stress, connect it to our server, and tell it to execute each of these queries 10 times in 50 threads.

The results of the first query:




Interesting: without an index on v, varchar(max) performs worst of all when filtering - it is solidly worse both in CPU time per iteration and in total runtime.

sp_pressure_detector shows nothing interesting here, so I won't quote its output.
Results of the second query:




Everything here is as expected: all three do equally well.

Now the more interesting part: the query that sorts the resulting thousand rows:




It turned out exactly the same as with the previous query: there aren't many rows, and the sort causes no problems.

Now the last query, the one that sorts an unreasonable number of rows (I added top 1000 to it so as not to pull the entire sorted result):




And here is the output of sp_pressure_detector:



What does it tell us? Every session requests 489 MB for its sort, but only 22 of them got a grant from SQL Server, even though each of those 22 sessions actually uses only 9 MB!
In total about 11 GB of memory is available; 22 sessions were granted 489.625 MB each, leaving SQL Server only 258 MB free, while the new sessions also want 489 MB each. What can they do? Wait until memory is freed - so they wait, without even starting to run. And what will users do if such queries run in their sessions? Wait as well.
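If you want to see that queue directly while the load is running, here is a minimal sketch against the standard DMVs (sessions whose grant_time is still NULL are the ones waiting in line):

-- sessions that are still waiting for a memory grant
select session_id, requested_memory_kb, wait_time_ms
from sys.dm_exec_query_memory_grants
where grant_time is null;

-- totals for the memory grant semaphore: how much is granted, how many are waiting
select available_memory_kb, granted_memory_kb, grantee_count, waiter_count
from sys.dm_exec_query_resource_semaphores;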

By the way, note the figure for varchar(10): queries with varchar(10) took longer than queries with varchar(100), and that is with tempdb on a very fast disk. The worse the disk under tempdb, the slower such queries will run.

Separate Note for SQL Server 2012/2014
SQL Server 2012/2014 have an additional problem with sort spills: there the sort can spill to tempdb even when the row count and row size are estimated correctly, including for fixed-length char/nchar columns. MS knows about it and has released a fix.

An example:

create table ##c6  (i int, d datetime, v char(6));

insert into ##c6 (i, d, v)
select i, d, v
from ##v10;

select * from ##c6 where i between 100000 and 200000 order by d;


It is fixed by enabling a trace flag (at your own risk):

DBCC TRACEON (7470, -1);


After that, the spill disappears.

Conclusions


Be careful with sorts in queries that involve (n)varchar columns. If the sort really is needed, it is highly desirable that the sort column be covered by an index.
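For the tables from this post, that could look roughly like this (a sketch: an index on the sort column gives the optimizer the option to read the rows already ordered instead of sorting them; whether it actually picks that plan depends on how selective the filter is):

-- a nonclustered index on d offers a pre-sorted access path for "order by d"
create index ix_v100_d on ##v100 (d) include (v);

-- with a wide enough range the Sort operator (and its memory grant) can disappear
select * from ##v100 where i between 100000 and 200000 order by d;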

Note that a sort does not require an explicit order by: it can also appear with merge joins, for example. The same memory-grant problem is also possible with hash joins, for example with varchar(max):

select top 100 * 
from ##vmax v1
inner hash join ##v10 v2 on v1.i = v2.i


2.5 GB of memory granted, 25 megabytes used!

The main conclusion for me: the declared size of an (n)varchar column MATTERS! If it is too small, spills to tempdb are possible; if it is too large, the memory grants are excessive. If there are sorts, it makes sense to declare the varchar length as roughly the average record length * 2, and in the case of SQL Server 2012/2014 even more.
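A quick way to choose a sane length for an existing column is to look at the data itself (a sketch; "average * 2" is a rule of thumb from this post, not a hard rule):

-- actual sizes of stored values, to pick a reasonable varchar(N)
select avg(datalength(v)) as avg_bytes,
       max(datalength(v)) as max_bytes
from ##v100;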

An unexpected conclusion for me: varchar(max) containing fewer than 8000 characters really does work more slowly when you filter on it. I can't explain it yet - I'll dig further.

A bonus conclusion for me: just before clicking "publish", I realized that even with varchar(max) you can hit the "small varchar" problem. Indeed, once varchar(max) stores more than 4000 characters (2000 for nvarchar), sorting can again become a problem:

insert into ##vmax(i, d, v)
select i, d, replicate('a', 4000) v
from ##v10;

select * from ##vmax where i between 200000 and 201000 order by d;


truncate table ##vmax;

insert into ##vmax(i, d, v)
select i, d, replicate('a', 4100) v
from ##v10;

select * from ##vmax where i between 200000 and 201000 order by d;


Why did I write at the very beginning that it's not so simple? Because, for example, on my home laptop with a half-dead disk, tempdb spills when sorting a "small" varchar made such queries an ORDER OF MAGNITUDE slower than the same queries with varchar(max). If your hardware is good, they may not be a big problem, but you shouldn't forget about them.

What would be even more interesting is to check whether other DBMSs have problems caused by too large / too small varchar sizes. If you have a chance to check, I'd be glad if you shared the results.

Little bonus


Unfortunately, such problems cannot be caught through the query plan cache. Here are examples of plans taken from the cache: alas, there are no warnings in them.
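For reference, this is roughly how such plans can be pulled from the cache (a sketch; the point is that cached plans carry only estimates, so the spill and excessive-grant warnings from the actual plans are simply not there):

-- cached plans for our test queries: no runtime warnings inside
select st.text, qp.query_plan
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
cross apply sys.dm_exec_query_plan(cp.plan_handle) qp
where st.text like '%##v%';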


