Methods for optimizing LINQ queries in C # .NET

Introduction


In this article we discussed some optimization techniques LINQ-inquiries .
Here are some more code optimization approaches related to LINQ queries .

It is known that LINQ (Language-Integrated Query) is a simple and convenient language for querying a data source.

And LINQ to SQL is a data access technology in a DBMS. This is a powerful tool for working with data, where queries are constructed through a declarative language, which will then be converted into SQL queries by the platform and sent to the database server already for execution. In our case, by DBMS we mean MS SQL Server .

However, LINQ queries are not converted to optimally written SQL queries that an experienced DBA could write with all the nuances of optimizing SQL queries :

  1. optimal connections ( JOIN ) and filtering of results ( WHERE )
  2. many nuances in the use of compounds and group conditions
  3. many variations in replacing IN conditions with EXISTS and NOT IN , <> with EXISTS
  4. intermediate caching of results through temporary tables, CTE, table variables
  5. using a clause ( OPTION ) with instructions and table hints WITH (...)
  6. the use of indexed views, as one of the means to get rid of redundant readings of data in samples

The main performance bottlenecks of the resulting SQL queries when compiling LINQ queries are:

  1. consolidation of the entire data selection mechanism in one request
  2. duplication of identical blocks of code, which ultimately leads to multiple extra readings of data
  3. groups of multicomponent conditions (logical "and" and "or") - AND and OR , combining in difficult conditions, leads to the fact that the optimizer, having suitable nonclustered indexes, by the necessary fields, eventually starts to scan by the cluster index ( INDEX SCAN ) by condition group
  4. deep nesting of subqueries makes it very problematic to parse SQL statements and parse query plans from developers and DBAs

Optimization methods


Now we pass directly to the optimization methods.

1) Additional indexing


It is best to consider filters on the main sampling tables, since very often the entire query is built around one or two main tables (applications-people-operations) and with a standard set of conditions (IsClosed, Canceled, Enabled, Status). It is important for the identified samples to create the corresponding indexes.

This solution makes sense when choosing from these fields significantly limits the returned set to the query.

For example, we have 500,000 applications. However, there are only 2,000 active entries. Then a correctly selected index will save us from INDEX SCAN over a large table and will allow us to quickly select data through a non-clustered index.

Lack of indexes can also be detected through prompts for parsing query plans or collecting statistics for system viewsMS SQL Server :

  1. sys.dm_db_missing_index_groups
  2. sys.dm_db_missing_index_group_stats
  3. sys.dm_db_missing_index_details

All view data contains information about missing indexes, with the exception of spatial indexes.

However, indexes and caching are often methods of dealing with the effects of poorly written LINQ queries and SQL queries .

As the harsh practice of life shows for the business, it is often important to implement business features by a certain date. And therefore, often heavy queries are put in the background with caching.

This is partly justified, since the user does not always need the latest data and an acceptable level of user interface response occurs.

This approach allows you to solve business needs, but ultimately reduces the efficiency of the information system, simply delaying the solution of problems.

It is also worth remembering that in the search process necessary to add new indexes, the MS SQL optimization proposals may be incorrect, including under the following conditions:

  1. if indexes with a similar set of fields already exist
  2. if the fields in the table cannot be indexed due to indexing restrictions (more on this is described here ).

2) Merging attributes into one new attribute


Sometimes some fields from the same table by which a group of conditions occurs can be replaced by the introduction of one new field.

This is especially true for state fields, which by type are usually either bitwise or integer.

Example:

IsClosed = 0 AND Canceled = 0 AND Enabled = 0 is replaced by Status = 1 .

Here you enter the integer attribute Status, which is provided by filling these statuses in the table. The next step is indexing this new attribute.

This is a fundamental solution to the performance problem, because we are asking for data without unnecessary calculations.

3) Submission materialization


Unfortunately, LINQ queries cannot directly use temporary tables, CTEs, and table variables.

However, there is another way to optimize for this case - this is indexed views.

A group of conditions (from the example above) IsClosed = 0 AND Canceled = 0 AND Enabled = 0 (or a set of other similar conditions) becomes a good option for using them in an indexed view, caching a small slice of data from a large set.

But there are a number of limitations when materializing a view:

  1. using subqueries, EXISTS clauses should be replaced using JOIN
  2. Can not use UNION , UNION ALL , EXCEPTION , INTERSECT clauses
  3. you cannot use table hints and OPTION clauses
  4. no ability to work with cycles
  5. it is impossible to display data in one view from different tables

It is important to remember that the real benefits of using an indexed view can actually be obtained only by indexing it.

But when invoking a view, these indices may not be used, and to use them explicitly, you must specify WITH (NOEXPAND) .

Since it is impossible to define table hints in LINQ queries , so you have to make another representation - a “wrapper” of the following form:

CREATE VIEW _ AS SELECT * FROM MAT_VIEW WITH (NOEXPAND);

4) Using table functions


Often in LINQ queries, large subquery blocks or blocks that use representations with a complex structure form the final query with a very complex and not optimal execution structure.

Key benefits of using table functions in LINQ queries :

  1. The ability, as in the case of views, to use and specify as an object, but you can pass a set of input parameters:
    FROM FUNCTION (@ param1, @ param2 ...)
    in the end, you can achieve flexible data sampling
  2. When using a table function, there are no such strong restrictions as in the case of indexed views described above:

    1. :
      LINQ .
      .
      ,
    2. , , :

      • ( )
      • UNION EXISTS

  3. OPTION , OPTION(MAXDOP N), . :

    • OPTION (RECOMPILE)
    • , OPTION (FORCE ORDER)

    OPTION .
  4. :
    ( ), .
    , , WHERE (a, b, c).

    a = 0 and b = 0.

    , c .

    a = 0 and b = 0 , .

    Here the table function may be a better option.

    Also, the table function is more predictable and constant in execution time.

Examples


Let's consider an example implementation using the example of the Questions database.

There is a SELECT query that combines several tables and uses one view (OperativeQuestions), which verifies by email the affiliation (via EXISTS ) of “Active Queries” ([OperativeQuestions]):

Request No. 1
(@p__linq__0 nvarchar(4000))SELECT
1 AS [C1],
[Extent1].[Id] AS [Id],
[Join2].[Object_Id] AS [Object_Id],
[Join2].[ObjectType_Id] AS [ObjectType_Id],
[Join2].[Name] AS [Name],
[Join2].[ExternalId] AS [ExternalId]
FROM [dbo].[Questions] AS [Extent1]
INNER JOIN (SELECT [Extent2].[Object_Id] AS [Object_Id],
[Extent2].[Question_Id] AS [Question_Id], [Extent3].[ExternalId] AS [ExternalId],
[Extent3].[ObjectType_Id] AS [ObjectType_Id], [Extent4].[Name] AS [Name]
FROM [dbo].[ObjectQuestions] AS [Extent2]
INNER JOIN [dbo].[Objects] AS [Extent3] ON [Extent2].[Object_Id] = [Extent3].[Id]
LEFT OUTER JOIN [dbo].[ObjectTypes] AS [Extent4] 
ON [Extent3].[ObjectType_Id] = [Extent4].[Id] ) AS [Join2] 
ON [Extent1].[Id] = [Join2].[Question_Id]
WHERE ([Extent1].[AnswerId] IS NULL) AND (0 = [Extent1].[Exp]) AND ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[OperativeQuestions] AS [Extent5]
WHERE (([Extent5].[Email] = @p__linq__0) OR (([Extent5].[Email] IS NULL) 
AND (@p__linq__0 IS NULL))) AND ([Extent5].[Id] = [Extent1].[Id])
));


The view has a rather complicated structure: it has subquery joins and the use of DISTINCT sorting , which in the general case is a rather resource-intensive operation.

A selection of about ten thousand records from OperativeQuestions.

The main problem of this query is that for records from an external query, an internal subquery is performed on the [OperativeQuestions] view, which should limit the output sample (via EXISTS ) to hundreds of records for [Email] = @ p__linq__0 .

And it might seem that the subquery should once calculate the records by [Email] = @ p__linq__0, and then these couple of hundreds of records should be connected by Id c Questions, and the query will be fast.

In fact, all tables are connected in series: both Id Questions are checked for compliance with Id from OperativeQuestions, and email is filtered.

In fact, the request works with all tens of thousands of OperativeQuestions records, and you only need data of interest on Email.

OperativeQuestions view text:

Request No. 2
 
CREATE VIEW [dbo].[OperativeQuestions]
AS
SELECT DISTINCT Q.Id, USR.email AS Email
FROM            [dbo].Questions AS Q INNER JOIN
                         [dbo].ProcessUserAccesses AS BPU ON BPU.ProcessId = CQ.Process_Id 
OUTER APPLY
                     (SELECT   1 AS HasNoObjects
                      WHERE   NOT EXISTS
                                    (SELECT   1
                                     FROM     [dbo].ObjectUserAccesses AS BOU
                                     WHERE   BOU.ProcessUserAccessId = BPU.[Id] AND BOU.[To] IS NULL)
) AS BO INNER JOIN
                         [dbo].Users AS USR ON USR.Id = BPU.UserId
WHERE        CQ.[Exp] = 0 AND CQ.AnswerId IS NULL AND BPU.[To] IS NULL 
AND (BO.HasNoObjects = 1 OR
              EXISTS (SELECT   1
                           FROM   [dbo].ObjectUserAccesses AS BOU INNER JOIN
                                      [dbo].ObjectQuestions AS QBO 
                                                  ON QBO.[Object_Id] =BOU.ObjectId
                               WHERE  BOU.ProcessUserAccessId = BPU.Id 
                               AND BOU.[To] IS NULL AND QBO.Question_Id = CQ.Id));


Original mapping representation in DbContext (EF Core 2)
public class QuestionsDbContext : DbContext
{
    //...
    public DbQuery<OperativeQuestion> OperativeQuestions { get; set; }
    //...
    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Query<OperativeQuestion>().ToView("OperativeQuestions");
    }
}


Original LINQ Query
var businessObjectsData = await context
    .OperativeQuestions
    .Where(x => x.Email == Email)
    .Include(x => x.Question)
    .Select(x => x.Question)
    .SelectMany(x => x.ObjectQuestions,
                (x, bo) => new
                {
                    Id = x.Id,
                    ObjectId = bo.Object.Id,
                    ObjectTypeId = bo.Object.ObjectType.Id,
                    ObjectTypeName = bo.Object.ObjectType.Name,
                    ObjectExternalId = bo.Object.ExternalId
                })
    .ToListAsync();


In this particular case, a solution to this problem is considered without infrastructural changes, without introducing a separate table with ready-made results (“Active Queries”), for which a mechanism would be needed to fill in its data and keep it up to date.

Although this is a good solution, there is another option for optimizing this task.

The main goal is to cache entries by [Email] = @ p__linq__0 from the OperativeQuestions view.

We enter the table function [dbo]. [OperativeQuestionsUserMail] in the database.

Sending Email as the input parameter, we get back the table of values:

Request No. 3

CREATE FUNCTION [dbo].[OperativeQuestionsUserMail]
(
    @Email  nvarchar(4000)
)
RETURNS
@tbl TABLE
(
    [Id]           uniqueidentifier,
    [Email]      nvarchar(4000)
)
AS
BEGIN
        INSERT INTO @tbl ([Id], [Email])
        SELECT Id, @Email
        FROM [OperativeQuestions]  AS [x] WHERE [x].[Email] = @Email;
     
    RETURN;
END


This returns a table of values ​​with a predefined data structure.

In order for the queries to OperativeQuestionsUserMail to be optimal, to have optimal query plans, a strict structure is required, and not RETURNS TABLE AS RETURN ...

In this case, the desired Request 1 is converted to Request 4:

Request No. 4
(@p__linq__0 nvarchar(4000))SELECT
1 AS [C1],
[Extent1].[Id] AS [Id],
[Join2].[Object_Id] AS [Object_Id],
[Join2].[ObjectType_Id] AS [ObjectType_Id],
[Join2].[Name] AS [Name],
[Join2].[ExternalId] AS [ExternalId]
FROM (
    SELECT Id, Email FROM [dbo].[OperativeQuestionsUserMail] (@p__linq__0)
) AS [Extent0]
INNER JOIN [dbo].[Questions] AS [Extent1] ON([Extent0].Id=[Extent1].Id)
INNER JOIN (SELECT [Extent2].[Object_Id] AS [Object_Id], [Extent2].[Question_Id] AS [Question_Id], [Extent3].[ExternalId] AS [ExternalId], [Extent3].[ObjectType_Id] AS [ObjectType_Id], [Extent4].[Name] AS [Name]
FROM [dbo].[ObjectQuestions] AS [Extent2]
INNER JOIN [dbo].[Objects] AS [Extent3] ON [Extent2].[Object_Id] = [Extent3].[Id]
LEFT OUTER JOIN [dbo].[ObjectTypes] AS [Extent4] 
ON [Extent3].[ObjectType_Id] = [Extent4].[Id] ) AS [Join2] 
ON [Extent1].[Id] = [Join2].[Question_Id]
WHERE ([Extent1].[AnswerId] IS NULL) AND (0 = [Extent1].[Exp]);


Mapping views and functions in DbContext (EF Core 2)
public class QuestionsDbContext : DbContext
{
    //...
    public DbQuery<OperativeQuestion> OperativeQuestions { get; set; }
    //...
    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Query<OperativeQuestion>().ToView("OperativeQuestions");
    }
}
 
public static class FromSqlQueries
{
    public static IQueryable<OperativeQuestion> GetByUserEmail(this DbQuery<OperativeQuestion> source, string Email)
        => source.FromSql($"SELECT Id, Email FROM [dbo].[OperativeQuestionsUserMail] ({Email})");
}


Final LINQ query
var businessObjectsData = await context
    .OperativeQuestions
    .GetByUserEmail(Email)
    .Include(x => x.Question)
    .Select(x => x.Question)
    .SelectMany(x => x.ObjectQuestions,
                (x, bo) => new
                {
                    Id = x.Id,
                    ObjectId = bo.Object.Id,
                    ObjectTypeId = bo.Object.ObjectType.Id,
                    ObjectTypeName = bo.Object.ObjectType.Name,
                    ObjectExternalId = bo.Object.ExternalId
                })
    .ToListAsync();


The order of the execution time decreased from 200-800 ms, to 2-20 ms., Etc., that is, ten times faster.

If we take more averaged, then instead of 350 ms we got 8 ms.

From the obvious pluses, we also get:

  1. general reduction in reading load,
  2. significant reduction in the likelihood of blocking
  3. reduction of average blocking time to acceptable values

Conclusion


Optimization and fine-tuning of calls to the MS SQL database through LINQ is a problem that can be solved.

In this work, care and consistency are very important.

At the beginning of the process:

  1. it is necessary to check the data with which the query works (values, selected data types)
  2. properly index this data
  3. check the correctness of the connecting conditions between the tables

At the next iteration, the optimization reveals:

  1. basis of the request and the main filter of the request
  2. repeating similar query blocks and intersecting conditions
  3. in the SSMS or other GUI for SQL Server , the SQL query itself is optimized (allocation of an intermediate data store, building the resulting query using this store (there may be several))
  4. at the last stage, taking the resulting SQL query as a basis , the LINQ query structure is rebuilt

As a result, the resulting LINQ query should become identical in structure to the identified optimal SQL query from paragraph 3.

Acknowledgments


Many thanks to colleagues jobgemws and alex_ozrfrom Fortis for helping with this article.

All Articles