Microsoft introduced in-memory OLTP in SQL Server 2014, advertising that queries can run up to 100 times faster. In this blog series I will walk through the design principles of the in-memory OLTP engine and show why it has the potential to be significantly faster than the traditional on-disk engine.
Before executing a T-SQL command for the first time, SQL Server must always create a query plan. Evaluating the different permutations to find a good execution plan costs memory, CPU cycles and compile time. Let us create and execute the following stored procedure to better understand this.

USE AdventureWorks2017;
GO

SET STATISTICS TIME ON;
GO

DROP PROCEDURE IF EXISTS interpreted;
GO

CREATE PROCEDURE interpreted
AS
	BEGIN
		SELECT *
		FROM
			Production.Product AS pr
		LEFT JOIN
			Production.ProductSubcategory AS su
		ON
			pr.ProductSubcategoryID = su.ProductSubcategoryID
		LEFT JOIN
			Sales.SalesOrderDetail AS de
		ON
			de.ProductID = pr.ProductID
		LEFT JOIN
			Production.ProductSubcategory AS su2
		ON
			pr.ProductSubcategoryID = su2.ProductSubcategoryID
		INNER JOIN
			Sales.SpecialOffer AS soff
		ON
			de.SpecialOfferID = soff.SpecialOfferID
		LEFT JOIN
			Sales.SalesOrderHeader AS he
		ON
			he.SalesOrderID = de.SalesOrderID
		LEFT JOIN
			Sales.Customer AS cu
		ON
			he.CustomerID = cu.CustomerID
		LEFT JOIN
			Sales.SalesPerson AS per
		ON
			per.BusinessEntityID = he.SalesPersonID
		LEFT OUTER JOIN
			Sales.SalesTerritory AS ter
		ON
			he.TerritoryID = ter.TerritoryID
		LEFT JOIN
			Person.Address AS ag
		ON
			he.ShipToAddressID = ag.AddressID
		WHERE
			pr.ProductID = 712;
	END;
GO
SET STATISTICS TIME ON; 
GO 

-- Clear the plan cache for the current database 
DECLARE @db_id int = DB_ID(); 
DBCC FLUSHPROCINDB(@db_id); 
GO 

-- SQL Server parse and compile time: 
-- CPU time = 172 ms, elapsed time = 173 ms. 
EXECUTE interpreted; 
GO
SQL Server parse and compile time: 
   CPU time = 172 ms, elapsed time = 173 ms.

properties-of-execution-plan
The execution plan and the output of SET STATISTICS TIME ON reveal that the query engine spent 172 ms of CPU time and approximately 6 MB of memory to compile the query before it could actually be executed. To prevent a user from having to wait for the best possible plan before the storage engine returns the data, SQL Server balances query compile time against the expected query execution time. For this reason, SQL Server settles for the best plan it can find within a defined time frame instead of searching indefinitely for the best possible plan.
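If you want to see how much work the optimizer performs for such a compilation, sys.dm_exec_query_optimizer_info exposes cumulative, instance-wide optimizer counters. The following sketch is not part of the original demo; compare two snapshots taken before and after the EXECUTE to attribute the difference to this one compilation.

-- Cumulative optimizer counters (instance-wide): number of optimizations,
-- average elapsed time per optimization, average final plan cost and the
-- number of optimizations that ended because of the optimizer time-out.
SELECT
	counter
  , occurrence
  , value
FROM
	sys.dm_exec_query_optimizer_info
WHERE
	counter IN (N'optimizations', N'elapsed time', N'final cost', N'timeout');
GO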

After the first execution the query plan is cached, and SQL Server can retrieve it from the plan cache for all subsequent executions. This saves recompiling the plan and, with it, CPU, memory and time.
When we run the same query again, the cached execution plan is retrieved from the plan cache and reused, and no CPU time or memory is spent on recreating the same plan. SQL Server reports a compile time of zero for the second execution of the same query.

-- SQL Server parse and compile time: 
-- CPU time = 0 ms, elapsed time = 0 ms.
EXECUTE interpreted;
GO
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
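
As a quick complementary check (not shown in the original post), sys.dm_exec_procedure_stats reports when the plan of the procedure was cached and how often it has been executed since, which already hints at plan reuse.

-- After the two executions above, execution_count should be 2 and
-- cached_time the time of the first execution.
SELECT
	OBJECT_NAME(ps.object_id) AS procedure_name
  , ps.cached_time
  , ps.execution_count
  , ps.last_execution_time
FROM
	sys.dm_exec_procedure_stats AS ps
WHERE
	ps.database_id = DB_ID()
	AND ps.object_id = OBJECT_ID(N'dbo.interpreted');
GO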

If we analyze the plan cache, we can see that the usecounts value is two and the plan_generation_num is one, indicating that the same plan has been reused.

-- Compiled plan is cached and reused
-- usecount : 2
-- plan_generation_num : 1
SELECT
	ecp.usecounts
  , st.plan_generation_num
  , ecp.cacheobjtype
  , ecp.objtype
  , est.dbid
  , DB_NAME(est.dbid) AS db_name
  , est.text
  , st.query_plan_hash
  , qp.query_plan
  , size_in_bytes
FROM
	sys.dm_exec_cached_plans AS ecp
CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS est
INNER JOIN
	sys.dm_exec_query_stats AS st
ON
	ecp.plan_handle = st.plan_handle
CROSS APPLY sys.dm_exec_query_plan(ecp.plan_handle) AS qp
WHERE
	est.dbid = DB_ID()
	AND query_hash = 0x2BBFB6EC2B89F5C5;
GO

query_usecounts_and_plan_generation_num
In general, it seems beneficial to keep reusing the same plan, so that no time is lost waiting for a recompilation and no system resources are wasted.
As long as the metadata of the underlying objects and the statistics do not change, there is indeed no need to recompile the plan. But what happens if we drop a supporting index, or if the statistics of a table that is part of the query are updated? The cached execution plan may no longer work at all, or it may use iterators that cannot process the new workload optimally. To reflect the changes, the cached plan should be replaced, and it is worth waiting for a new plan to be created. However, the more queries the engine has to recompile, the more CPU and memory are consumed and the fewer hardware resources remain for other processes. You can identify memory pressure caused by compiling query plans by monitoring the RESOURCE_SEMAPHORE_QUERY_COMPILE wait type: long waits on it mean users have to wait longer for their queries to start running.
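Another way to keep an eye on compilation activity (not part of the original demo) is the pair of cumulative counters that SQL Server exposes through sys.dm_os_performance_counters; the difference between two snapshots taken a known number of seconds apart gives the compilation and recompilation rate.

-- Cumulative compilation counters; take two snapshots and diff them
-- to get compilations and recompilations per second.
SELECT
	RTRIM(counter_name) AS counter_name
  , cntr_value
FROM
	sys.dm_os_performance_counters
WHERE
	object_name LIKE N'%SQL Statistics%'
	AND counter_name IN (N'SQL Compilations/sec', N'SQL Re-Compilations/sec');
GO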
The following example forces SQL Server to recompile a query plan due to a statistics update.

USE AdventureWorks2017;
GO

-- Create a table
DROP TABLE IF EXISTS test;
GO

CREATE TABLE test
(
	col1 int NOT NULL
  , col2 int NOT NULL
);
GO

-- Insert 10,000 records
INSERT test
	(
		col1
	  , col2
	)
SELECT TOP 10000
	message_id, message_id % 2
FROM
	sys.messages AS mes
WHERE
	language_id = 1033
ORDER BY
	message_id;
GO

-- Create a non-unique index
CREATE INDEX NCL_test_col1 ON dbo.test (col1);
GO

DROP PROCEDURE IF EXISTS interpreted;
GO

CREATE PROCEDURE interpreted
(@col1 int)
AS
	BEGIN
		SELECT *
		FROM
			dbo.test
		WHERE
			col1 = @col1;
	END;
GO

-- Get the distribution
-- Unique values 
SELECT
	COUNT(col1) AS count_of_col1
  , COUNT(DISTINCT col1) AS count_distinct_of_col1
FROM
	test;
GO

The table has 10,000 unique records.
even-distribution-of-values-in-a-table
When we run the stored procedure for the first time, we see that SQL Server uses an RID Lookup in the execution plan, because the parameter value is selective and the index is not covering.

-- Clear the plan cache for the current database
DECLARE @db_id int = DB_ID();
DBCC FLUSHPROCINDB(@db_id);
GO

-- RID Lookup
-- SQL Server parse and compile time: 
-- CPU time = 0 ms, elapsed time = 1 ms.
EXECUTE interpreted @col1 = 101;
GO

rid-lookup
Due to the simplicity of the query, the parse and compile time is negligible this time. However, this example makes the consequences of a statistics update much easier to see.
Because the table has fewer than 25,000 rows, modifying more than 500 + 20 % of the existing rows marks the statistics as stale; at the next execution they are updated automatically and the plan is recompiled. The existing plan with the bookmark lookup is efficient only for a selective value, not for a non-selective one. Inserting the value 101 another 100,000 times therefore forces the engine to replace the RID Lookup with a table scan before the next execution of the same stored procedure.
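Before running the insert, you can check how many modifications the statistics object behind NCL_test_col1 has already accumulated and how close it is to that threshold. This is a minimal sketch, not part of the original demo, using sys.dm_db_stats_properties; the classic 500 + 20 % formula is assumed, which is the relevant one for a 10,000 row table.

-- Modification counter of the statistics on dbo.test compared with the
-- classic auto-update threshold (500 + 20 % of the rows).
SELECT
	st.name AS stats_name
  , sp.rows
  , sp.modification_counter
  , 500 + (0.20 * sp.rows) AS classic_threshold
FROM
	sys.stats AS st
CROSS APPLY sys.dm_db_stats_properties(st.object_id, st.stats_id) AS sp
WHERE
	st.object_id = OBJECT_ID(N'dbo.test');
GO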

-- Insert 100,000 more records
-- This time the same, non-selective value (101)
INSERT test
  (
    col1
    , col2
  )
SELECT TOP 100000
  101, 101
FROM
  sys.messages
WHERE
  language_id = 1033
ORDER BY
  message_id;
GO
-- Recompilation
-- Table Scan
EXECUTE interpreted @col1 = 101;
GO

We can see that SQL Server created a new plan because the stale statistics were updated automatically.
bookmark-lookup-to-table-scan
The statistics header shows the new number of rows after the automatic update.

-- Show the updated statistics
DBCC SHOW_STATISTICS(test, 'NCL_test_col1') WITH STAT_HEADER;
GO

updated-statistics-due-to-inserts
Analyzing the plan cache shows that plan_generation_num has increased by 1, which proves that a new plan was created.

-- Usecount : 2
-- plan_generation_num : 2
SELECT
  ecp.usecounts
  , eqs.plan_generation_num
  , ecp.cacheobjtype
  , ecp.objtype
  , est.text
  , sql_handle
  , ecp.plan_handle
  , est.dbid
  , DB_NAME(est.dbid) AS db_name
  , eqs.query_plan_hash
  , qp.query_plan
  , size_in_bytes
FROM
  sys.dm_exec_cached_plans AS ecp
CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS est
INNER JOIN
  sys.dm_exec_query_stats AS eqs
ON
  ecp.plan_handle = eqs.plan_handle
CROSS APPLY sys.dm_exec_query_plan(ecp.plan_handle) AS qp
WHERE
  est.dbid = DB_ID()
  AND query_hash = 0x0F4BD1BED921F4AA;
GO

There are good reasons to recompile a query execution plan when the statistics of an underlying table are out of date, because continuing to use the stale cached plan would lead to query performance issues. At the same time, recompilation costs CPU and memory, which is why SQL Server implements thresholds and other mechanisms to avoid repeated compilations when they are not needed.
To see the impact of compiling query plans on memory in a busy system, we can run the first query in parallel on 100 threads. We simply add the RECOMPILE query option so that SQL Server cannot reuse the cached plan from the first run, and use ostress to execute the workload. The RECOMPILE option is just a way to simulate recompilations caused by statistics updates, schema changes or unsafe ad hoc queries.

-- Create the stored procedure with query option recompile
DROP PROCEDURE IF EXISTS interpreted;
GO

CREATE PROCEDURE interpreted
AS
  BEGIN
    SELECT *
    FROM
      Production.Product AS pr
    LEFT JOIN
      Production.ProductSubcategory AS su
    ON
      pr.ProductSubcategoryID = su.ProductSubcategoryID
    LEFT JOIN
      Sales.SalesOrderDetail AS de
    ON
      de.ProductID = pr.ProductID
    LEFT JOIN
      Production.ProductSubcategory AS su2
    ON
      pr.ProductSubcategoryID = su2.ProductSubcategoryID
    INNER JOIN
      Sales.SpecialOffer AS soff
    ON
      de.SpecialOfferID = soff.SpecialOfferID
    LEFT JOIN
      Sales.SalesOrderHeader AS he
    ON
      he.SalesOrderID = de.SalesOrderID
    LEFT JOIN
      Sales.Customer AS cu
    ON
      he.CustomerID = cu.CustomerID
    LEFT JOIN
      Sales.SalesPerson AS per
    ON
      per.BusinessEntityID = he.SalesPersonID
    LEFT OUTER JOIN
      Sales.SalesTerritory AS ter
    ON
      he.TerritoryID = ter.TerritoryID
    LEFT JOIN
      Person.Address AS ag
    ON
      he.ShipToAddressID = ag.AddressID
    WHERE
      pr.ProductID = 712
    OPTION (RECOMPILE);
  END;
GO

If we now execute the interpreted stored procedure once on each of 100 parallel threads, we can see a high impact on memory, which also shows up as a long wait time on the RESOURCE_SEMAPHORE_QUERY_COMPILE wait type.

-- Clear the wait stats
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
GO

-- Run the SQL statement 1 time on 100 threads 
-- "ostress.exe" -SMYPC\MYSQLSERVER2019A -dAdventureWorks2017 -Q"EXECUTE interpreted" -n100 -r1 -q
-- Get the wait statistics
-- Query compilation causes a wait on RESOURCE_SEMAPHORE_QUERY_COMPILE
SELECT *
FROM
  sys.dm_os_wait_stats
WHERE
  wait_type = N'RESOURCE_SEMAPHORE_QUERY_COMPILE';
GO

wait-type-resource-semaphore-query-compile
We can also see the high CPU usage and the compile-related wait time in Activity Monitor.
activity-monitor-resource-semaphore-query-compile
Keep in mind that every compilation of this query requires about 172 ms of CPU time and approximately 6 MB of memory to create and optimize the execution plan, which can cause serious performance problems. Incidentally, this is often the case when SQL Server has to repeatedly compile unsafe ad hoc queries or when memory pressure causes the clock hands to remove query plans from the cache.
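When the churn is caused mainly by single-use ad hoc statements, one common mitigation (not specific to this demo) is the server option optimize for ad hoc workloads, which caches only a small plan stub on the first execution and the full plan only when the statement is executed a second time. A minimal sketch:

-- Cache only a compiled plan stub for the first execution of an ad hoc batch;
-- the full plan is cached on the second execution.
EXECUTE sp_configure 'show advanced options', 1;
RECONFIGURE;
GO

EXECUTE sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;
GO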

The in-memory OLTP engine avoids these negative effects of repeated recompilations on the hardware resources. When you create a natively compiled stored procedure, SQL Server compiles the code immediately, even before you run it for the first time. The result of the compilation is a DLL (generated via C code) that is loaded into the memory space of SQL Server. Once the DLL is created, it does not change. In short, CPU and memory are needed only once to build the DLL, when the procedure is created or when the database comes online again after a restart of SQL Server.
Even if the statistics of the underlying in-memory table change over time, the DLL is not replaced; all natively compiled stored procedures are therefore optimized for unknown parameter values. Optimizing for unknown parameter values can hurt the performance of an interpreted stored procedure, but it matters less for natively compiled stored procedures.
Each in-memory table must have at least one index. Unlike conventional B-tree indexes, the in-memory hash and range indexes always give access to all columns of the row. So when the execution plan for natively compiled code is created, the choice of index can never lead to something like a bookmark lookup, because every index effectively covers all columns.
However, massive changes to the in-memory data can still have a negative impact on performance, especially with hash indexes. It is therefore important to monitor the in-memory index statistics to determine whether it is worth manually recompiling the natively compiled stored procedures from time to time.
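A starting point for that monitoring could look like the sketch below, which assumes the dbo.test table and the natively compiled procedure created in the next listing. It checks the bucket usage and chain lengths of the hash indexes; a natively compiled module can then be forced to recompile at its next execution with sp_recompile (supported for natively compiled procedures since SQL Server 2016).

-- Long average chains or very few empty buckets indicate an undersized
-- BUCKET_COUNT and a hash index that no longer fits the data.
SELECT
  OBJECT_NAME(his.object_id) AS table_name
  , i.name AS index_name
  , his.total_bucket_count
  , his.empty_bucket_count
  , his.avg_chain_length
  , his.max_chain_length
FROM
  sys.dm_db_xtp_hash_index_stats AS his
INNER JOIN
  sys.indexes AS i
ON
  his.object_id = i.object_id
  AND his.index_id = i.index_id;
GO

-- Force the natively compiled module to be recompiled at its next execution.
EXECUTE sp_recompile N'dbo.natively_compiled_stored_procedure';
GO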

The following code just creates an empty in-memory table and a natively compiled stored procedure.

USE MemoryOptimized;
GO

-- Create a memory optimized table
DROP TABLE IF EXISTS test;
GO

CREATE TABLE test
(
  col1 int NOT NULL
  , col2 int NOT NULL
  , INDEX NCL NONCLUSTERED HASH (col2) WITH (BUCKET_COUNT = 2000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
GO

-- Insert 10,000 records
INSERT test
  (
    col1
    , col2
  )
SELECT TOP 10000
  message_id, message_id % 2
FROM
  sys.messages
WHERE
  language_id = 1033
ORDER BY
  message_id;
GO

-- Create a natively compiled stored procedure
DROP PROCEDURE IF EXISTS natively_compiled_stored_procedure;
GO

CREATE PROCEDURE natively_compiled_stored_procedure
(@col1 int)
WITH SCHEMABINDING, NATIVE_COMPILATION
AS
  BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    SELECT col1 FROM dbo.test WHERE col1 = @col1;
  END;
GO

Even if the stored procedure has not yet been executed, we can see that the code has been compiled and the DLL is loaded into memory.

-- E:\Microsoft SQL Server\MSSQL15.MYSQLSERVER2019A\MSSQL\DATA\xtp\7\xtp_p_7_629577281_188665047046304_1.dll
-- Natively compiled procedures are compiled before execution and
-- optimized for unknown parameter values.
-- List of all tables and procedures currently loaded in memory.
-- All DLLs are stored in a single folder below the default data
-- folder of the instance.
WITH
  c AS
  (
    SELECT
      name
      , description
      , SUBSTRING(
             name, PATINDEX('%xtp_[pt]%', name) + LEN('xtp_x_')
             , LEN(name)
           ) AS _name
    FROM
      sys.dm_os_loaded_modules
    WHERE
      description = 'XTP Native DLL'
  )
SELECT
  name
  , description
  , DB_NAME(LEFT((_name), CHARINDEX('_', (_name)) - 1)) AS database_name
  , CASE SUBSTRING(name, PATINDEX('%xtp_[pt]%', name) + LEN('xtp_'), 1)
    WHEN 't'
       THEN 'Table'
    ELSE 'Procedure'
  END AS object_type
  , OBJECT_NAME(
           LEFT(SUBSTRING(
                   _name, (CHARINDEX('_', (_name))) + 1
                   , LEN(name)
                 ), CHARINDEX(
                         '_'
                         , (SUBSTRING(
                                 _name
                               , (CHARINDEX(
                                       '_'
                                       , (_name)
                                     )
                                 ) + 1
                               , LEN(name)
                               )
                         )
                       ) - 1)
         , LEFT((_name), CHARINDEX('_', (_name)) - 1)
         ) AS object_name
FROM
  c;
GO

xtp-and-dll-loaded-into-memory
The procedure and table DLLs are kept on disk for troubleshooting purposes only, but we can clearly see when the DLL of the procedure was created.
 xtp-and-dll-in-folder
If we now insert a number of records into the in-memory table that would trigger a statistics update on an on-disk table, we can see that the DLL is not rebuilt and therefore no CPU or memory is spent on recompilation.

-- Insert 100,000 more records
-- This time the same, non-selective value (101)
INSERT test
  (
    col1
    , col2
  )
SELECT TOP 100000
  101, 101
FROM
  sys.messages
WHERE
  language_id = 1033
ORDER BY
  message_id;
GO
EXECUTE natively_compiled_stored_procedure @col1 = 101;
GO

 xtp-and-dll-in-folder
Of course there are also ways to reduce the number of recompilations for interpreted code, but as you can see, the in-memory OLTP engine removes some of the weaknesses of the on-disk engine.
