Microsoft introduced in-memory OLTP in SQL Server 2014, advertising that queries can be up to 100 times faster. In this blog series I will walk through the design principles of the in-memory OLTP engine and show why it has the potential to be significantly faster than the traditional on-disk engine.

When a request reaches the query engine, the traditional storage engine must first read the required data pages from disk if they are not yet in the buffer pool. Reading from disk is still one of the most time-consuming operations and therefore largely determines query execution time. Reading data from memory is much faster than reading from disk, but memory used to be expensive, which is why SQL Server loads data into the buffer pool on demand only. Memory is much cheaper nowadays, so the core idea of the on-disk engine, carefully rationing memory, no longer holds. In-memory OLTP does not store any data pages on disk; it reads and writes records directly in memory. It can therefore avoid the complexity of managing data pages in the buffer pool and skip physical reads entirely, which can result in much faster query execution.
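
To see how much memory the buffer pool currently holds on an instance, you can query sys.dm_os_memory_clerks; this is just a quick, optional check and not part of the comparison below.


-- Optional: current size of the buffer pool (SQL Server 2012 and later)
SELECT
	SUM(pages_kb) / 1024 AS buffer_pool_mb
FROM
	sys.dm_os_memory_clerks
WHERE
	[type] = 'MEMORYCLERK_SQLBUFFERPOOL';
GO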

The following examples compare the on-disk and the in-memory OLTP engine with regard to IO reads and memory allocation.

For the following analysis, we need a table with a clustered index and 5,000,000 records:


USE AdventureWorks2017;
GO

DROP TABLE IF EXISTS Sales.SalesOrderDetailMedium;
GO

-- Create a table with 5,000,000 records
SELECT TOP 5000000
	*
INTO Sales.SalesOrderDetailMedium
FROM
	Sales.SalesOrderDetailBig;
GO

-- Add a clustered key
-- Later we will see that an in-memory table always needs an index.
-- Since we are comparing the size of in-memory to an on-disk table, it is
-- appropriate to create a similar index.
CREATE CLUSTERED INDEX CL_SalesOrderDetailMedium_SalesOrderID_SalesOrderDetailID
ON Sales.SalesOrderDetailMedium (
									SalesOrderID
								  , SalesOrderDetailID
								);
GO

To keep the example deterministic, we flush the buffer cache and enable the IO and time statistics to measure the query execution.


-- Flush the buffer cache 
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
GO

SET STATISTICS IO, TIME ON;
GO

Now let’s count the number of records in the table.


-- Costs :25.8056
-- Index Scan (Estimated I/O Costs) : 23.652
-- Scan count 9, logical reads 32437, read-ahead reads 32082
-- CPU time = 2016 ms,  elapsed time = 504 ms.
SELECT COUNT(*)
FROM
	Sales.SalesOrderDetailMedium;
GO

[Figure: record_count_on_disk_table]

The query is executed in parallel because the estimated cost is higher than the Cost Threshold for Parallelism on my SQL Server instance. 97% of the total query cost is caused by the clustered index scan, because the storage engine has to read 32,437 data pages from disk. The elapsed time is 504 ms, but SQL Server spent more than two seconds on the CPU because of the parallel plan.
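
If you want to compare the estimated cost against the threshold on your own instance, the setting is exposed in sys.configurations; the snippet below is only a convenience check and the value will of course differ from instance to instance.


-- Check the Cost Threshold for Parallelism of the current instance
SELECT
	name
  , value_in_use
FROM
	sys.configurations
WHERE
	name = 'cost threshold for parallelism';
GO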

It is important to understand that all pages SQL Server reads are kept in the buffer pool and therefore allocate memory.


-- Identify the objects in the buffer cache 
-- SalesOrderDetailMedium : 250MB
SELECT
	DB_NAME(DB_ID()) AS db_name
  , OBJECT_SCHEMA_NAME(obj.object_id, DB_ID()) AS schema_name
  , OBJECT_NAME(obj.object_id) AS object_name
  , page_type
  , COUNT(*) AS cached_pages
  , (COUNT(*) * 8.0) / 1024 AS cached_pages_in_mb
  , SUM(row_count) AS row_count
  , SUM(free_space_in_bytes) / 1024.0 / 1024 AS free_space_in_mb
  , SUM(read_microsec) AS read_microsec
FROM
	sys.dm_os_buffer_descriptors AS bd
INNER JOIN
	(
		SELECT
			object_id, index_id, allocation_unit_id
		FROM
			sys.allocation_units AS au
		INNER JOIN
			sys.partitions AS p
		ON
			au.container_id = p.hobt_id
			AND
				(
					au.type = 1
					OR au.type = 3
				)
		UNION ALL
		SELECT
			object_id, index_id, allocation_unit_id
		FROM
			sys.allocation_units AS au
		INNER JOIN
			sys.partitions AS p
		ON
			au.container_id = p.partition_id
			AND au.type = 2
	) AS obj
ON
	bd.allocation_unit_id = obj.allocation_unit_id
WHERE
	bd.database_id = DB_ID()
	AND OBJECT_NAME(obj.object_id) = 'SalesOrderDetailMedium'
GROUP BY
	OBJECT_SCHEMA_NAME(obj.object_id, DB_ID())
  , OBJECT_NAME(obj.object_id)
  , page_type
ORDER BY (COUNT(*) * 8.0) / 1024 DESC;
GO

[Figure: buffer_allocation_record_count_on_disk_table]

If we execute the same query again, the query plan and the costs stay the same, but this time the storage engine does not have to read the pages from disk; it reads them from the buffer pool instead. The estimated IO cost of 23.652 for the clustered index scan is based on the assumption that the data pages must be read from disk on every execution of the query. This assumption does not hold as long as the data pages are still in the buffer pool. The reason is that the query optimizer does not distinguish between physical and logical reads; it simply does not know whether the pages the query needs are in the buffer pool or on disk. As long as the lazy writer has not flushed the pages from the buffer pool, all subsequent executions get away with logical reads, which are much cheaper than physical reads, but as you can see, the estimated IO costs are the same for every execution of this query.
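
You can verify this yourself by executing the count a second time without flushing the buffer cache; the expectation noted in the comment is what a warm cache should show, the exact numbers depend on your system.


-- Second execution against a warm buffer cache
-- Expectation: the same 32,437 logical reads, but no physical or read-ahead reads
SELECT COUNT(*)
FROM
	Sales.SalesOrderDetailMedium;
GO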

Let us execute the same query on an in-memory table to compare the execution plan, buffer allocation and statistics.
First, we create a database which supports in-memory objects.


USE master;
GO

IF DB_ID('MemoryOptimized') IS NOT NULL
	BEGIN
		ALTER DATABASE MemoryOptimized SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
		DROP DATABASE MemoryOptimized;
	END;
GO

-- Note: adjust the paths and execute this script in SQLCMD mode (required for :SETVAR)
:SETVAR DataPath "E:\Microsoft SQL Server\MSSQL14.MYSQLSERVER2017A\MSSQL\DATA\"
:SETVAR LogPath "F:\Microsoft SQL Server\MSSQL14.MYSQLSERVER2017A\MSSQL\LOG\"
CREATE DATABASE MemoryOptimized
ON PRIMARY
	   -- disk based objects
	   (
		   NAME = MemoryOptimized_data
		 , FILENAME = '$(DataPath)MemoryOptimized_data.mdf'
		 , SIZE = 5GB
	   )
-- in-memory objects
-- one filegroup with MEMORY_OPTIMIZED_DATA is needed
, FILEGROUP MemoryOptimized CONTAINS MEMORY_OPTIMIZED_DATA
	  (
		  NAME = MemoryOptimized_file
		, FILENAME = '$(DataPath)MemoryOptimized_file'
		, MAXSIZE = 5GB
	  )
-- disk based objects
LOG ON
	(
		NAME = MemoryOptimized_log
	  , FILENAME = '$(LogPath)MemoryOptimized_log.ldf'
	  , SIZE = 5GB
	);
GO

After the database is available, we create the in-memory table with a hash index of 5,000,000 buckets. We have to take a detour through tempdb to transfer the data from the AdventureWorks database, because queries that access memory-optimized tables cannot reference another user database.


USE MemoryOptimized;
GO

CREATE SCHEMA Sales AUTHORIZATION dbo;
GO

-- Create an in-memory table with a hash index of 5,000,000 buckets
DROP TABLE IF EXISTS Sales.SalesOrderDetailMedium;
GO

CREATE TABLE Sales.SalesOrderDetailMedium
(
	SalesOrderID int NOT NULL
  , SalesOrderDetailID int NOT NULL
  , CarrierTrackingNumber nvarchar(25) NULL
  , OrderQty smallint NOT NULL
  , ProductID int NOT NULL
  , UnitPrice money NOT NULL
  , INDEX IX HASH (SalesOrderID, SalesOrderDetailID) WITH (BUCKET_COUNT = 5000000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
GO

DROP TABLE IF EXISTS tempdb.dbo.#SalesOrderDetailMedium;
GO

SELECT *
INTO #SalesOrderDetailMedium
FROM
	AdventureWorks2017.Sales.SalesOrderDetailMedium;
GO

INSERT Sales.SalesOrderDetailMedium
	(
		SalesOrderID
	  , SalesOrderDetailID
	  , CarrierTrackingNumber
	  , OrderQty
	  , ProductID
	  , UnitPrice
	)
SELECT
	SalesOrderID
  , SalesOrderDetailID
  , CarrierTrackingNumber
  , OrderQty
  , ProductID
  , UnitPrice
FROM
	#SalesOrderDetailMedium;
GO
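
A side note on the table definition above: DURABILITY = SCHEMA_ONLY means that only the schema survives a restart, the rows do not, which is fine for this test but not for real data. If you need durable data, the table additionally requires a primary key; the following sketch (not used in this post, the table name is made up) shows the durable variant.


-- Sketch of a durable memory-optimized table (SCHEMA_AND_DATA requires a primary key)
CREATE TABLE Sales.SalesOrderDetailMediumDurable
(
	SalesOrderID int NOT NULL
  , SalesOrderDetailID int NOT NULL
  , CONSTRAINT PK_SalesOrderDetailMediumDurable
		PRIMARY KEY NONCLUSTERED HASH (SalesOrderID, SalesOrderDetailID)
		WITH (BUCKET_COUNT = 5000000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO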

Now it is time to run the same query we ran on the on-disk table. In this example, we do not use natively compiled code, but interpreted T-SQL that accesses the in-memory data via the query interop.


USE MemoryOptimized;
GO

-- Interpreted code
-- Costs : 6.40351 (25.8056)
-- Index Scan (Estimated I/O Costs) : 0 (23.652)
-- CPU time = 2000 (2016) ms,  elapsed time = 379 (504) ms.
SELECT COUNT(*)
FROM
	Sales.SalesOrderDetailMedium;
GO

[Figure: record_count_in_memory_table]

The query engine estimates the total query cost at 6.40351, a significant improvement over the query against the on-disk table. The interop query is cheaper because SQL Server does not have to read data pages from disk or from the buffer pool; it reads the records directly from memory. For the same reason, IO statistics are not available. The interop query is slightly faster than the on-disk query, but both queries spend almost the same amount of time on the CPU.

Let us analyze the buffer pool again.


-- No pages are saved in buffer pool 
SELECT
	bd.is_modified
  , DB_NAME(DB_ID()) AS db_name
  , OBJECT_SCHEMA_NAME(object_id, DB_ID()) AS schema_name
  , OBJECT_NAME(object_id) AS object_name
  , file_id
  , page_id
  , page_level
  , page_type
  , row_count
  , (free_space_in_bytes)
  , (read_microsec) AS read_microsec
FROM
	sys.dm_os_buffer_descriptors AS bd
INNER JOIN
	(
		SELECT
			object_id, index_id, allocation_unit_id
		FROM
			sys.allocation_units AS au
		INNER JOIN
			sys.partitions AS p
		ON
			au.container_id = p.hobt_id
			AND
				(
					au.type = 1
					OR au.type = 3
				)
		UNION ALL
		SELECT
			object_id, index_id, allocation_unit_id
		FROM
			sys.allocation_units AS au
		INNER JOIN
			sys.partitions AS p
		ON
			au.container_id = p.partition_id
			AND au.type = 2
	) AS obj
ON
	bd.allocation_unit_id = obj.allocation_unit_id
WHERE
	database_id = DB_ID()
	AND obj.object_id = OBJECT_ID('Sales.SalesOrderDetailMedium', 'U');
GO

[Figure: buffer_allocation_record_count_in_memory_table]

There are no data pages in the buffer pool, since in-memory objects do not use data pages; the records are linked through index chains and live directly in memory. To find the memory allocation of the in-memory data, we need to use another dynamic management view.


-- Allocated_bytes:
-- 8,388,608 buckets x 8 bytes : 64MB
-- Hash index : 67,108,864 bytes : 64MB
-- Table heap : 400,687,104 bytes : 382.125MB
-- row-store : 250MB | in-memory : 446MB
SELECT
	obj.name AS object_name
  , ind.index_id
  , ind.type_desc
  , mec.memory_consumer_desc
  , mec.allocated_bytes
  , mec.used_bytes
  , mec.allocation_count
FROM
	sys.dm_db_xtp_memory_consumers AS mec
INNER JOIN
	sys.objects AS obj
ON
	mec.object_id = obj.object_id
LEFT JOIN
	sys.indexes AS ind
ON
	mec.index_id = ind.index_id
	AND ind.object_id = mec.object_id
WHERE
	obj.object_id = OBJECT_ID('Sales.SalesOrderDetailMedium', 'U');
GO

[Figure: memory_allocation_in_memory_table]

As we can see, the in-memory data requires 78% more memory than the on-disk data.
One reason is that a hash index always allocates 8 bytes per bucket and the requested bucket count is rounded up to the next power of two. The hash index declared with 5,000,000 buckets therefore actually gets 8,388,608 buckets and needs 67,108,864 bytes or 64 MB, exactly the size reported by the dynamic management view sys.dm_db_xtp_memory_consumers.


-- A hash index with a definition of 5,000,000 buckets actually creates 
-- 8,388,608 buckets 
SELECT POWER(2, CEILING(LOG(5000000) / LOG(2))) AS bucket_count;
GO

-- Every bucket needs 8 bytes
-- 67,108,864 bytes
SELECT POWER(2, CEILING(LOG(5000000) / LOG(2))) * 8 AS bucket_size;
GO
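
To put the table heap number into perspective as well, we can divide the measured sizes by the number of rows. This only reuses the figures from above and is a rough comparison, since the on-disk number includes page overhead and the in-memory number includes the row header and index pointer of every record.


-- Rough per-row cost derived from the measured sizes
-- In-memory table heap : 400,687,104 bytes / 5,000,000 rows ~ 80 bytes per row
-- On-disk clustered index : ~250 MB / 5,000,000 rows ~ 52 bytes per row
SELECT
	400687104 / 5000000.0 AS in_memory_bytes_per_row
  , (250 * 1024 * 1024) / 5000000.0 AS on_disk_bytes_per_row;
GO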

If we run the same query with natively compiled code, we will see a big difference in the execution time of the query.


-- Create a natively compiled stored procedure
DROP PROCEDURE IF EXISTS test;
GO

CREATE PROCEDURE test
WITH SCHEMABINDING, NATIVE_COMPILATION
AS
	BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
		SELECT COUNT(*)
		FROM
			Sales.SalesOrderDetailMedium;
	END;
GO

-- SQL Server Execution Times:
-- CPU time = 297 (2000) (2016) ms,  elapsed time = 296 (379) (504) ms.
EXECUTE test;
GO

Unfortunately, the query engine does not display an actual execution plan for natively compiled code, so we can only compare the time statistics.
The elapsed time of 296 ms is a reduction of roughly 20% compared to the interop query and roughly 40% compared to the on-disk query, and the CPU time has dropped from about 2,000 ms to 297 ms.
As you can see, the natively compiled code in this example is much more efficient at reading and counting the in-memory data.
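
As an aside that goes beyond the comparison above, SQL Server can collect runtime statistics for natively compiled modules on demand; the collection is off by default because it adds overhead, so the following is only an optional sketch.


-- Optional: enable instance-wide statistics collection for natively compiled procedures
EXEC sys.sp_xtp_control_proc_exec_stats @new_collection_value = 1;
GO

EXECUTE test;
GO

-- Worker and elapsed time per natively compiled procedure
SELECT
	OBJECT_NAME(ps.object_id, ps.database_id) AS procedure_name
  , ps.execution_count
  , ps.total_worker_time
  , ps.total_elapsed_time
FROM
	sys.dm_exec_procedure_stats AS ps
WHERE
	ps.database_id = DB_ID();
GO

-- Turn the collection off again
EXEC sys.sp_xtp_control_proc_exec_stats @new_collection_value = 0;
GO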

Let’s clean up a little.


-- Housekeeping
USE AdventureWorks2017;
DROP TABLE IF EXISTS Sales.SalesOrderDetailMedium;
USE master;
ALTER DATABASE MemoryOptimized SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE MemoryOptimized;
GO