Microsoft introduced in-memory OLTP in SQL Server 2014, advertising that queries can be up to 100 times faster. In this blog series I will explain the design principles of the in-memory OLTP engine to explain why it has the potential to be significantly faster than the traditional on-disk engine.
When a request is sent to the query engine, the traditional storage engine must first read the data pages from disk if they are not yet available in the buffer pool. Reading from the hard disk is still one of the most time-consuming operations and therefore largely determines the execution of the queries. Reading data from memory is much faster than reading from disk but memory was expensive in the past, therefore SQL Server loads data in the buffer pool on demand only. Memory is much cheaper nowadays, so that the core idea of the on-disk engine to regulate the memory allocation is no longer valid. In-memory OLTP does not save any data pages on disk but only reads and saves records in memory. Therefore, it is possible to avoid the complexity of handling the data pages in the buffer pool and skip physical reads which can result in much faster query execution.
The following examples compare the on-disk and the in-memory OLTP engine with regard to IO reads and memory allocation.
For the following analysis, we nee to create a clustered table with 5,000,000 records:
USE AdventureWorks2017; GO DROP TABLE IF EXISTS Sales.SalesOrderDetailMedium; GO -- Create a table with 5,000,000 records SELECT TOP 5000000 * INTO Sales.SalesOrderDetailMedium FROM Sales.SalesOrderDetailBig; GO -- Add a clustered key -- Later we will see that an in-memory table always needs an index. -- Since we are comparing the size of in-memory to an on-disk table, it is -- appropriate to create a similar index. CREATE CLUSTERED INDEX CL_SalesOrderDetailMedium_SalesOrderID_SalesOrderDetailID ON Sales.SalesOrderDetailMedium ( SalesOrderID , SalesOrderDetailID ); GO
To ensure that the example is deterministic, we flush the buffer cache and enable statistics IO and time to measure query execution.
-- Flush the buffer cache CHECKPOINT; DBCC DROPCLEANBUFFERS; GO
SET STATISTICS IO, TIME ON; GO
Now let’s count the number of records in the table.
-- Costs :25.8056 -- Index Scan (Estimated I/O Costs) : 23.652 -- Scan count 9, logical reads 32437, read-ahead reads 32082 -- CPU time = 2016 ms, elapsed time = 504 ms. SELECT COUNT(*) FROM Sales.SalesOrderDetailMedium; GO
The query is executed in parallel as the costs are higher than the Cost Threshold for parallelism on my SQL Server instance. 97% of the total query cost is caused by the clustered index scan because the storage engine reads 32,437 data pages from disk. The query execution time is 504 ms, but SQL Server has been working on the CPU for more than two seconds because of the parallel plan.
It is important to understand that all pages SQL Server read are saved in the buffer pool and allocates memory.
-- Identify the objects in the buffer cache -- SalesOrderDetailMedium : 250MB SELECT DB_NAME(DB_ID()) AS db_name , OBJECT_SCHEMA_NAME(obj.object_id, DB_ID()) AS schema_name , OBJECT_NAME(obj.object_id) AS object_name , page_type , COUNT(*) AS cached_pages , (COUNT(*) * 8.0) / 1024 AS cached_pages_in_mb , SUM(row_count) AS row_count , SUM((free_space_in_bytes / 1024)) AS free_space_in_mb , SUM(read_microsec) AS read_microsec FROM sys.dm_os_buffer_descriptors AS bd INNER JOIN ( SELECT object_id, index_id, allocation_unit_id FROM sys.allocation_units AS au INNER JOIN sys.partitions AS p ON au.container_id = p.hobt_id AND ( au.type = 1 OR au.type = 3 ) UNION ALL SELECT object_id, index_id, allocation_unit_id FROM sys.allocation_units AS au INNER JOIN sys.partitions AS p ON au.container_id = p.partition_id AND au.type = 2 ) AS obj ON bd.allocation_unit_id = obj.allocation_unit_id WHERE bd.database_id = DB_ID() AND OBJECT_NAME(obj.object_id) = 'SalesOrderDetailMedium' GROUP BY OBJECT_SCHEMA_NAME(obj.object_id, DB_ID()) , OBJECT_NAME(obj.object_id) , page_type ORDER BY (COUNT(*) * 8.0) / 1024 DESC; GO
If we execute the same query again, we will see that the query plan and the costs are the same, but this time the storage engine does not have to read the pages from disk, but from the buffer pool instead. The estimated IO costs of 23.652 for the clustered index scan is based on the assumption that the data pages must be read from disk with every new execution of this query. This does not apply as long as the data pages are still available in the buffer pool. The query engine does not distinguish between physical and logical reads because it simply does not know whether the pages that the query needs to read are in the buffer pool or on disk. As long as the lazy writer has not flushed the pages from the buffer all subsequent queries can avoid physical reads which is much cheaper than physical reads – but as you see, the estimated IO costs are the same for each execution of this query.
Let us execute the same query on an in-memory table to compare the execution plan, buffer allocation and statistics.
First, we create a database which supports in-memory objects.
USE master; GO IF DB_ID('MemoryOptimized') IS NOT NULL BEGIN ALTER DATABASE MemoryOptimized SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE MemoryOptimized; END; GO :SETVAR DataPath "E:\Microsoft SQL Server\MSSQL14.MYSQLSERVER2017A\MSSQL\DATA\" :SETVAR LogPath "F:\Microsoft SQL Server\MSSQL14.MYSQLSERVER2017A\MSSQL\LOG\" CREATE DATABASE MemoryOptimized ON PRIMARY -- disk based objects ( NAME = MemoryOptimized_data , FILENAME = '$(DataPath)MemoryOptimized_data.mdf' , SIZE = 5GB ) -- in-memory objects -- one filegroup with MEMORY_OPTIMIZED_DATA is needed , FILEGROUP MemoryOptimized CONTAINS MEMORY_OPTIMIZED_DATA ( NAME = MemoryOptimized_file , FILENAME = '$(DataPath)MemoryOptimized_file' , MAXSIZE = 5GB ) -- disk based objects LOG ON ( NAME = MemoryOptimized_log , FILENAME = '$(LogPath)MemoryOptimized_log.ldf' , SIZE = 5GB ); GO
After the database is available, we create the in-memory table with a hash index of 5,000,000 buckets. We have to use the tempdb to transfer the data from the AdventureWorks database because in-memory OLTP does not allow cross-user database queries.
USE MemoryOptimized; GO CREATE SCHEMA Sales AUTHORIZATION dbo; GO -- Create an in-memory table with a hash index of 5,000,000 buckets DROP TABLE IF EXISTS Sales.SalesOrderDetailMedium; GO CREATE TABLE Sales.SalesOrderDetailMedium ( SalesOrderID int NOT NULL , SalesOrderDetailID int NOT NULL , CarrierTrackingNumber nvarchar(25) NULL , OrderQty smallint NOT NULL , ProductID int NOT NULL , UnitPrice money NOT NULL , INDEX IX HASH (SalesOrderID, SalesOrderDetailID) WITH (BUCKET_COUNT = 5000000) ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY); GO DROP TABLE IF EXISTS tempdb.dbo.#SalesOrderDetailMedium; GO SELECT * INTO #SalesOrderDetailMedium FROM AdventureWorks2017.Sales.SalesOrderDetailMedium; GO INSERT Sales.SalesOrderDetailMedium ( SalesOrderID , SalesOrderDetailID , CarrierTrackingNumber , OrderQty , ProductID , UnitPrice ) SELECT SalesOrderID , SalesOrderDetailID , CarrierTrackingNumber , OrderQty , ProductID , UnitPrice FROM #SalesOrderDetailMedium; GO
Now it is time to run the same query we run on the on-disk table. In this example, we do not use natively compiled code, but interpreted code that accesses the in-memory data via the query interop.
USE MemoryOptimized; GO -- Interpreted code -- Costs : 6.40351 (25.8056) -- Index Scan (Estimated I/O Costs) : 0 (23.652) -- CPU time = 2000 (2016) ms, elapsed time = 379 (504) ms. SELECT COUNT(*) FROM Sales.SalesOrderDetailMedium; GO
The query engine estimates the total query costs at 6.40351, which is a significant improvement over the query using the on-disk table. The interop query is cheaper because the SQL Server does not have to read data pages from disk or the buffer pool but instead directly from memory. Because the query reads records from memory, IO statistics are not available. The interop query is slightly faster compared to the on-disk query, but both queries spend almost the same amount of time on the CPU.
Let us analyze the buffer pool again.
-- No pages are saved in buffer pool SELECT bd.is_modified , DB_NAME(DB_ID()) AS db_name , OBJECT_SCHEMA_NAME(object_id, DB_ID()) AS schema_name , OBJECT_NAME(object_id) AS object_name , file_id , page_id , page_level , page_type , row_count , (free_space_in_bytes) , (read_microsec) AS read_microsec FROM sys.dm_os_buffer_descriptors AS bd INNER JOIN ( SELECT object_id, index_id, allocation_unit_id FROM sys.allocation_units AS au INNER JOIN sys.partitions AS p ON au.container_id = p.hobt_id AND ( au.type = 1 OR au.type = 3 ) UNION ALL SELECT object_id, index_id, allocation_unit_id FROM sys.allocation_units AS au INNER JOIN sys.partitions AS p ON au.container_id = p.partition_id AND au.type = 2 ) AS obj ON bd.allocation_unit_id = obj.allocation_unit_id WHERE database_id = DB_ID() AND obj.object_id = OBJECT_ID('Sales.SalesOrderDetailMedium', 'U'); GO
There are no data pages in the buffer, since in-memory objects do not use data pages, but use records in an index chain to store the data directly in memory. To find the memory allocation of the in-memory data, we need to use another dynamic management view.
-- Allocated_bytes: -- 8388608 bucktes x 8 bytes : 64MB -- Hash index : 67108864 : 64MB -- Table heap : 400687104 : 382.125MB -- row-store : 250MB | in-memory : 446MB SELECT obj.name AS object_name , ind.index_id , ind.type_desc , mec.memory_consumer_desc , mec.allocated_bytes , mec.used_bytes , mec.allocation_count FROM sys.dm_db_xtp_memory_consumers AS mec; INNER JOIN sys.objects AS obj ON mec.object_id = obj.object_id LEFT JOIN sys.indexes AS ind ON mec.index_id = ind.index_id AND ind.object_id = mec.object_id WHERE obj.object_id = OBJECT_ID('Sales.SalesOrderDetailMedium', 'U'); GO
As we can see, the in-memory data requires 78% more memory than the on-disk data.
One reason is that the hash index always needs 8 bytes per bucket. Therefore, the hash index with 5,000,000 buckets needs 67,108,864 bytes or 64 MB – we can see the same size in the result of the dynamic management view sys.dm_db_xtp_memory_consumers sys.dm_db_xtp_memory_consumers.
-- A hash index with a definiton of 5,000,000 buckets actually creates -- 8,388,608 buckets SELECT POWER(2, CEILING(LOG(5000000) / LOG(2))) AS bucket_count; GO -- Every bucket needs 8 bytes -- 67,108,864 bytes SELECT POWER(2, CEILING(LOG(5000000) / LOG(2))) * 8 AS bucket_size; GO
If we run the same query with natively compiled code, we will see a big difference in the execution time of the query.
-- Create a natively compiled stored procedure DROP PROCEDURE IF EXISTS test; GO CREATE PROCEDURE test WITH SCHEMABINDING, NATIVE_COMPILATION AS BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english') SELECT COUNT(*) FROM Sales.SalesOrderDetailMedium; END; GO -- SQL Server Execution Times: -- CPU time = 297 (2000) (2016) ms, elapsed time = 296 (379) (504) ms. EXECUTE test; GO
Unfortunately, the query engine does not display the execution plan of natively compiled code. Therefore we can only compare the time statistics.
The execution time is a little faster than before but still a reduction of 20% respectively 40% and the CPU time has dropped to 297 ms.
As you can see, the natively compiled code in this example is much more efficient in reading and counting the in-memory data.
Let’s clean up a little.
-- Housekeeping USE AdventureWorks2017; DROP TABLE IF EXISTS Sales.SalesOrderDetailMedium; USE master; ALTER DATABASE MemoryOptimized SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE MemoryOptimized; GO
I am a Microsoft Most Valuable Professional for Data Platform from Germany specialized in Microsoft SQL Server performance tuning, troubleshooting and T-SQL programming. You can meet me at international and national user groups and conferences.
- The limitations of the missing index feature – It does not suggest filtered indexes
- The limitations of the missing index feature – It suggests creating indexes on imprecise and not persisted columns
- The limitations of the missing index feature – It reports only include columns for some queries
- The limitations of the missing index feature – It does not specify an order for columns to be used in an index
- The limitations of the missing index feature – It does not consider trivial plans
inside-sqlserver is a friend of Redgate
Share the passion