In the last post you saw that the missing index feature does not consider the order for columns to be used in an index.
In this blog I will explain another limitation of the missing index feature – it reports only include columns for some queries.
SQL Server supports two kind of row store indexes – the clustered index and then non-clustered index. It is important to understand the difference between these two indexes in order to implement the right index type to improve reading performance.
A row store index creates a B-tree based on the index key columns, where the depth-level of the B-tree depends on the number of data respectively index pages needed to store all of the data. If SQL Server looks for a specific value in an index key column it will read the root page and from there it is directed to the appropriate page of next level in the index tree. Finally, SQL Server always reads the root page, the intermediate pages and the leaf page(s). If the index is not unique, the storage engine reads backwards or forwards the appropriate leaf pages via the doubly linked chain to find all records.
This approach is the same for both types of index, but there is a big difference in what is stored on the leaf page.
In a clustered index the leaf page consists of the actual data including even all non index key columns. So SQL Server has all the data as soon as the leaf page of the B-tree is read.
In a non-clustered index, the content of the leaf pages looks different. The leaf pages have no more data than the index key columns itself. Now imagine you are looking for a set of columns in the table and not all columns are index key columns. The storage engine must retrieve the missing columns from elsewhere. If your table is a heap, that is, a table without a clustered index, the engine looks for the extra columns in the heap using a so called RID which is the address of the missing records in form of the filed id, page id and slot id. If your table is a clustered table the query engine will get the data from the clustered index using the index key.
In both cases, SQL Server implements a bookmark lookup iterator in the query execution plan to find missing data which requires additional page reads (lookups) to retrieve all the data and therefore decreases the read performance. To avoid the additional page reads an index should cover all requested columns in the B-Tree.
Now we know that the clustered index is always covering and the non clustered index covers only all columns if they are part of the index key columns.
This leads to two problems. First, there can be only one clustered index per table and therefore the clustered index covers only queries which includes the key columns in the WHERE predicate. Second, to ensure that the non clustered index is covering, we must define all columns of the WHERE predicate as index key columns but this will increase the size of the index pages and therefore the number of levels in the B-tree.
Fortunately, in SQL Server, you can insert the required columns in the leaf pages of the non clustered index, while the index key columns are only part of the root and intermediate pages. Therefore, the index root page and the pages of the intermediate levels remain small but the index contains all defined columns in the index leaf pages. This is very similar to the concept of a second clustered index.
Now let’s execute a simple query in SQL Server.
USE AdventureWorks2017; GO DROP TABLE IF EXISTS dbo.SalesOrderDetail; GO -- Create a table based on Sales.SalesOrderDetail SELECT * INTO dbo.SalesOrderDetail FROM Sales.SalesOrderDetail; GO -- Index recommendation from missing index dmv -- Costs : 1,25377 -- Table 'SalesOrderDetail'. Scan count 1, logical reads 1509 -- The Query Processor estimates that implementing the following index could -- improve the query cost by 99.7159%. SELECT SalesOrderID , SalesOrderDetailID , CarrierTrackingNumber , OrderQty , ProductID , SpecialOfferID FROM dbo.SalesOrderDetail WHERE SalesOrderID = 43670 AND SalesOrderDetailID > 112 OPTION (QUERYTRACEON 8757); GO
SQL Server must scan the entire table because there is no index on the table which will lead to 1509 logical page reads and costs of 1,25377. That is, why SQL Server will recommend the following index:
-- Index recommendation from missing index dmv -- Create a non clustered index CREATE NONCLUSTERED INDEX [test] ON dbo.SalesOrderDetail ( SalesOrderID , SalesOrderDetailID ); GO
After implementing the non clustered index, the query engine uses the index to find all records whose SalesOrderID is equal to 43670 and SalesOrderDetailID is greater than 112.
The engine just needs 5 logical reads and the costs are reduced to 0,0290654. Not bad so far, but let us insert a few more records into the table.
SET IDENTITY_INSERT dbo.SalesOrderDetail ON; GO INSERT dbo.SalesOrderDetail ( SalesOrderDetailID , SalesOrderID , CarrierTrackingNumber , OrderQty , ProductID , SpecialOfferID , UnitPrice , UnitPriceDiscount , LineTotal , rowguid , ModifiedDate ) SELECT TOP 100 113 AS SalesOrderDetailID , 43670 AS SalesOrderID , CarrierTrackingNumber , OrderQty , ProductID , SpecialOfferID , UnitPrice , UnitPriceDiscount , LineTotal , rowguid , ModifiedDate FROM Sales.SalesOrderDetail; GO SET IDENTITY_INSERT dbo.SalesOrderDetail OFF GO
After inserting 100 additional records, they satisfy the WHERE predicate and rerun the query, the execution plan remains the same the same but the number of logical IO reads increases to 106.
SQL Server has to find the missing columns in the heap which causes one page read per record which is identified in the non clustered index seek – a bookmark lookup will not scale. You can clearly see it in the execution plan:
So the more records found by the index seek operator the more RID lookups are necessary to retrieve the additional columns (I do not want to talk about the tipping point in this post).
There are two solutions we can implement to get rid of the non scaling bookmark operator. We can introduce a clustered index or create a non clustered index, which includes the missing columns in the leaf pages of the B-tree. Let us create the covering non clustered index.
CREATE NONCLUSTERED INDEX NCL_SalesOrderDetail_SalesOrderID_SalesOrderDetailID_INCL ON SalesOrderDetail ( SalesOrderID , SalesOrderDetailID ) INCLUDE ( CarrierTrackingNumber , OrderQty , ProductID , SpecialOfferID ); GO
If we run the same query as before we can see the following improvement:
-- Costs : 0,0032986 -- Table 'SalesOrderDetail'. Scan count 1, logical reads 4 SELECT SalesOrderID , SalesOrderDetailID , CarrierTrackingNumber , OrderQty , ProductID , SpecialOfferID FROM dbo.SalesOrderDetail WHERE SalesOrderID = 43670 AND SalesOrderDetailID > 112; GO
The covering non clustered index eliminates the non scaling bookmark lookup and therefore the number of logical pages reads (106 to 4).
This is another example of the limitations of the missing index feature.
- The limitations of the missing index feature – It suggests creating indexes on imprecise and not persisted columns
- The limitations of the missing index feature – It does not specify an order for columns to be used in an index
- The limitations of the missing index feature – It does not consider trivial plans
- The limitations of the missing index feature – It cannot gather statistics for more than 500 missing index groups