In the last post you saw that the missing index feature does not consider the order for columns to be used in an index.
In this blog I will explain another limitation of the missing index feature – it reports only include columns for some queries.
SQL Server supports two kind of row store indexes – the clustered index and then non-clustered index. It is important to understand the difference between these two indexes in order to implement the right index type to improve reading performance.
A row store index creates a B-tree based on the index key columns, where the depth-level of the B-tree depends on the number of data respectively index pages needed to store all of the data. If SQL Server looks for a specific value in an index key column it will read the root page and from there it is directed to the appropriate page of next level in the index tree. Finally, SQL Server always reads the root page, the intermediate pages and the leaf page(s). If the index is not unique, the storage engine reads backwards or forwards the appropriate leaf pages via the doubly linked chain to find all records.
This approach is the same for both types of index, but there is a big difference in what is stored on the leaf page.
In a clustered index the leaf page consists of the actual data including even all non index key columns. So SQL Server has all the data as soon as the leaf page of the B-tree is read.
In a non-clustered index, the content of the leaf pages looks different. The leaf pages have no more data than the index key columns itself. Now imagine you are looking for a set of columns in the table and not all columns are index key columns. The storage engine must retrieve the missing columns from elsewhere. If your table is a heap, that is, a table without a clustered index, the engine looks for the extra columns in the heap using a so called RID which is the address of the missing records in form of the filed id, page id and slot id. If your table is a clustered table the query engine will get the data from the clustered index using the index key.
In both cases, SQL Server implements a bookmark lookup iterator in the query execution plan to find missing data which requires additional page reads (lookups) to retrieve all the data and therefore decreases the read performance. To avoid the additional page reads an index should cover all requested columns in the B-Tree.
Now we know that the clustered index is always covering and the non clustered index covers only all columns if they are part of the index key columns.
This leads to two problems. First, there can be only one clustered index per table and therefore the clustered index covers only queries which includes the key columns in the WHERE predicate. Second, to ensure that the non clustered index is covering, we must define all columns of the WHERE predicate as index key columns but this will increase the size of the index pages and therefore the number of levels in the B-tree.
Fortunately, in SQL Server, you can insert the required columns in the leaf pages of the non clustered index, while the index key columns are only part of the root and intermediate pages. Therefore, the index root page and the pages of the intermediate levels remain small but the index contains all defined columns in the index leaf pages. This is very similar to the concept of a second clustered index.
Now let’s execute a simple query in SQL Server.
USE AdventureWorks2017;

DROP TABLE IF EXISTS dbo.SalesOrderDetail;

-- Create a table based on Sales.SalesOrderDetail
INTO dbo.SalesOrderDetail

-- Index recommendation from missing index dmv
-- Costs : 1,25377
-- Table 'SalesOrderDetail'. Scan count 1, logical reads 1509
-- The Query Processor estimates that implementing the following index could
-- improve the query cost by 99.7159%.
  , SalesOrderDetailID
  , CarrierTrackingNumber
  , OrderQty
  , ProductID
  , SpecialOfferID
	SalesOrderID = 43670
	AND SalesOrderDetailID > 112

SQL Server must scan the entire table because there is no index on the table which will lead to 1509 logical page reads and costs of 1,25377. That is, why SQL Server will recommend the following index:

-- Index recommendation from missing index dmv
-- Create a non clustered index
ON dbo.SalesOrderDetail (
						  , SalesOrderDetailID

After implementing the non clustered index, the query engine uses the index to find all records whose SalesOrderID is equal to 43670 and SalesOrderDetailID is greater than 112.