Identifying and Removing Duplicate Indexes in SQL Server

In an earlier post, I mentioned I was looking for ways to simplify my performance tuning efforts. One of the approaches I’ve taken is finding exact duplicate indexes. Duplicate indexes on a table are problematic because anytime you insert, update, or delete a record on that table, SQL Server needs to update that same index multiple times. I don’t like doing the same task twice, so I don’t want SQL Server to have to do it either.

Why Duplicate Indexes Are Problematic

Duplicate indexes are indexes that have the same key columns and, optionally, the same included columns as another index on the same table. They can exist due to various reasons, such as oversight during development, lack of proper index management, or legacy code.

Issues caused by duplicate indexes include:

Increased Storage Space: Each index consumes disk space. Duplicate indexes unnecessarily inflate the size of your database.
Slower Write Operations: INSERT, UPDATE, and DELETE operations become slower because the database engine must update the duplicate indexes.
Longer Maintenance Windows: Operations like index rebuilds and updates take longer, impacting maintenance tasks and system availability.

The Script

I use this comprehensive SQL query to help me find exact duplicate indexes in a database. This query is straightforward once you run it, but for those of you interested, I’ll break down the query into smaller parts so you can understand how it works.

;WITH CTE_IndexData
AS (
	SELECT s.name AS SchemaName
		,t.name AS TableName
		,i.name AS IndexName
		-- Key columns of the index
		,STUFF((
				SELECT ', ' + c.name + ' ' + CASE 
						WHEN ic.is_descending_key = 1
							THEN 'DESC'
						ELSE 'ASC'
						END
				FROM sys.index_columns ic
				INNER JOIN sys.columns c ON ic.object_id = c.object_id
					AND ic.column_id = c.column_id
				WHERE ic.object_id = i.object_id
					AND ic.index_id = i.index_id
					AND ic.is_included_column = 0
				ORDER BY ic.key_ordinal
				FOR XML PATH('')
					,TYPE
				).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS KeyColumnList
		-- Included columns of the index
		,STUFF((
				SELECT ', ' + c.name
				FROM sys.index_columns ic
				INNER JOIN sys.columns c ON ic.object_id = c.object_id
					AND ic.column_id = c.column_id
				WHERE ic.object_id = i.object_id
					AND ic.index_id = i.index_id
					AND ic.is_included_column = 1
				ORDER BY ic.key_ordinal
				FOR XML PATH('')
					,TYPE
				).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS IncludeColumnList
		,i.is_disabled
		-- Generate the DROP INDEX statement
		,'DROP INDEX [' + i.name + '] ON [' + s.name + '].[' + t.name + ']' AS DropIndexStatement
	FROM sys.indexes i
	INNER JOIN sys.tables t ON i.object_id = t.object_id
	INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
	WHERE t.is_ms_shipped = 0
		AND i.type_desc IN (
			'NONCLUSTERED'
			,'CLUSTERED'
			)
	)
SELECT d1.SchemaName
	,d1.TableName
	,d1.IndexName
	,d1.KeyColumnList
	,d1.IncludeColumnList
	,d1.is_disabled
	,d1.DropIndexStatement
FROM CTE_IndexData d1
WHERE EXISTS (
		SELECT 1
		FROM CTE_IndexData d2
		WHERE d1.SchemaName = d2.SchemaName
			AND d1.TableName = d2.TableName
			AND d1.KeyColumnList = d2.KeyColumnList
			AND ISNULL(d1.IncludeColumnList, '') = ISNULL(d2.IncludeColumnList, '')
			AND d1.IndexName <> d2.IndexName
		)
ORDER BY d1.SchemaName
	,d1.TableName
	,d1.KeyColumnList;

How It Works

Gathering Basic Index Information

First, I use a Common Table Expression (CTE) called CTE_IndexData to gather some basic information about the indexes that already exist:

SELECT s.name AS SchemaName
    ,t.name AS TableName
    ,i.name AS IndexNam

Retrieve Key Columns

Next, I create a comma-separated list of the key columns for each index, including the sort order (ASC/DESC). The STUFF function removes the leading comma and space from the concatenated string. I use a subquery to concatenate column names and their sort orders.

The FOR XML PATH('') trick is used to concatenate the results into a single string.

,STUFF((
    SELECT ', ' + c.name + ' ' + CASE 
            WHEN ic.is_descending_key = 1
                THEN 'DESC'
            ELSE 'ASC'
            END
    FROM sys.index_columns ic
    INNER JOIN sys.columns c ON ic.object_id = c.object_id
        AND ic.column_id = c.column_id
    WHERE ic.object_id = i.object_id
        AND ic.index_id = i.index_id
        AND ic.is_included_column = 0
    ORDER BY ic.key_ordinal
    FOR XML PATH('')
        ,TYPE
    ).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS KeyColumnList

Retrieve Included Columns

This creates a comma-separated list of included columns for each index. This works similarly to the key columns retrieval but filters on is_included_column = 1.

,STUFF((
    SELECT ', ' + c.name
    FROM sys.index_columns ic
    INNER JOIN sys.columns c ON ic.object_id = c.object_id
        AND ic.column_id = c.column_id
    WHERE ic.object_id = i.object_id
        AND ic.index_id = i.index_id
        AND ic.is_included_column = 1
    ORDER BY ic.key_ordinal
    FOR XML PATH('')
        ,TYPE
    ).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS IncludeColumnList

Drop Index Statement

This part is just to script out the DROP INDEX statement. This holds no logical value in identifying duplicates; it was just a “nice to have.”

,i.is_disabled
,'DROP INDEX [' + i.name + '] ON [' + s.name + '].[' + t.name + ']' AS DropIndexStatement

Find the Duplicates in my Results

Lastly, this section identifies the actual duplicates. In the main query, we’re zeroing in on indexes that have duplicates based on their key columns and included columns. To accomplish this, the query joins the CTE_IndexData to itself—as d1 and d2—allowing us to compare indexes within the same table seamlessly.

Here’s how it works:

Different Index Names: By confirming that the index names are different (d1.IndexName <> d2.IndexName), we avoid comparing an index with itself, ensuring that we’re only identifying true duplicates.
Same Schema and Table: Ensuring both d1 and d2 have the same schema and table names confirms that we’re comparing indexes within the same table, keeping our focus precise.
Same Key Columns: We check if the key columns are identical (d1.KeyColumnList = d2.KeyColumnList), which helps us identify indexes that have the same structure and potentially serve the same purpose.
Same Included Columns: The query compares the included columns, handling any NULL values with the ISNULL function (ISNULL(d1.IncludeColumnList, '') = ISNULL(d2.IncludeColumnList, '')). This ensures that indexes aren’t mistakenly identified as different due to missing data.

SELECT d1.SchemaName
    ,d1.TableName
    ,d1.IndexName
    ,d1.KeyColumnList
    ,d1.IncludeColumnList
    ,d1.is_disabled
    ,d1.DropIndexStatement
FROM CTE_IndexData d1
WHERE EXISTS (
    SELECT 1
    FROM CTE_IndexData d2
    WHERE d1.SchemaName = d2.SchemaName
        AND d1.TableName = d2.TableName
        AND d1.KeyColumnList = d2.KeyColumnList
        AND ISNULL(d1.IncludeColumnList, '') = ISNULL(d2.IncludeColumnList, '')
        AND d1.IndexName <> d2.IndexName
    )
ORDER BY d1.SchemaName
    ,d1.TableName
    ,d1.KeyColumnList;

Limitations

There are a number of things this query doesn’t check for, explicitly, and you should still take precautions when dropping any index. First off, when deciding which index to drop, it’s best to drop the index with fewer reads. You can find that using the following query:

SELECT o.name AS TableName
    ,i.name AS IndexName
    ,i.type_desc AS IndexType
    ,u.user_seeks AS UserSeeks
    ,u.user_scans AS UserScans
    ,u.user_lookups AS UserLookups
    ,u.user_updates AS UserUpdates
    ,u.last_user_seek AS LastUserSeek
    ,u.last_user_scan AS LastUserScan
    ,u.last_user_lookup AS LastUserLookup
    ,u.last_user_update AS LastUserUpdate
FROM sys.dm_db_index_usage_stats AS u
INNER JOIN sys.indexes AS i ON u.object_id = i.object_id
    AND u.index_id = i.index_id
INNER JOIN sys.objects AS o ON i.object_id = o.object_id
WHERE u.database_id = DB_ID()
    --AND o.name = 'Badges' -- Filter by table name
ORDER BY o.name
     ,i.name;

Some additional things this query doesn’t check for, so you should yourself, is whether it is a unique index, compression, fill factor, just to name a few.

As always, test changes in a non-production environment before making them in production. I.E. don’t ruin your Friday.

Using PowerShell to Automate Glenn Berry's SQL DMV Queries

How to Improve SQL Server Performance by Avoiding Key Lookups