Identifying and Removing Duplicate Indexes in SQL Server

I’ve been simplifying my performance tuning approach. One method I’ve used is finding exact duplicate indexes. Duplicate indexes on a table are problematic because whenever you insert, update, or delete a record on that table, SQL Server needs to update the same index multiple times.

Why Duplicate Indexes Are Problematic

Duplicate indexes are indexes that have the same key columns AND the same included columns as another index on the same table.

Issues caused by duplicate indexes include:

Increased Storage Space: Each index consumes disk space. Duplicate indexes unnecessarily inflate the size of your database.
Slower Write Operations: INSERT, UPDATE, and DELETE operations become slower because the database engine must update the duplicate indexes.
Longer Maintenance Windows: Operations like index rebuilds and updates take longer, impacting maintenance tasks and system availability.

The Script

I use this SQL query to find exact duplicate indexes in a database. The query is straightforward when you run it, but I’ll break it down so you can understand how it works.

;WITH CTE_IndexData
AS (
	SELECT s.name AS SchemaName
		,t.name AS TableName
		,i.name AS IndexName
		-- Key columns of the index
		,STUFF((
				SELECT ', ' + c.name + ' ' + CASE 
						WHEN ic.is_descending_key = 1
							THEN 'DESC'
						ELSE 'ASC'
						END
				FROM sys.index_columns ic
				INNER JOIN sys.columns c ON ic.object_id = c.object_id
					AND ic.column_id = c.column_id
				WHERE ic.object_id = i.object_id
					AND ic.index_id = i.index_id
					AND ic.is_included_column = 0
				ORDER BY ic.key_ordinal
				FOR XML PATH('')
					,TYPE
				).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS KeyColumnList
		-- Included columns of the index
		,STUFF((
				SELECT ', ' + c.name
				FROM sys.index_columns ic
				INNER JOIN sys.columns c ON ic.object_id = c.object_id
					AND ic.column_id = c.column_id
				WHERE ic.object_id = i.object_id
					AND ic.index_id = i.index_id
					AND ic.is_included_column = 1
				ORDER BY ic.key_ordinal
				FOR XML PATH('')
					,TYPE
				).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS IncludeColumnList
		,i.is_disabled
		-- Generate the DROP INDEX statement
		,'DROP INDEX [' + i.name + '] ON [' + s.name + '].[' + t.name + ']' AS DropIndexStatement
	FROM sys.indexes i
	INNER JOIN sys.tables t ON i.object_id = t.object_id
	INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
	WHERE t.is_ms_shipped = 0
		AND i.type_desc IN (
			'NONCLUSTERED'
			,'CLUSTERED'
			)
	)
SELECT d1.SchemaName
	,d1.TableName
	,d1.IndexName
	,d1.KeyColumnList
	,d1.IncludeColumnList
	,d1.is_disabled
	,d1.DropIndexStatement
FROM CTE_IndexData d1
WHERE EXISTS (
		SELECT 1
		FROM CTE_IndexData d2
		WHERE d1.SchemaName = d2.SchemaName
			AND d1.TableName = d2.TableName
			AND d1.KeyColumnList = d2.KeyColumnList
			AND ISNULL(d1.IncludeColumnList, '') = ISNULL(d2.IncludeColumnList, '')
			AND d1.IndexName <> d2.IndexName
		)
ORDER BY d1.SchemaName
	,d1.TableName
	,d1.KeyColumnList;

How It Works

Gathering Basic Index Information

First, I use a Common Table Expression (CTE) called CTE_IndexData to gather basic information about existing indexes:

SELECT s.name AS SchemaName
    ,t.name AS TableName
    ,i.name AS IndexNam

Retrieve Key Columns

Next, I create a comma-separated list of key columns for each index, including sort order (ASC/DESC). The STUFF function removes the leading comma and space from the concatenated string. I use a subquery to concatenate column names and their sort orders.

The FOR XML PATH('') trick is used to concatenate the results into a single string.

,STUFF((
    SELECT ', ' + c.name + ' ' + CASE 
            WHEN ic.is_descending_key = 1
                THEN 'DESC'
            ELSE 'ASC'
            END
    FROM sys.index_columns ic
    INNER JOIN sys.columns c ON ic.object_id = c.object_id
        AND ic.column_id = c.column_id
    WHERE ic.object_id = i.object_id
        AND ic.index_id = i.index_id
        AND ic.is_included_column = 0
    ORDER BY ic.key_ordinal
    FOR XML PATH('')
        ,TYPE
    ).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS KeyColumnList

Retrieve Included Columns

This creates a comma-separated list of included columns for each index. This works like the key columns retrieval but filters on is_included_column = 1.

,STUFF((
    SELECT ', ' + c.name
    FROM sys.index_columns ic
    INNER JOIN sys.columns c ON ic.object_id = c.object_id
        AND ic.column_id = c.column_id
    WHERE ic.object_id = i.object_id
        AND ic.index_id = i.index_id
        AND ic.is_included_column = 1
    ORDER BY ic.key_ordinal
    FOR XML PATH('')
        ,TYPE
    ).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS IncludeColumnList

Drop Index Statement

This part generates the DROP INDEX statement. It doesn’t help identify duplicates, but it’s convenient to have.

,i.is_disabled
,'DROP INDEX [' + i.name + '] ON [' + s.name + '].[' + t.name + ']' AS DropIndexStatement

Find the Duplicates in my Results

Finally, this section identifies the actual duplicates. The main query finds indexes that have duplicates based on their key columns and included columns. To do this, the query joins CTE_IndexData to itself—as d1 and d2—so we can compare indexes within the same table.

Here’s how it works:

Different Index Names: By confirming that the index names are different (d1.IndexName <> d2.IndexName), we avoid comparing an index with itself, ensuring that we’re only identifying true duplicates.
Same Schema and Table: Ensuring both d1 and d2 have the same schema and table names confirms that we’re comparing indexes within the same table, keeping our focus precise.
Same Key Columns: We check if the key columns are identical (d1.KeyColumnList = d2.KeyColumnList), which helps us identify indexes that have the same structure and potentially serve the same purpose.
Same Included Columns: The query compares the included columns, handling any NULL values with the ISNULL function (ISNULL(d1.IncludeColumnList, '') = ISNULL(d2.IncludeColumnList, '')). This ensures that indexes aren’t mistakenly identified as different due to missing data.

SELECT d1.SchemaName
    ,d1.TableName
    ,d1.IndexName
    ,d1.KeyColumnList
    ,d1.IncludeColumnList
    ,d1.is_disabled
    ,d1.DropIndexStatement
FROM CTE_IndexData d1
WHERE EXISTS (
    SELECT 1
    FROM CTE_IndexData d2
    WHERE d1.SchemaName = d2.SchemaName
        AND d1.TableName = d2.TableName
        AND d1.KeyColumnList = d2.KeyColumnList
        AND ISNULL(d1.IncludeColumnList, '') = ISNULL(d2.IncludeColumnList, '')
        AND d1.IndexName <> d2.IndexName
    )
ORDER BY d1.SchemaName
    ,d1.TableName
    ,d1.KeyColumnList;

Limitations

There are several things this query doesn’t check for, so be careful when dropping any index. When deciding which index to drop, it’s best to drop the one with fewer reads. You can find that with this query:

SELECT o.name AS TableName
    ,i.name AS IndexName
    ,i.type_desc AS IndexType
    ,u.user_seeks AS UserSeeks
    ,u.user_scans AS UserScans
    ,u.user_lookups AS UserLookups
    ,u.user_updates AS UserUpdates
    ,u.last_user_seek AS LastUserSeek
    ,u.last_user_scan AS LastUserScan
    ,u.last_user_lookup AS LastUserLookup
    ,u.last_user_update AS LastUserUpdate
FROM sys.dm_db_index_usage_stats AS u
INNER JOIN sys.indexes AS i ON u.object_id = i.object_id
    AND u.index_id = i.index_id
INNER JOIN sys.objects AS o ON i.object_id = o.object_id
WHERE u.database_id = DB_ID()
    --AND o.name = 'Badges' -- Filter by table name
ORDER BY o.name
     ,i.name;

This query also doesn’t check for things like whether it’s a unique index, compression, fill factor, among others.

As always, test changes in a non-production environment before making them in production. Don’t ruin your Friday.

Using PowerShell to Automate Glenn Berry's SQL DMV Queries

How to Improve SQL Server Performance by Avoiding Key Lookups