In an earlier post, I mentioned I was looking for ways to simplify my performance tuning efforts. One of the approaches I’ve taken is finding exact duplicate indexes. Duplicate indexes on a table are problematic because anytime you insert, update, or delete a record on that table, SQL Server needs to update that same index multiple times. I don’t like doing the same task twice, so I don’t want SQL Server to have to do it either.
Why Duplicate Indexes Are Problematic
Duplicate indexes are indexes that have the same key columns and, optionally, the same included columns as another index on the same table. They can exist due to various reasons, such as oversight during development, lack of proper index management, or legacy code.
Issues caused by duplicate indexes include:
- Increased Storage Space: Each index consumes disk space. Duplicate indexes unnecessarily inflate the size of your database.
- Slower Write Operations:
INSERT
,UPDATE
, andDELETE
operations become slower because the database engine must update the duplicate indexes. - Longer Maintenance Windows: Operations like index rebuilds and updates take longer, impacting maintenance tasks and system availability.
The Script
I use this comprehensive SQL query to help me find exact duplicate indexes in a database. This query is straightforward once you run it, but for those of you interested, I’ll break down the query into smaller parts so you can understand how it works.
;WITH CTE_IndexData
AS (
SELECT s.name AS SchemaName
,t.name AS TableName
,i.name AS IndexName
-- Key columns of the index
,STUFF((
SELECT ', ' + c.name + ' ' + CASE
WHEN ic.is_descending_key = 1
THEN 'DESC'
ELSE 'ASC'
END
FROM sys.index_columns ic
INNER JOIN sys.columns c ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
WHERE ic.object_id = i.object_id
AND ic.index_id = i.index_id
AND ic.is_included_column = 0
ORDER BY ic.key_ordinal
FOR XML PATH('')
,TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS KeyColumnList
-- Included columns of the index
,STUFF((
SELECT ', ' + c.name
FROM sys.index_columns ic
INNER JOIN sys.columns c ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
WHERE ic.object_id = i.object_id
AND ic.index_id = i.index_id
AND ic.is_included_column = 1
ORDER BY ic.key_ordinal
FOR XML PATH('')
,TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS IncludeColumnList
,i.is_disabled
-- Generate the DROP INDEX statement
,'DROP INDEX [' + i.name + '] ON [' + s.name + '].[' + t.name + ']' AS DropIndexStatement
FROM sys.indexes i
INNER JOIN sys.tables t ON i.object_id = t.object_id
INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE t.is_ms_shipped = 0
AND i.type_desc IN (
'NONCLUSTERED'
,'CLUSTERED'
)
)
SELECT d1.SchemaName
,d1.TableName
,d1.IndexName
,d1.KeyColumnList
,d1.IncludeColumnList
,d1.is_disabled
,d1.DropIndexStatement
FROM CTE_IndexData d1
WHERE EXISTS (
SELECT 1
FROM CTE_IndexData d2
WHERE d1.SchemaName = d2.SchemaName
AND d1.TableName = d2.TableName
AND d1.KeyColumnList = d2.KeyColumnList
AND ISNULL(d1.IncludeColumnList, '') = ISNULL(d2.IncludeColumnList, '')
AND d1.IndexName <> d2.IndexName
)
ORDER BY d1.SchemaName
,d1.TableName
,d1.KeyColumnList;
How It Works
Gathering Basic Index Information
First, I use a Common Table Expression (CTE) called CTE_IndexData
to gather some basic information about the indexes that already exist:
SELECT s.name AS SchemaName
,t.name AS TableName
,i.name AS IndexNam
Retrieve Key Columns
Next, I create a comma-separated list of the key columns for each index, including the sort order (ASC/DESC). The STUFF
function removes the leading comma and space from the concatenated string. I use a subquery to concatenate column names and their sort orders.
The FOR XML PATH('')
trick is used to concatenate the results into a single string.
,STUFF((
SELECT ', ' + c.name + ' ' + CASE
WHEN ic.is_descending_key = 1
THEN 'DESC'
ELSE 'ASC'
END
FROM sys.index_columns ic
INNER JOIN sys.columns c ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
WHERE ic.object_id = i.object_id
AND ic.index_id = i.index_id
AND ic.is_included_column = 0
ORDER BY ic.key_ordinal
FOR XML PATH('')
,TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS KeyColumnList
Retrieve Included Columns
This creates a comma-separated list of included columns for each index. This works similarly to the key columns retrieval but filters on is_included_column = 1
.
,STUFF((
SELECT ', ' + c.name
FROM sys.index_columns ic
INNER JOIN sys.columns c ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
WHERE ic.object_id = i.object_id
AND ic.index_id = i.index_id
AND ic.is_included_column = 1
ORDER BY ic.key_ordinal
FOR XML PATH('')
,TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS IncludeColumnList
Drop Index Statement
This part is just to script out the DROP INDEX
statement. This holds no logical value in identifying duplicates; it was just a “nice to have.”
,i.is_disabled
,'DROP INDEX [' + i.name + '] ON [' + s.name + '].[' + t.name + ']' AS DropIndexStatement
Find the Duplicates in my Results
Lastly, this section identifies the actual duplicates. In the main query, we’re zeroing in on indexes that have duplicates based on their key columns and included columns. To accomplish this, the query joins the CTE_IndexData
to itself—as d1
and d2
—allowing us to compare indexes within the same table seamlessly.
Here’s how it works:
- Different Index Names: By confirming that the index names are different (
d1.IndexName <> d2.IndexName
), we avoid comparing an index with itself, ensuring that we’re only identifying true duplicates. - Same Schema and Table: Ensuring both
d1
andd2
have the same schema and table names confirms that we’re comparing indexes within the same table, keeping our focus precise. - Same Key Columns: We check if the key columns are identical (
d1.KeyColumnList = d2.KeyColumnList
), which helps us identify indexes that have the same structure and potentially serve the same purpose. - Same Included Columns: The query compares the included columns, handling any
NULL
values with theISNULL
function (ISNULL(d1.IncludeColumnList, '') = ISNULL(d2.IncludeColumnList, '')
). This ensures that indexes aren’t mistakenly identified as different due to missing data.
SELECT d1.SchemaName
,d1.TableName
,d1.IndexName
,d1.KeyColumnList
,d1.IncludeColumnList
,d1.is_disabled
,d1.DropIndexStatement
FROM CTE_IndexData d1
WHERE EXISTS (
SELECT 1
FROM CTE_IndexData d2
WHERE d1.SchemaName = d2.SchemaName
AND d1.TableName = d2.TableName
AND d1.KeyColumnList = d2.KeyColumnList
AND ISNULL(d1.IncludeColumnList, '') = ISNULL(d2.IncludeColumnList, '')
AND d1.IndexName <> d2.IndexName
)
ORDER BY d1.SchemaName
,d1.TableName
,d1.KeyColumnList;
Limitations
There are a number of things this query doesn’t check for, explicitly, and you should still take precautions when dropping any index. First off, when deciding which index to drop, it’s best to drop the index with fewer reads. You can find that using the following query:
SELECT o.name AS TableName
,i.name AS IndexName
,i.type_desc AS IndexType
,u.user_seeks AS UserSeeks
,u.user_scans AS UserScans
,u.user_lookups AS UserLookups
,u.user_updates AS UserUpdates
,u.last_user_seek AS LastUserSeek
,u.last_user_scan AS LastUserScan
,u.last_user_lookup AS LastUserLookup
,u.last_user_update AS LastUserUpdate
FROM sys.dm_db_index_usage_stats AS u
INNER JOIN sys.indexes AS i ON u.object_id = i.object_id
AND u.index_id = i.index_id
INNER JOIN sys.objects AS o ON i.object_id = o.object_id
WHERE u.database_id = DB_ID()
--AND o.name = 'Badges' -- Filter by table name
ORDER BY o.name
,i.name;
Some additional things this query doesn’t check for, so you should yourself, is whether it is a unique index, compression, fill factor, just to name a few.
As always, test changes in a non-production environment before making them in production. I.E. don’t ruin your Friday.