I’ve been simplifying my performance tuning approach. One method I’ve used is finding exact duplicate indexes. Duplicate indexes on a table are problematic because whenever you insert, update, or delete a record on that table, SQL Server needs to update the same index multiple times.
Why Duplicate Indexes Are Problematic
Duplicate indexes are indexes that have the same key columns AND the same included columns as another index on the same table.
Issues caused by duplicate indexes include:
- Increased Storage Space: Each index consumes disk space. Duplicate indexes unnecessarily inflate the size of your database.
- Slower Write Operations:
INSERT,UPDATE, andDELETEoperations become slower because the database engine must update the duplicate indexes. - Longer Maintenance Windows: Operations like index rebuilds and updates take longer, impacting maintenance tasks and system availability.
The Script
I use this SQL query to find exact duplicate indexes in a database. The query is straightforward when you run it, but I’ll break it down so you can understand how it works.
;WITH CTE_IndexData
AS (
SELECT s.name AS SchemaName
,t.name AS TableName
,i.name AS IndexName
-- Key columns of the index
,STUFF((
SELECT ', ' + c.name + ' ' + CASE
WHEN ic.is_descending_key = 1
THEN 'DESC'
ELSE 'ASC'
END
FROM sys.index_columns ic
INNER JOIN sys.columns c ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
WHERE ic.object_id = i.object_id
AND ic.index_id = i.index_id
AND ic.is_included_column = 0
ORDER BY ic.key_ordinal
FOR XML PATH('')
,TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS KeyColumnList
-- Included columns of the index
,STUFF((
SELECT ', ' + c.name
FROM sys.index_columns ic
INNER JOIN sys.columns c ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
WHERE ic.object_id = i.object_id
AND ic.index_id = i.index_id
AND ic.is_included_column = 1
ORDER BY ic.key_ordinal
FOR XML PATH('')
,TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS IncludeColumnList
,i.is_disabled
-- Generate the DROP INDEX statement
,'DROP INDEX [' + i.name + '] ON [' + s.name + '].[' + t.name + ']' AS DropIndexStatement
FROM sys.indexes i
INNER JOIN sys.tables t ON i.object_id = t.object_id
INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE t.is_ms_shipped = 0
AND i.type_desc IN (
'NONCLUSTERED'
,'CLUSTERED'
)
)
SELECT d1.SchemaName
,d1.TableName
,d1.IndexName
,d1.KeyColumnList
,d1.IncludeColumnList
,d1.is_disabled
,d1.DropIndexStatement
FROM CTE_IndexData d1
WHERE EXISTS (
SELECT 1
FROM CTE_IndexData d2
WHERE d1.SchemaName = d2.SchemaName
AND d1.TableName = d2.TableName
AND d1.KeyColumnList = d2.KeyColumnList
AND ISNULL(d1.IncludeColumnList, '') = ISNULL(d2.IncludeColumnList, '')
AND d1.IndexName <> d2.IndexName
)
ORDER BY d1.SchemaName
,d1.TableName
,d1.KeyColumnList;
How It Works
Gathering Basic Index Information
First, I use a Common Table Expression (CTE) called CTE_IndexData to gather basic information about existing indexes:
SELECT s.name AS SchemaName
,t.name AS TableName
,i.name AS IndexNam
Retrieve Key Columns
Next, I create a comma-separated list of key columns for each index, including sort order (ASC/DESC). The STUFF function removes the leading comma and space from the concatenated string. I use a subquery to concatenate column names and their sort orders.
The FOR XML PATH('') trick is used to concatenate the results into a single string.
,STUFF((
SELECT ', ' + c.name + ' ' + CASE
WHEN ic.is_descending_key = 1
THEN 'DESC'
ELSE 'ASC'
END
FROM sys.index_columns ic
INNER JOIN sys.columns c ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
WHERE ic.object_id = i.object_id
AND ic.index_id = i.index_id
AND ic.is_included_column = 0
ORDER BY ic.key_ordinal
FOR XML PATH('')
,TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS KeyColumnList
Retrieve Included Columns
This creates a comma-separated list of included columns for each index. This works like the key columns retrieval but filters on is_included_column = 1.
,STUFF((
SELECT ', ' + c.name
FROM sys.index_columns ic
INNER JOIN sys.columns c ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
WHERE ic.object_id = i.object_id
AND ic.index_id = i.index_id
AND ic.is_included_column = 1
ORDER BY ic.key_ordinal
FOR XML PATH('')
,TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS IncludeColumnList
Drop Index Statement
This part generates the DROP INDEX statement. It doesn’t help identify duplicates, but it’s convenient to have.
,i.is_disabled
,'DROP INDEX [' + i.name + '] ON [' + s.name + '].[' + t.name + ']' AS DropIndexStatement
Find the Duplicates in my Results
Finally, this section identifies the actual duplicates. The main query finds indexes that have duplicates based on their key columns and included columns. To do this, the query joins CTE_IndexData to itself—as d1 and d2—so we can compare indexes within the same table.
Here’s how it works:
- Different Index Names: By confirming that the index names are different (
d1.IndexName <> d2.IndexName), we avoid comparing an index with itself, ensuring that we’re only identifying true duplicates. - Same Schema and Table: Ensuring both
d1andd2have the same schema and table names confirms that we’re comparing indexes within the same table, keeping our focus precise. - Same Key Columns: We check if the key columns are identical (
d1.KeyColumnList = d2.KeyColumnList), which helps us identify indexes that have the same structure and potentially serve the same purpose. - Same Included Columns: The query compares the included columns, handling any
NULLvalues with theISNULLfunction (ISNULL(d1.IncludeColumnList, '') = ISNULL(d2.IncludeColumnList, '')). This ensures that indexes aren’t mistakenly identified as different due to missing data.
SELECT d1.SchemaName
,d1.TableName
,d1.IndexName
,d1.KeyColumnList
,d1.IncludeColumnList
,d1.is_disabled
,d1.DropIndexStatement
FROM CTE_IndexData d1
WHERE EXISTS (
SELECT 1
FROM CTE_IndexData d2
WHERE d1.SchemaName = d2.SchemaName
AND d1.TableName = d2.TableName
AND d1.KeyColumnList = d2.KeyColumnList
AND ISNULL(d1.IncludeColumnList, '') = ISNULL(d2.IncludeColumnList, '')
AND d1.IndexName <> d2.IndexName
)
ORDER BY d1.SchemaName
,d1.TableName
,d1.KeyColumnList;
Limitations
There are several things this query doesn’t check for, so be careful when dropping any index. When deciding which index to drop, it’s best to drop the one with fewer reads. You can find that with this query:
SELECT o.name AS TableName
,i.name AS IndexName
,i.type_desc AS IndexType
,u.user_seeks AS UserSeeks
,u.user_scans AS UserScans
,u.user_lookups AS UserLookups
,u.user_updates AS UserUpdates
,u.last_user_seek AS LastUserSeek
,u.last_user_scan AS LastUserScan
,u.last_user_lookup AS LastUserLookup
,u.last_user_update AS LastUserUpdate
FROM sys.dm_db_index_usage_stats AS u
INNER JOIN sys.indexes AS i ON u.object_id = i.object_id
AND u.index_id = i.index_id
INNER JOIN sys.objects AS o ON i.object_id = o.object_id
WHERE u.database_id = DB_ID()
--AND o.name = 'Badges' -- Filter by table name
ORDER BY o.name
,i.name;
This query also doesn’t check for things like whether it’s a unique index, compression, fill factor, among others.
As always, test changes in a non-production environment before making them in production. Don’t ruin your Friday.