Often you’ll run into cases where a SQL Server table will contain duplicate rows and you’re tasked with removing them. Here’s one way to do it with a PARTITION BY.
I blame the front-end developers when this happens, even if I’m the sole developer of the app. 😊
In the following SQL scripts, we will create a test table with some duplicate records and then we’ll remove them with a CTE (Common Table Expression) and a PARTITION BY window function.
I was kinda surprised that a delete from a CTE would remove records from the underlying table, but it does. You’ll need at least SQL Server 2008+ for this to work.
As with any delete operations on production data, make sure that you have a backup before proceeding.
Step 1 Create the test table
-- Create the test table CREATE TABLE dbo.TestTable( Code Varchar (1) NULL, n smallint NULL ) ON [PRIMARY] go -- Insert records with 2 duplicate rows INSERT INTO dbo.TestTable VALUES ('A',1), ('A',2), ('A',2), ('B',1), ('B',2), ('B',3), ('C',1), ('C',1), ('C',2) GO SELECT Code, n FROM dbo.TestTable ORDER BY Code, n go Code n A 1 A 2 A 2 B 1 B 2 B 3 C 1 C 1 C 2
Step 2 Check for duplicate records
-- Display any duplicate rows using GROUP BY AND COUNT() SELECT Code, n, Count(*) AS Duplicate_Count FROM dbo.TestTable GROUP BY Code, n HAVING Count(*) > 1 GO Code n Duplicate_Count C 1 2 A 2 2
Step 3 Display the Duplicate Records using PARTITION BY
-- Create a DuplicateCount by Partition over the Code, n fields -- Duplicates will have a DuplicateCount > 1. -- The 2 Rows as shown in the previous query -- To use with your tables add all the existing columns to the PARTITION BY clause WITH CTE (Code, n, Duplicate_Count) AS ( SELECT Code, n, ROW_NUMBER() OVER(PARTITION BY Code, n ORDER BY Code) AS Duplicate_Count FROM dbo.TestTable ) SELECT * FROM CTE ORDER BY Code, n, Duplicate_Count GO Code n Duplicate_Count A 1 1 A 2 1 A 2 2 B 1 1 B 2 1 B 3 1 C 1 1 C 1 2 C 2 1
Step 5 Delete the duplicate rows and return the remaining data
-- Delete Duplicate records WITH cte (Code, n, Duplicate_Count) AS (SELECT code, n, ROW_NUMBER() OVER( PARTITION BY Code, n ORDER BY Code) AS Duplicate_Count FROM dbo.TestTable) DELETE FROM cte WHERE Duplicate_Count > 1 GO -- Select remaining records with no duplicates SELECT Code, n FROM dbo.TestTable ORDER BY Code, n GO Code n A 1 A 2 B 1 B 2 B 3 C 1 C 2
Step 6 Double check the record count after the delete
I always do a record count after a delete as a sanity check. I expect 7 records to remain after I remove the 2 duplicates.
SELECT COUNT(*) FROM dbo.TestTable GO 7
A 30+ year database veteran. I enjoy sharing my stories and knowledge with others.