Often you’ll run into cases where a SQL Server table will contain duplicate rows and you’re tasked with removing them. Here’s one way to do it with a PARTITION BY.
I blame the front-end developers when this happens, even if I’m the sole developer of the app. 😊
In the following SQL scripts, we will create a test table with some duplicate records and then we’ll remove them with a CTE (Common Table Expression) and a PARTITION BY window function.
I was kinda surprised that a delete from a CTE would remove records from the underlying table, but it does. You’ll need at least SQL Server 2008+ for this to work.
As with any delete operations on production data, make sure that you have a backup before proceeding.
Step 1 Create the test table
-- Create the test table
CREATE TABLE dbo.TestTable(
Code Varchar (1) NULL,
n smallint NULL
) ON [PRIMARY]
go
-- Insert records with 2 duplicate rows
INSERT INTO dbo.TestTable
VALUES
('A',1),
('A',2),
('A',2),
('B',1),
('B',2),
('B',3),
('C',1),
('C',1),
('C',2)
GO
SELECT Code, n FROM dbo.TestTable ORDER BY Code, n
go
Code n
A 1
A 2
A 2
B 1
B 2
B 3
C 1
C 1
C 2
Step 2 Check for duplicate records
-- Display any duplicate rows using GROUP BY AND COUNT()
SELECT Code,
n,
Count(*) AS Duplicate_Count
FROM dbo.TestTable
GROUP BY Code,
n
HAVING Count(*) > 1
GO
Code n Duplicate_Count
C 1 2
A 2 2
Step 3 Display the Duplicate Records using PARTITION BY
-- Create a DuplicateCount by Partition over the Code, n fields
-- Duplicates will have a DuplicateCount > 1.
-- The 2 Rows as shown in the previous query
-- To use with your tables add all the existing columns to the PARTITION BY clause
WITH CTE (Code, n, Duplicate_Count)
AS
(
SELECT Code, n,
ROW_NUMBER() OVER(PARTITION BY Code, n
ORDER BY Code) AS Duplicate_Count
FROM dbo.TestTable
)
SELECT *
FROM CTE
ORDER BY Code, n, Duplicate_Count
GO
Code n Duplicate_Count
A 1 1
A 2 1
A 2 2
B 1 1
B 2 1
B 3 1
C 1 1
C 1 2
C 2 1
Step 5 Delete the duplicate rows and return the remaining data
-- Delete Duplicate records
WITH cte (Code, n, Duplicate_Count)
AS (SELECT code,
n,
ROW_NUMBER()
OVER(
PARTITION BY Code, n
ORDER BY Code) AS Duplicate_Count
FROM dbo.TestTable)
DELETE FROM cte
WHERE Duplicate_Count > 1
GO
-- Select remaining records with no duplicates
SELECT Code,
n
FROM dbo.TestTable
ORDER BY Code,
n
GO
Code n
A 1
A 2
B 1
B 2
B 3
C 1
C 2
Step 6 Double check the record count after the delete
I always do a record count after a delete as a sanity check. I expect 7 records to remain after I remove the 2 duplicates.
SELECT COUNT(*) FROM dbo.TestTable
GO
7
A 30+ year database veteran. I enjoy sharing my stories and knowledge with others.