Remove Duplicate Rows From SQL Server Table Using PARTITION BY and CTE Query

Often you’ll run into cases where a SQL Server table will contain duplicate rows and you’re tasked with removing them. Here’s one way to do it with a PARTITION BY.

I blame the front-end developers when this happens, even if I’m the sole developer of the app. 😊

In the following SQL scripts, we will create a test table with some duplicate records and then we’ll remove them with a CTE (Common Table Expression) and a PARTITION BY window function.

I was kinda surprised that a delete from a CTE would remove records from the underlying table, but it does. You’ll need at least SQL Server 2008+ for this to work.

As with any delete operations on production data, make sure that you have a backup before proceeding.

Step 1 Create the test table

-- Create the test table

CREATE TABLE dbo.TestTable(
Code Varchar (1) NULL,
n smallint NULL
) ON [PRIMARY]
go

-- Insert records with 2 duplicate rows

INSERT INTO dbo.TestTable
VALUES
  ('A',1),
  ('A',2),
  ('A',2),
  ('B',1),
  ('B',2),
  ('B',3),
  ('C',1),
  ('C',1),
  ('C',2)
GO

SELECT Code, n FROM dbo.TestTable ORDER BY Code, n
go

Code	n
A	1
A	2
A	2
B	1
B	2
B	3
C	1
C	1
C	2

Step 2 Check for duplicate records

-- Display any duplicate rows using GROUP BY AND COUNT()

SELECT Code,
       n,
       Count(*) AS Duplicate_Count
FROM   dbo.TestTable
GROUP  BY Code,
          n
HAVING Count(*) > 1
GO

Code	n	Duplicate_Count
C	1	2
A	2	2

Step 3 Display the Duplicate Records using PARTITION BY


-- Create a DuplicateCount by Partition over the Code, n fields
-- Duplicates will have a DuplicateCount > 1. 
-- The 2 Rows as shown in the previous query
-- To use with your tables add all the existing columns to the PARTITION BY clause

WITH CTE (Code, n, Duplicate_Count)
AS
(
  SELECT Code, n,
  ROW_NUMBER() OVER(PARTITION BY Code, n
   ORDER BY Code) AS Duplicate_Count
  FROM dbo.TestTable
)
SELECT *
FROM CTE
ORDER BY Code, n, Duplicate_Count
GO

Code	n	Duplicate_Count
A	1	1
A	2	1
A	2	2
B	1	1
B	2	1
B	3	1
C	1	1
C	1	2
C	2	1

Step 5 Delete the duplicate rows and return the remaining data

-- Delete Duplicate records
WITH cte (Code, n, Duplicate_Count)
     AS (SELECT code,
                n,
                ROW_NUMBER()
                  OVER(
                    PARTITION BY Code, n
                    ORDER BY Code) AS Duplicate_Count
         FROM   dbo.TestTable)
         
DELETE FROM cte
WHERE  Duplicate_Count > 1

GO

-- Select remaining records with no duplicates

SELECT Code,
       n
FROM   dbo.TestTable
ORDER  BY Code,
          n

GO

Code	n
A	1
A	2
B	1
B	2
B	3
C	1
C	2

Step 6 Double check the record count after the delete

I always do a record count after a delete as a sanity check. I expect 7 records to remain after I remove the 2 duplicates.

SELECT COUNT(*) FROM dbo.TestTable
GO

7

Rob Mills

A 30+ year database veteran. I enjoy sharing my stories and knowledge with others.