How To Delete Duplicate Rows in MySQL

September 24, 2020

[1]. Prepare sample data

The following script creates table contacts and inserts sample data into the contacts table for the demonstration.


DROP TABLE IF EXISTS contacts;

CREATE TABLE contacts (
    id INT PRIMARY KEY AUTO_INCREMENT,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL, 
    email VARCHAR(255) NOT NULL
);

INSERT INTO contacts (first_name,last_name,email) 
VALUES ('Carine ','Schmitt','carine.schmitt@verizon.net'),
       ('Jean','King','jean.king@me.com'),
       ('Peter','Ferguson','peter.ferguson@google.com'),
       ('Janine ','Labrune','janine.labrune@aol.com'),
       ('Jonas ','Bergulfsen','jonas.bergulfsen@mac.com'),
       ('Janine ','Labrune','janine.labrune@aol.com'),
       ('Susan','Nelson','susan.nelson@comcast.net'),
       ('Zbyszek ','Piestrzeniewicz','zbyszek.piestrzeniewicz@att.net'),
       ('Roland','Keitel','roland.keitel@yahoo.com'),
       ('Julie','Murphy','julie.murphy@yahoo.com'),
       ('Kwai','Lee','kwai.lee@google.com'),
       ('Jean','King','jean.king@me.com'),
       ('Susan','Nelson','susan.nelson@comcast.net'),
       ('Roland','Keitel','roland.keitel@yahoo.com');

Note that you can execute this script to recreate test data after you execute a DELETE statement.

This query returns data from the contacts table:


SELECT * FROM contacts
ORDER BY email;

The following query returns the duplicate emails in the contacts table:


SELECT 
    email, COUNT(email)
FROM
    contacts
GROUP BY 
    email
HAVING 
    COUNT(email) > 1;

As you can see, we have four rows with duplicate emails.

[2]. Delete duplicate rows using `DELETE JOIN` statement

MySQL provides you with the DELETE JOIN statement that allows you to remove duplicate rows quickly.

The following statement deletes duplicate rows and keeps the highest id:


DELETE t1 FROM contacts t1
INNER JOIN contacts t2 
WHERE 
    t1.id < t2.id AND 
    t1.email = t2.email;

This query references the contacts table twice, therefore, it uses the table alias t1 and t2.

The output is:


Query OK, 4 rows affected (0.10 sec)

It indicated that four rows have been deleted. You can execute the query that find duplicate emails again to verify the delete:


SELECT 
    email, 
    COUNT(email)
FROM
    contacts
GROUP BY 
    email
HAVING 
    COUNT(email) > 1;

The query returns an empty set, which means that the duplicate rows have been deleted.

Let’s verify data from the contacts table:


SELECT * FROM contacts;

MySQL delete duplicate rows - DELETE JOIN keeps Highest ID

The rows with id 2, 4, 7, and 9 have been deleted.

Search This Blog

Miu Tech