How To Delete Duplicate Rows in MySQL

 

[1]. Prepare sample data

The following script creates table contacts and inserts sample data into the contacts table for the demonstration.

DROP TABLE IF EXISTS contacts;

CREATE TABLE contacts (
id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(255) NOT NULL
);

INSERT INTO contacts (first_name,last_name,email)
VALUES ('Carine ','Schmitt','carine.schmitt@verizon.net'),
('Jean','King','jean.king@me.com'),
('Peter','Ferguson','peter.ferguson@google.com'),
('Janine ','Labrune','janine.labrune@aol.com'),
('Jonas ','Bergulfsen','jonas.bergulfsen@mac.com'),
('Janine ','Labrune','janine.labrune@aol.com'),
('Susan','Nelson','susan.nelson@comcast.net'),
('Zbyszek ','Piestrzeniewicz','zbyszek.piestrzeniewicz@att.net'),
('Roland','Keitel','roland.keitel@yahoo.com'),
('Julie','Murphy','julie.murphy@yahoo.com'),
('Kwai','Lee','kwai.lee@google.com'),
('Jean','King','jean.king@me.com'),
('Susan','Nelson','susan.nelson@comcast.net'),
('Roland','Keitel','roland.keitel@yahoo.com');

Note that you can execute this script to recreate test data after you execute a DELETE statement.

This query returns data from the contacts table:

SELECT * FROM contacts
ORDER BY email;

The following query returns the duplicate emails in the contacts table:

SELECT
email, COUNT(email)
FROM
contacts
GROUP BY
email
HAVING
COUNT(email) > 1;
Delete duplicate rows in MySQL

As you can see, we have four rows with duplicate emails.

[2].  Delete duplicate rows using DELETE JOIN statement

MySQL provides you with the DELETE JOIN statement that allows you to remove duplicate rows quickly.

The following statement deletes duplicate rows and keeps the highest id:

DELETE t1 FROM contacts t1
INNER JOIN contacts t2
WHERE
t1.id < t2.id AND
t1.email = t2.email;

This query references the contacts table twice, therefore, it uses the table alias t1 and t2.

The output is:

Query OK, 4 rows affected (0.10 sec)

It indicated that four rows have been deleted. You can execute the query that find duplicate emails again to verify the delete:

SELECT
email,
COUNT(email)
FROM
contacts
GROUP BY
email
HAVING
COUNT(email) > 1;

The query returns an empty set, which means that the duplicate rows have been deleted.

Let’s verify data from the contacts table:

SELECT * FROM contacts;
MySQL delete duplicate rows - DELETE JOIN keeps Highest ID

The rows with id 2, 4, 7, and 9 have been deleted.

Comments

Popular posts from this blog

Java : Variables Declaring

Install DNF in RHEL/CentOS 7

SQL Self JOIN