I had to do this recently on Oracle, but the steps would have been the same on MySQL. It was a lot of data, at least compared to what I'm used to working with, so my process to de-dup was comparatively heavyweight. I'm including it here in case someone else comes along with a similar problem.
My duplicate records had different IDs, different updated_at times, and possibly different updated_by IDs, but all other columns were the same. I wanted to keep the most recently updated record in each duplicate set.
I used a combination of Rails logic and SQL to get it done.
Step one: run a rake script to identify the IDs of the duplicate records, using model logic. IDs go in a text file.
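For illustration, here is a minimal sketch of what the duplicate-finding logic in step one might look like. It is not my original script: the method names are hypothetical, and it assumes the records are available as an array of attribute hashes with string keys (in Rails, something like Model.all.map(&:attributes), ideally in batches for a large table). Rows are grouped by every column except id, updated_at and updated_by, and the newest row of each group is kept.

```ruby
# Columns that may differ between otherwise-identical duplicates.
IGNORED_COLUMNS = %w[id updated_at updated_by].freeze

# rows: array of hashes, one per record (hypothetical input shape).
# Returns the IDs to delete: every member of a duplicate set except
# the most recently updated one.
def duplicate_ids_to_delete(rows)
  rows
    .group_by { |row| row.reject { |column, _| IGNORED_COLUMNS.include?(column) } }
    .values
    .select { |set| set.size > 1 }
    .flat_map do |set|
      keeper = set.max_by { |row| row["updated_at"] }
      set.reject { |row| row.equal?(keeper) }.map { |row| row["id"] }
    end
end

# Writes one ID per line to a text file (the file format is an assumption).
def write_ids(ids, path)
  File.write(path, ids.map { |id| "#{id}\n" }.join)
end
```

If two rows in a set share the same updated_at, only one of them is kept and the other is listed for deletion.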
Step two: create a temporary table with one column, the IDs to delete, loaded from the text file.
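One possible way to get the IDs from the text file into that table is to generate plain SQL statements from it, sketched below. This is an assumption rather than how the load has to be done: the method name is hypothetical, the column name id and the INTEGER type are guesses, and for very large files a database bulk loader would likely be faster than row-by-row inserts.

```ruby
# Builds SQL statements that create the one-column table and load the IDs.
# Assumes integer IDs, one per line; the column name "id" is an assumption.
def id_table_sql(path, table: "temp_duplicate_ids")
  ids = File.readlines(path, chomp: true)
            .reject { |line| line.strip.empty? }
            .map { |line| Integer(line, 10) } # raises ArgumentError on non-numeric lines

  ["CREATE TABLE #{table} (id INTEGER)"] +
    ids.uniq.map { |id| "INSERT INTO #{table} (id) VALUES (#{id})" }
end
```

Each returned string is one statement without a trailing semicolon, so it can be printed into a script or executed through whatever database connection is at hand.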
Step three: create another temporary table holding a copy of all the records I'm going to delete, as a backup (just in case!).
CREATE TABLE temp_duplicate_models
AS (SELECT * FROM models
WHERE id IN (SELECT * FROM temp_duplicate_ids));
Step four: actual deleting.
DELETE FROM models WHERE id IN (SELECT * FROM temp_duplicate_ids);