再战mysql 数据去重

年初时,写过一篇去重的,在小表中还能用用,在大表中真的是效率低下,现在给了一次优化
https://www.cnblogs.com/jarjune/p/8328013.html

继上一篇文章

方法三:

DELIMITER //

DROP PROCEDURE IF EXISTS delete_rows_2;

CREATE PROCEDURE delete_rows_2(IN TABLENAME VARCHAR(50), IN FIELDNAMES VARCHAR(100), IN AUTOFIELD VARCHAR(50))
BEGIN

DECLARE DELETE_TABLE_ROWS_SQL VARCHAR(1000);

SET DELETE_TABLE_ROWS_SQL = CONCAT('
		DELETE 
		FROM 
			', TABLENAME ,' 
		WHERE 
			(', FIELDNAMES ,') IN ( 
				SELECT ', FIELDNAMES ,' 
				FROM (
					SELECT 
						', FIELDNAMES ,' 
					FROM 
						', TABLENAME ,' 
					GROUP BY 
						', FIELDNAMES ,' 
					HAVING 
						COUNT(1) > 1 
				) t1
			) 
		AND ', AUTOFIELD ,' NOT IN ( 
			SELECT ', AUTOFIELD ,' 
			FROM (
				SELECT 
					MAX(', AUTOFIELD ,') ', AUTOFIELD ,' 
				FROM 
					', TABLENAME ,'
				GROUP BY 
					', FIELDNAMES ,' 
				HAVING 
					COUNT(1) > 1 
				) t2
			)
');

SET @DELETE_TABLE_ROWS_SQL = DELETE_TABLE_ROWS_SQL;

PREPARE DELETE_TABLE_ROWS_SQL_PRE FROM @DELETE_TABLE_ROWS_SQL;
EXECUTE DELETE_TABLE_ROWS_SQL_PRE;

END//

DELIMITER ;

CALL delete_rows_1('表名', '字段1,字段2,字段3...', '主键(唯一)字段');

之后发现删除的效率还是挺低,又优化成

方法三(优化):

DELIMITER //

DROP PROCEDURE IF EXISTS delete_rows_2;

CREATE PROCEDURE delete_rows_2(IN TABLENAME VARCHAR(50), IN FIELDNAMES VARCHAR(100), IN AUTOFIELD VARCHAR(50))
BEGIN

DECLARE DELETE_TABLE_ROWS_SQL VARCHAR(1000);

SET DELETE_TABLE_ROWS_SQL = CONCAT('
		DELETE 
		FROM 
			', TABLENAME ,' 
		WHERE 
			', AUTOFIELD ,' IN ( 
				SELECT 
					', AUTOFIELD ,' 
				FROM
					(
					SELECT 
						', AUTOFIELD ,' 
					FROM 
						', TABLENAME ,' 
					WHERE 
						(', FIELDNAMES ,') IN ( 
							SELECT 
								', FIELDNAMES ,' 
							FROM 
								', TABLENAME ,' 
							GROUP BY 
								', FIELDNAMES ,' 
							HAVING 
								COUNT(1) > 1 
						) 
					AND ', AUTOFIELD ,' NOT IN ( 
						SELECT 
							MAX(', AUTOFIELD ,') 
						FROM 
							', TABLENAME ,'
						GROUP BY 
							', FIELDNAMES ,' 
						HAVING 
							COUNT(1) > 1 
					) 
				) t2 
			) 
	');

SET @DELETE_TABLE_ROWS_SQL = DELETE_TABLE_ROWS_SQL;

PREPARE DELETE_TABLE_ROWS_SQL_PRE FROM @DELETE_TABLE_ROWS_SQL;
EXECUTE DELETE_TABLE_ROWS_SQL_PRE;

END//

DELIMITER ;

CALL delete_rows_2('表名', '字段1,字段2,字段3...', '主键字段');

由于上述都要group by 两次,又换了一种思路

方法四

DELETE t1
FROM
	l_weijij_47 t1,
	(
		SELECT
			f01,
			f02,
			f03,
			MAX(seq_value) seq_value
		FROM
			l_weijij_47
		GROUP BY
			f01,
			f02,
			f03
		HAVING
			COUNT(1) > 1
		ORDER BY NULL
	) t2
where
	t1.f01 = t2.f01
AND t1.f02 = t2.f02
AND t1.f03 = t2.f03
and t1.seq_value < t2.seq_value

注:group by默认会进行排序,所以要加上order by NULL就避免了排序
group by a,b,c的时候,a,b,c一定要加索引才快

综上,方法四是目前在用的去重。

原文地址:https://www.cnblogs.com/jarjune/p/9951573.html