早上同事要我写个MySQL去除重复数据的SQL,想起来上次写过一篇MySQL去除重复数据的博客,使用导入导出加唯一索引实现的,但是那种方式对业务影响较大,所以重新写一个存储过程来删重复数据,这一写就写了一个上午,这种BUG确实是很令人沮丧和浪费时间的。

CREATE PROCEDURE test_procedure ()

MySQL游标循环示例

/*我们有时候会遇到需要对 从A表查询的结果集S_S 的记录 进行遍历并做一些操作(如插入),且这些操作需要的数据或许部分来自S_S集合*/
/*临时存储过程,没办法,不能直接在查询窗口做这些事。*/
drop procedure if exists proc_tmp;
create procedure proc_tmp()
BEGIN
/*这种写法也可以:DECLARE done INT DEFAULT FALSE;*/
declare done int default 0;  /*用于判断是否结束循环*/
declare hostId bigint; /*用于存储结果集S_S的记录(因为我这里S_S的记录只有一列且为bigint类型)*/

/*定义游标*/
declare idCur cursor for select A.HostId from dev_host as A, sys_hostconfig as B where A.HostId != B.HostId;
/*定义 设置循环结束标识done值怎么改变 的逻辑*/
declare continue handler for not FOUND set done = 1; /*done = true;亦可*/

open idCur;  /*打开游标*/

/* 循环开始 */
REPEAT
/* 如果要fetch多列应该这样写,fetch cur/*对应下面的idCur*/ into rowId, rowName;但是注意rowId和rowName要先declare,且declare cur时也要select两列,且这两列和rowId、rowName对应 */
fetch idCur into hostId;  /*这部分可以看看书,还可以fetch多列(假设结果集S_S的记录不是单列的话)*/
if not done THEN  /*数值为非0,MySQL认为是true*/
insert into sys_hostconfig(HostId, AElecCap, BElecCap, CElecCap, RemElecCap, ATmpCap, BTmpCap, CTmpCap, BoxTmpCap, CreateTime)
/*注意这里用到了hostId和now()*/
values(hostId, 80, 80, 80, 500, 80, 80, 80, 80, NOW());
end if;
until done end repeat;

close idCur;  /*关闭游标*/
END
/* 循环结束 */

call proc_tmp();
drop procedure proc_tmp;  /*删除临时存储过程*/

这里把流程简单的描述一下,删重复数据的逻辑很简单:

BEGIN

 

 

1.根据重复判断条件找出重复记录的最小主键(一般是ID列)。

— 需要定义接收游标数据的变量

如果用in语句,感觉很慢,跑了好久都没有结果,用存储过程删除可以把握

2.在符合重复条件的记录中,把主键大于最小主键的记录全部删掉即可。

DECLARE a CHAR(100);

进度,这个是MySQL游标循环的示例代码,避免以后经常找,记录上来。

假设我有如下表,需要删除start_time和end_time都一样的重复记录。

— 遍历数据结束标志

 www.2cto.com  

图片 1

DECLARE done INT DEFAULT FALSE;

CREATE DEFINER=`root`@`%` PROCEDURE `bi`.`clean_data`()

那么存储过程如下:

— 游标

    READS SQL DATA

DELIMITER //
DROP PROCEDURE IF EXISTS Del_Dup_FOR_TEST;
CREATE PROCEDURE Del_Dup_FOR_TEST()
BEGIN
DECLARE min_id INT;
DECLARE v_start_time,v_end_time DATETIME;
DECLARE v_count INT;
DECLARE done INT DEFAULT 0;
DECLARE my_cur CURSOR FOR SELECT start_time,end_time,min(id),count(1) AS count FROM leo.test GROUP BY start_time,end_time HAVING count>1;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN my_cur;
  myloop: LOOP
  FETCH my_cur INTO v_start_time,v_end_time,min_id,v_count;
  IF done=1 THEN
  LEAVE myloop;
  END IF;
  DELETE FROM leo.test WHERE start_time=v_start_time AND end_time=v_end_time AND id>min_id;
  COMMIT;
  END LOOP myloop;
CLOSE my_cur;
END;
//
DELIMITER ;

DECLARE cur CURSOR FOR SELECT suffix from tb_name;

BEGIN

逻辑很清晰,就是根据重复判断条件依次删掉重复组中主键大于最小主键的记录们。

— 将结束标志绑定到游标

  declare v_imei varchar(128);

但是在编写过程中却遇到一个很恶心的BUG,我最初的内容是这么写的:

DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    declare stop int default 0;

DELIMITER //
DROP PROCEDURE IF EXISTS Del_Dup_FOR_TEST;
CREATE PROCEDURE Del_Dup_FOR_TEST()
BEGIN
DECLARE min_id INT;
DECLARE start_time,end_time DATETIME;
DECLARE count INT;
DECLARE done INT DEFAULT 0;
DECLARE my_cur CURSOR FOR SELECT start_time,end_time,min(id),count(1) AS count FROM leo.test GROUP BY start_time,end_time HAVING count>1;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN my_cur;
  myloop: LOOP
  FETCH my_cur INTO start_time,end_time,min_id,count;
  IF done=1 THEN
  LEAVE myloop;
  END IF;
  DELETE FROM leo.test WHERE start_time=start_time AND end_time=end_time AND id>min_id;
  COMMIT;
  END LOOP myloop;
CLOSE my_cur;
END;
//
DELIMITER ;

— 打开游标

    declare cur cursor for (select imei from bi.tmp_imei);

不同的部分在于变量定义的名称,即:

OPEN cur;

    declare CONTINUE HANDLER FOR SQLSTATE ‘02000’ SET stop=1;

FETCH
INTO的变量名绝对不能是你定义CURSOR时SQL语句查出来的列名或者列别名,也就说你定义的变量名既不能是表中已经存在的列名,也不能是你定义游标时用过的别名(如本例中的count),只要一个条件不符合,FETCH
INTO就把全部的变量赋NULL值,这点你可以尝试在FETCH
INTO后加一句Select打印变量名验证。

— 开始循环

    OPEN cur;

在查询到这个BUG之前去官网页面特地看了一下是否是我的语法有错误: ,确信语法没问题,但倒数第二条评论显示可能是列名的隐藏BUG,最后一条评论反驳了BUG说法,但没有办法我还是根据BUG
REPORT做了以上修改,然后功能就正常了。

read_loop: LOOP

    FETCH cur INTO v_imei;

关于此BUG的BUG报告页面详见MySQL BUG:#28227 和
BUG:#5967

— 提取游标里的数据,这里只有一个,多个的话也一样;

    WHILE stop <> 1 DO

那么再回头看一下官网文档下的最后一条评论,开始我认为最后一条反驳BUG的评论完全是扯淡,是哪个傻X说这不是个BUG的?后来仔细想了想,他俩都对,这确实也算个BUG,傻X的也是我。

FETCH cur INTO a;

        delete from bi.tmp_dt_fee_user_info where imei=v_imei;

贴一下页面下最后两条评论(截止2018.08.01):

— 声明结束的时候

        FETCH cur INTO v_imei;

Posted by Brent Roady on May 9, 2012
It should be noted that the local variable names used in FETCH [cursor] INTO must be different than the variable names used in the SELECT statement 
defining the CURSOR. Otherwise the values will be NULL. 
In this example, 
DECLARE a VARCHAR(255);
DECLARE cur1 CURSOR FOR SELECT a FROM table1;
FETCH cur1 INTO a;
the value of a after the FETCH will be NULL.
This is also described here: http://bugs.mysql.com/bug.php?id=28227

Posted by Jérémi Lassausaie on February 3, 2015
Answer for Brent Roady :
I don't see any bug in the bahaviour described.
DECLARE a VARCHAR(255);
/* you declare a variable "a" without a specified default value, a=NULL */
DECLARE cur1 CURSOR FOR 
SELECT a FROM table1;
/* You declare a cursor that selects "a" FROM a table */
OPEN cur1;
/* You execute your cursor query, a warning is raised because a is ambiguously defined but you don't see it */
FETCH cur1 INTO a;
/* you put your unique field in your unique row into a (basically you do "SET a=a;") so a is still NULL */
There is no bug report, just a misunderstanding.

IF done THEN

    END WHILE;

Brent遇到的现象与我相同,并列出了BUG
Report的链接。

LEAVE read_loop;

    CLOSE cur;

Jeremi(猜测可能是个程序员)回答,这是一个显而易见的误解,当你声明了变量a(初始值为NULL),然后FETCH
INTO a就相当于set a=a,在任何程序语言中这都是无解的。

END IF;

END

因此在编写存储过程中为定义的变量加个前缀标识是很好的习惯,想起以前Oracle写存储过程确实都加v_前缀,SQL
Server 都用@前缀,现在轮到mysql却忽略了,确实需要牢记下。

— 这里做你想做的循环的事件

 

DELETE from mail_email where email like a;

如果用in语句,感觉很慢,跑了好久都没有结果,用存储过程删除可以把握
进度,这个是MySQL游标循环的示例代码,避免…

END LOOP;

— 关闭游标

CLOSE cur;

END

发表评论

电子邮件地址不会被公开。 必填项已用*标注

网站地图xml地图