limit不参与SQL成本计算致索引失效

祈雨的笔记

2020-09-10

mysql

描述

测试环境对某应用进行压测时，QPS突然陡降，监控显示mysql的CPU荷载跑满且有大量的慢查询。于是查看慢查询sql，发现该sql理论上不应该出现慢查询。

简化后的SQL：select * from test_table where a = 0 and b = 0 limit 1在a字段和b字段已经建立了联合索引后，mysql执行该sql时却选择了其他的错误索引导致了慢sql。

复现

CREATE TABLE `test_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` int(11) NOT NULL,
  `b` int(11) NOT NULL,
  `content` varchar(32) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `a_index` (`a`) USING BTREE,
  KEY `b_a_index` (`b`,`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

delimiter  //
CREATE PROCEDURE init_test_table() 
BEGIN 
	DECLARE index_id INT;
	SET index_id = 0;
	START TRANSACTION; 
	WHILE index_id < 2000000 DO
		insert into test_table(a,b,content) values (0,0,CONCAT('content',index_id));
		SET index_id = index_id + 1;
	END WHILE;
	COMMIT; 
END //
delimiter ;
call init_test_table();

update test_table set b = 1 where id < 300000;
update test_table set a = 1 where id > 1700000;

该测试表一共填充了200W条记录，其中前30W条记录的b=1，后30W条记录的a=1，数据分布情况如下图：

1	select * from test_table where a = 0 and b = 0 limit 1;

该sql当前最佳使用的索引是b_a_index，mysql使用该索引只需要通过索引找到一条符合查询条件的记录ID后，再对ID做一次回表操作即可。但实际上，如下，mysql使用了错误的索引a_index，导致sql在执行where查询条件时发生了30万条记录的回表，造成了慢查询。

在对sql使用强制索引后，对比发现，a_index索引下sql的平均耗时在450毫秒左右，而b_a_index索引下sql的平均耗时仅需要0.5毫秒左右，两者相差近千倍的耗时。这一对比，更加证实mysql的执行计划选择了错误的索引导致sql的执行效率下降。

mysql> explain select * from test_table where a = 0 and b = 0 limit 1;
+----+-------------+------------+------------+------+-------------------+---------+---------+-------+--------+----------+-------------+
| id | select_type | table      | partitions | type | possible_keys     | key     | key_len | ref   | rows   | filtered | Extra       |
+----+-------------+------------+------------+------+-------------------+---------+---------+-------+--------+----------+-------------+
|  1 | SIMPLE      | test_table | NULL       | ref  | a_index,b_a_index | a_index | 4       | const | 997354 |     8.08 | Using where |
+----+-------------+------------+------------+------+-------------------+---------+---------+-------+--------+----------+-------------+

mysql> show profiles;
+----------+------------+-------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                         |
+----------+------------+-------------------------------------------------------------------------------+
|       49 | 0.44619325 | select * from test_table where a = 0 and b = 0 limit 1                        |
|       50 |   0.466321 | select * from test_table where a = 0 and b = 0 limit 1                        |
|       51 | 0.45312325 | select * from test_table where a = 0 and b = 0 limit 1                        |
|       52 |  0.4334895 | select * from test_table where a = 0 and b = 0 limit 1                        |
|       53 | 0.48915875 | select * from test_table where a = 0 and b = 0 limit 1                        |
|       54 |   0.000488 | select * from test_table force index(b_a_index) where a = 0 and b = 0 limit 1 |
|       55 |  0.0005185 | select * from test_table force index(b_a_index) where a = 0 and b = 0 limit 1 |
|       56 |  0.0006425 | select * from test_table force index(b_a_index) where a = 0 and b = 0 limit 1 |
|       57 | 0.00053575 | select * from test_table force index(b_a_index) where a = 0 and b = 0 limit 1 |
|       58 | 0.00069475 | select * from test_table force index(b_a_index) where a = 0 and b = 0 limit 1 |
+----------+------------+-------------------------------------------------------------------------------+

原因

"range_scan_alternatives": [
    {
        "index": "a_index",
        "ranges": [
            "0 <= a <= 0"
        ],
        "index_dives_for_eq_ranges": true,
        "rowid_ordered": true,
        "using_mrr": false,
        "index_only": false,
        "rows": 997354,
        "cost": 1200000,
        "chosen": true
    },
    {
        "index": "b_a_index",
        "ranges": [
            "0 <= b <= 0 AND 0 <= a <= 0"
        ],
        "index_dives_for_eq_ranges": true,
        "rowid_ordered": true,
        "using_mrr": false,
        "index_only": false,
        "rows": 997354,
        "cost": 1200000,
        "chosen": false,
        "cause": "cost"
    }
]

通过mysql的optimizer tracing输出该sql的执行计划，发现执行计划中mysql估算出a_index和b_a_index两个的索引的执行成本相同，输出如上。因此mysql选择了第一个索引，也就是a_index。

对于a_index索引，由于索引字段只有a，不能满足查询条件的a和b两个查询条件使用覆盖索引条件，因此该索引需要在where查询时就需要对数据进行回表操作。见下图数据分布，where a = 0 and b = 0的查询条件下，由于前30万条数据都是b=1，因此前30万条记录均不符合查询条件而导致30万条记录的回表操作，产生了30万次的回表性能浪费。

而对于b_a_index索引，该索引字段有a和b，对于where a = 0 and b = 0的查询条件下不需要做回表操作，直接通过innodb的B+Tree索引结构快速获取到一个符合查询条件的记录ID后，仅对该一条记录做回表操作即完成sql执行。因此性能消耗远远低于a_index索引。

之所以mysql的执行计划估算两个索引的执行成本相同，是因为mysql在评估执行成本时不考虑limit条件，即参与执行计划计算的sql是select * from test_table where a = 0 and b = 0。对于select * from test_table where a = 0 and b = 0的sql执行，a_index和b_a_index两个的索引的执行成本相同。虽然不知道innodb引擎在设计时为什么在执行计划过程中会不考虑limit条件，但这一现象，确实导致了当前场景下的索引失效问题。

解决

强制索引
拆分查询，将原sql拆成如下的右连接查询

1
2
3

select * from test_table where a = 0 and b = 0 limit 1;
select * from test_table force index(b_a_index) where a = 0 and b = 0 limit 1
select * from test_table t1 right join (select id from test_table where a = 0 and b = 0 limit 1) t2 on t1.id = t2.id;

mysql> explain select * from test_table where a = 0 and b = 0 limit 1;
+----+-------------+------------+------------+------+-------------------+---------+---------+-------+--------+----------+-------------+
| id | select_type | table      | partitions | type | possible_keys     | key     | key_len | ref   | rows   | filtered | Extra       |
+----+-------------+------------+------------+------+-------------------+---------+---------+-------+--------+----------+-------------+
|  1 | SIMPLE      | test_table | NULL       | ref  | a_index,b_a_index | a_index | 4       | const | 997354 |     8.08 | Using where |
+----+-------------+------------+------------+------+-------------------+---------+---------+-------+--------+----------+-------------+

mysql> explain select t1.* from test_table t1 right join (select id from test_table where a = 0 and b = 0 limit 1) t2 on t1.id = t2.id;
+----+-------------+------------+------------+--------+-------------------+-----------+---------+-------------+--------+----------+-------------+
| id | select_type | table      | partitions | type   | possible_keys     | key       | key_len | ref         | rows   | filtered | Extra       |
+----+-------------+------------+------------+--------+-------------------+-----------+---------+-------------+--------+----------+-------------+
|  1 | PRIMARY     | <derived2> | NULL       | system | NULL              | NULL      | NULL    | NULL        |      1 |      100 | NULL        |
|  1 | PRIMARY     | t1         | NULL       | const  | PRIMARY           | PRIMARY   | 4       | const       |      1 |      100 | NULL        |
|  2 | DERIVED     | test_table | NULL       | ref    | a_index,b_a_index | b_a_index | 8       | const,const | 997354 |      100 | Using index |
+----+-------------+------------+------------+--------+-------------------+-----------+---------+-------------+--------+----------+-------------+