MySQL subquery Copying to tmp table hangs forever

When there is subquery, it can be exetremly slow even hangs forever in MySQL. Here is an example.

 
SELECT COUNT(*) FROM `stat` WHERE `ip` IN (SELECT ip  FROM `stat`   GROUP BY `ip` HAVING COUNT(ip) > 1000 ORDER BY COUNT(ip) DESC)
 

This query first collects the ip addresses have 1000 or more visiting records come from them, and then query the total number of visiting records come from all those IP addresses.

I thought this query will run for a few seconds, but the prompt just hang there after the query is issued. The show processlist shows that the query is stuck at "Copying to tmp table" state.

 
+----+------+----------------+----------+---------+-------+----------------------+
| Id | User | Host           | db       | Command | Time  | State                |
+----+------+----------------+----------+---------+-------+----------------------+
|  3 | root | localhost:4158 | database | Query   | 69403 | Copying to tmp table |
|  6 | root | localhost:7595 | database | Query   |     0 | NULL                 |
+----+------+----------------+----------+---------+-------+----------------------+
 

I want to know how long it will takes to complete this query so I let the MySQL client keep running, the next day the show processlist command shows the query has been running for 19 hours. Considering the total number of rows in the table is less than 300,000, it's safe to say the query isn't making any progress. Obviously, it's not just slow, it hangs indefinitely.

If we break the query into two separate queries. The subquery takes 5.18s and returns 27 rows of IP address. Then we manually concatenate those IP addresses and copy to the in clause of the outer query. The second query takes 0.27s.

 
 
mysql> select ip  FROM `stat`   group by `ip` having count(ip) > 1000 order by count(ip) desc;
+-----------------+
| ip              |
+-----------------+
| 112.124.0.114   |
| 118.31.120.31   |
| 39.101.185.229  |
| 120.26.50.66    |
| 47.99.196.234   |
| 121.40.190.236  |
| 198.143.187.202 |
| 117.41.183.82   |
| 125.64.94.206   |
| 39.103.142.195  |
| 39.101.184.55   |
| 121.42.142.188  |
| 42.62.37.103    |
| 162.241.24.182  |
| 222.42.21.228   |
| 31.128.252.4    |
| 121.196.99.99   |
| 45.227.255.149  |
| 125.64.94.221   |
| 47.113.87.53    |
| 183.3.161.252   |
| 39.101.205.97   |
| 104.148.87.124  |
| 37.187.56.66    |
| 46.105.98.166   |
| 142.4.215.116   |
| 218.77.94.78    |
+-----------------+
27 rows in set (5.18 sec)
 
SELECT count(*) FROM `stat` where `ip` in ('112.124.0.114','118.31.120.31','39.101.185.229', '120.26.50.66', '47.99.196.234', '121.40.190.236', '198.143.187.202', '117.41.183.82', '125.64.94.206', '39.103.142.195', '39.101.184.55', '121.42.142.188', '42.62.37.103', '162.241.24.182', '222.42.21.228', '31.128.252.4', '121.196.99.99', '45.227.255.149', '125.64.94.221', '47.113.87.53', '183.3.161.252', '39.101.205.97', '104.148.87.124', '37.187.56.66', '46.105.98.166', '142.4.215.116', '218.77.94.78')
 
+----------+
| count(*) |
+----------+
|    43387 |
+----------+
1 row in set (0.27 sec)
 

What happend?

Execute EXPLAIN on the query tells us this:

 
+----+--------------------+-------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type        | table | type | possible_keys | key  | key_len | ref  | rows   | Extra                           |
+----+--------------------+-------+------+---------------+------+---------+------+--------+---------------------------------+
|  1 | PRIMARY            | stat  | ALL  | NULL          | NULL | NULL    | NULL | 287789 | Using where                     |
|  2 | DEPENDENT SUBQUERY | stat  | ALL  | NULL          | NULL | NULL    | NULL | 287789 | Using temporary; Using filesort |
+----+--------------------+-------+------+---------------+------+---------+------+--------+---------------------------------+
 

The most important warning sign is the DEPENDENT SUBQUERY. It means the query will be executed outside in, that is, for each row in the outer query the subquery will be executed once. This is not most people expected by just looking at the query. We would think the subquery will be executed first which generate a list of ip addresses, then the out query will check if the ip of each row exists in the list, even the outer join is doing a full table scan, this will not take too long to finish. What MySQL actually did is opposite, because the out query and subquery are deemed as correlated subquery which means the subquery has references to the outer query, in this case, the same table. Thus the subquery can't be independent from outer query. You can think of the subquery as a callback function which need a parameter from each row of out query, whenever the out query retrieve a row, it invokes the subquery and pass a parameter to it. This also explains why the result of show processlist; always stuck at "Copying to tmp table".

As showed above, every subquery takes 5+ seconds to complete, so the total time will be 300,000 * 5 = 416 hours to complete.

One optimization is using derived table, the idea is to create a derived table to store the results of subquery, then the out query will works on the derived table. The whole query will still be DEPENDENT SUBQUERY, but the subquery run only once.

 
mysql> SELECT COUNT(*) FROM `stat` WHERE `ip` IN (SELECT `ip` FROM (SELECT ip  FROM `stat` GROUP BY `ip` HAVING COUNT(ip) > 1000 ) AS temp);
+----------+
| COUNT(*) |
+----------+
|    43387 |
+----------+
1 ROW IN SET (4.59 sec)