8.2.1.6 嵌套循环连接算法_MySQL 5.7 参考手册

MySQL 8.0 参考手册 / 第8章优化 / 8.2 优化SQL语句 / 8.2.1 优化 SELECT 语句 / 8.2.1.6 嵌套循环连接算法

8.2.1.6 嵌套循环连接算法

MySQL 使用嵌套循环算法或其变体在表之间执行连接。

嵌套循环连接算法

一个简单的嵌套循环连接 (NLJ) 算法从循环中的第一个表中一次读取一个行，将每一行传递给处理连接中下一个表的嵌套循环。重复此过程的次数与要连接的表一样多。

假设要使用以下连接类型执行三个表t1、t2和之间的连接：t3

Table   Join Type
t1      range
t2      ref
t3      ALL

如果使用简单的 NLJ 算法，连接处理如下：

for each row in t1 matching range {
  for each row in t2 matching reference key {
    for each row in t3 {
      if row satisfies join conditions, send to client
    }
  }
}

由于 NLJ 算法一次将一行从外循环传递到内循环，因此它通常会多次读取内循环中处理的表。

块嵌套循环连接算法

块嵌套循环 (BNL) 连接算法使用缓冲在外部循环中读取的行来减少必须读取内部循环中的表的次数。例如，如果将 10 行读入缓冲区并将缓冲区传递到下一个内部循环，则可以将内部循环中读取的每一行与缓冲区中的所有 10 行进行比较。这将必须读取内表的次数减少了一个数量级。

MySQL 连接缓冲具有以下特点：

Join buffering can be used when the join is of type ALL or index (in other words, when no possible keys can be used, and a full scan is done, of either the data or index rows, respectively), or range. Use of buffering is also applicable to outer joins, as described in Section 8.2.1.11, “Block Nested-Loop and Batched Key Access Joins”.
A join buffer is never allocated for the first nonconstant table, even if it would be of type ALL or index.
Only columns of interest to a join are stored in its join buffer, not whole rows.
The join_buffer_size system variable determines the size of each join buffer used to process a query.
One buffer is allocated for each join that can be buffered, so a given query might be processed using multiple join buffers.
A join buffer is allocated prior to executing the join and freed after the query is done.

For the example join described previously for the NLJ algorithm (without buffering), the join is done as follows using join buffering:

for each row in t1 matching range {
  for each row in t2 matching reference key {
    store used columns from t1, t2 in join buffer
    if buffer is full {
      for each row in t3 {
        for each t1, t2 combination in join buffer {
          if row satisfies join conditions, send to client
        }
      }
      empty join buffer
    }
  }
}

if buffer is not empty {
  for each row in t3 {
    for each t1, t2 combination in join buffer {
      if row satisfies join conditions, send to client
    }
  }
}

如果是连接缓冲区S中每个存储 t1的组合的大小，是缓冲区中的组合数，则扫描表的次数为： t2Ct3

(S * C)/join_buffer_size + 1

扫描次数t3随着值的join_buffer_size 增加而减少，直到 join_buffer_size大到足以容纳所有先前的行组合为止。在这一点上，通过增大它不会获得任何速度。