
14.9.1.5 How Compression Works for InnoDB Tables

This section describes some internal implementation details about compression for InnoDB tables. The information presented here may be helpful for tuning performance, but is not necessary to know for basic use of compression.

Compression Algorithms

Some operating systems implement compression at the file system level. Files are typically divided into fixed-size blocks that are compressed into variable-size blocks, which easily leads to fragmentation. Every time something inside a block is modified, the whole block is recompressed before it is written to disk. These properties make this compression technique unsuitable for use in an update-intensive database system.

MySQL implements compression with the help of the well-known zlib library, which implements the LZ77 compression algorithm. This compression algorithm is mature, robust, and efficient in both CPU utilization and in reduction of data size. The algorithm is lossless, so that the original uncompressed data can always be reconstructed from the compressed form. LZ77 compression works by finding sequences of data that are repeated within the data to be compressed. The patterns of values in your data determine how well that data compresses, but typical user data often compresses by 50% or more.

Note

Prior to MySQL 5.7.24, InnoDB supports zlib library versions up to 1.2.3. In MySQL 5.7.24 and later, InnoDB supports zlib library versions up to 1.2.11.

Unlike compression performed by an application, or compression features of some other database management systems, InnoDB compression applies both to user data and to indexes. In many cases, indexes can constitute 40-50% or more of the total database size, so this difference is significant. When compression is working well for a data set, the size of the InnoDB data files (the file-per-table tablespace or general tablespace .ibd files) is 25% to 50% of the uncompressed size, or possibly smaller. Depending on the workload, this smaller database can, in turn, lead to a reduction in I/O and an increase in throughput, at a modest cost in terms of increased CPU utilization. You can adjust the balance between compression level and CPU overhead by modifying the innodb_compression_level configuration option.
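As a sketch, this balance can be adjusted at runtime; the table name t1 below is hypothetical:

```sql
-- Lower the compression level from its default of 6 to trade some
-- storage savings for reduced CPU cost (valid range: 0 to 9).
SET GLOBAL innodb_compression_level = 4;

-- The new level applies as pages are compressed from now on; rebuilding
-- a table recompresses all of its pages at the current level.
ALTER TABLE t1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
```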

InnoDB Data Storage and Compression

All user data in InnoDB tables is stored in pages comprising a B-tree index (the clustered index). In some other database systems, this type of index is known as an "index-organized table". Each row in the index node contains the values of the (user-specified or system-generated) primary key and all the other columns of the table.

Secondary indexes in InnoDB tables are also B-trees, containing pairs of values: the index key and a pointer to a row in the clustered index. The pointer is in fact the value of the primary key of the table, which is used to access the clustered index if columns other than the index key and primary key are required. Secondary index records must always fit on a single B-tree page.

The compression of B-tree nodes (of both clustered and secondary indexes) is handled differently from compression of overflow pages used to store long VARCHAR, BLOB, or TEXT columns, as explained in the following sections.

Compression of B-Tree Pages

Because they are frequently updated, B-tree pages require special treatment. It is important to minimize the number of times B-tree nodes are split, as well as to minimize the need to uncompress and recompress their content.

One technique MySQL uses is to maintain some system information in the B-tree node in uncompressed form, thus facilitating certain in-place updates. For example, this allows rows to be delete-marked and deleted without any compression operation.

In addition, MySQL attempts to avoid unnecessary uncompression and recompression of index pages when they are changed. Within each B-tree page, the system keeps an uncompressed "modification log" to record changes made to the page. Updates and inserts of small records may be written to this modification log without requiring the entire page to be completely reconstructed.

When the space for the modification log runs out, InnoDB uncompresses the page, applies the changes, and recompresses the page. If recompression fails (a situation known as a compression failure), the B-tree nodes are split and the process is repeated until the update or insert succeeds.
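The gap between attempted and successful recompressions can be watched through the INFORMATION_SCHEMA; a sketch:

```sql
-- compress_ops counts compression attempts; compress_ops_ok counts the
-- attempts that succeeded. A widening difference indicates compression
-- failures, each of which triggers a page split and a retry.
SELECT page_size,
       compress_ops,
       compress_ops_ok,
       compress_ops - compress_ops_ok AS compress_failures
FROM INFORMATION_SCHEMA.INNODB_CMP;
```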

To avoid frequent compression failures in write-intensive workloads, such as for OLTP applications, MySQL sometimes reserves some empty space (padding) in the page, so that the modification log fills up sooner and the page is recompressed while there is still enough room to avoid splitting it. The amount of padding space left in each page varies as the system keeps track of the frequency of page splits. On a busy server doing frequent writes to compressed tables, you can adjust the innodb_compression_failure_threshold_pct and innodb_compression_pad_pct_max configuration options to fine-tune this mechanism.
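As a sketch, both options can be set at runtime; the values shown are the server defaults:

```sql
-- Start padding pages once more than 5% of compression attempts for a
-- table fail, and never reserve more than 50% of a page as padding.
SET GLOBAL innodb_compression_failure_threshold_pct = 5;
SET GLOBAL innodb_compression_pad_pct_max = 50;
```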

Generally, MySQL requires that each B-tree page in an InnoDB table can accommodate at least two records. For compressed tables, this requirement has been relaxed. Leaf pages of B-tree nodes (whether of the primary key or secondary indexes) only need to accommodate one record, but that record must fit, in uncompressed form, in the per-page modification log. If innodb_strict_mode is ON, MySQL checks the maximum row size during CREATE TABLE or CREATE INDEX. If the row does not fit, the following error message is issued: ERROR HY000: Too big row.
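For example, enabling strict mode for the session causes the row-size check to run at DDL time; the table name t1 below is hypothetical:

```sql
SET SESSION innodb_strict_mode = ON;

-- With strict mode on, a definition whose rows could never fit the
-- chosen compressed page size is rejected here, at CREATE TABLE time,
-- rather than failing later on INSERT or UPDATE.
CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 VARCHAR(500))
  ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
```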

If you create a table while innodb_strict_mode is OFF, and a subsequent INSERT or UPDATE statement attempts to create an index entry that does not fit in the size of the compressed page, the operation fails with ERROR 42000: Row size too large. (This error message does not name the index for which the record is too large, or mention the length of the index record or the maximum record size on that particular index page.) To solve this problem, rebuild the table with ALTER TABLE and select a larger compressed page size (KEY_BLOCK_SIZE), shorten any column prefix indexes, or disable compression entirely with ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPACT.
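The remedies above can be sketched as follows; the table name t1 is hypothetical:

```sql
-- Option 1: rebuild with a larger compressed page size.
ALTER TABLE t1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

-- Option 2: give up compression for this table entirely.
ALTER TABLE t1 ROW_FORMAT=DYNAMIC;
```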

innodb_strict_mode is not applicable to general tablespaces, which also support compressed tables. Tablespace management rules for general tablespaces are strictly enforced independently of innodb_strict_mode. For more information, see Section 13.1.19, “CREATE TABLESPACE Statement”.
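A minimal sketch of a compressed table in a general tablespace; the names ts1 and t2 are hypothetical:

```sql
-- FILE_BLOCK_SIZE (in bytes) must correspond to the KEY_BLOCK_SIZE
-- (in KB) of the compressed tables the tablespace is to hold.
CREATE TABLESPACE ts1 ADD DATAFILE 'ts1.ibd' FILE_BLOCK_SIZE = 8192;

CREATE TABLE t2 (c1 INT PRIMARY KEY)
  TABLESPACE ts1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
```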

Compressing BLOB, VARCHAR, and TEXT Columns

In an InnoDB table, BLOB, VARCHAR, and TEXT columns that are not part of the primary key may be stored on separately allocated overflow pages. We refer to these columns as off-page columns. Their values are stored on singly-linked lists of overflow pages.

For tables created in ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED, the values of BLOB, TEXT, or VARCHAR columns may be stored fully off-page, depending on their length and the length of the entire row. For columns that are stored off-page, the clustered index record only contains 20-byte pointers to the overflow pages, one per column. Whether any columns are stored off-page depends on the page size and the total size of the row. When the row is too long to fit entirely within the page of the clustered index, MySQL chooses the longest columns for off-page storage until the row fits on the clustered index page. As noted above, if a row does not fit by itself on a compressed page, an error occurs.

Note

For tables created in ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED, TEXT and BLOB columns that are less than or equal to 40 bytes are always stored in-line.

Tables created in older versions of MySQL use the Antelope file format, which supports only ROW_FORMAT=REDUNDANT and ROW_FORMAT=COMPACT. In these formats, MySQL stores the first 768 bytes of BLOB, VARCHAR, and TEXT columns in the clustered index record along with the primary key. The 768-byte prefix is followed by a 20-byte pointer to the overflow pages that contain the rest of the column value.

When a table is in COMPRESSED format, all data written to overflow pages is compressed as is; that is, MySQL applies the zlib compression algorithm to the entire data item. Other than the data, compressed overflow pages contain an uncompressed header and trailer comprising a page checksum and a link to the next overflow page, among other things. Therefore, very significant storage savings can be obtained for longer BLOB, TEXT, or VARCHAR columns if the data is highly compressible, as is often the case with text data. Image data, such as JPEG, is typically already compressed and so does not benefit much from being stored in a compressed table; the double compression can waste CPU cycles for little or no space savings.

The overflow pages are of the same size as other pages. A row containing ten columns stored off-page occupies ten overflow pages, even if the total length of the columns is only 8K bytes. In an uncompressed table, ten uncompressed overflow pages occupy 160K bytes. In a compressed table with an 8K page size, they occupy only 80K bytes. Thus, it is often more efficient to use compressed table format for tables with long column values.

For file-per-table tablespaces, using a 16K compressed page size can reduce storage and I/O costs for BLOB, VARCHAR, or TEXT columns, because such data often compress well, and might therefore require fewer overflow pages, even though the B-tree nodes themselves take as many pages as in the uncompressed form. General tablespaces do not support a 16K compressed page size (KEY_BLOCK_SIZE). For more information, see Section 14.6.3.3, “General Tablespaces”.
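As a sketch, assuming the default 16K innodb_page_size and a file-per-table tablespace; the table name articles is hypothetical:

```sql
-- KEY_BLOCK_SIZE=16 leaves B-tree pages the same size as uncompressed
-- pages, but still compresses the overflow pages holding long values.
CREATE TABLE articles (
  id   BIGINT PRIMARY KEY,
  body TEXT
) ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16;
```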

Compression and the InnoDB Buffer Pool

In a compressed InnoDB table, every compressed page (whether 1K, 2K, 4K or 8K) corresponds to an uncompressed page of 16K bytes (or a smaller size if innodb_page_size is set). To access the data in a page, MySQL reads the compressed page from disk if it is not already in the buffer pool, then uncompresses the page to its original form. This section describes how InnoDB manages the buffer pool with respect to pages of compressed tables.

To minimize I/O and to reduce the need to uncompress a page, at times the buffer pool contains both the compressed and uncompressed form of a database page. To make room for other required database pages, MySQL can evict from the buffer pool an uncompressed page, while leaving the compressed page in memory. Or, if a page has not been accessed in a while, the compressed form of the page might be written to disk, to free space for other data. Thus, at any given time, the buffer pool might contain both the compressed and uncompressed forms of the page, or only the compressed form of the page, or neither.

MySQL keeps track of which pages to keep in memory and which to evict using a least-recently-used (LRU) list, so that hot (frequently accessed) data tends to stay in memory. When compressed tables are accessed, MySQL uses an adaptive LRU algorithm to achieve an appropriate balance of compressed and uncompressed pages in memory. This adaptive algorithm is sensitive to whether the system is running in an I/O-bound or CPU-bound manner. The goal is to avoid spending too much processing time uncompressing pages when the CPU is busy, and to avoid doing excess I/O when the CPU has spare cycles that can be used for uncompressing compressed pages (that may already be in memory). When the system is I/O-bound, the algorithm prefers to evict the uncompressed copy of a page rather than both copies, to make more room for other disk pages to become memory resident. When the system is CPU-bound, MySQL prefers to evict both the compressed and uncompressed page, so that more memory can be used for hot pages, reducing the need to uncompress data that is in memory only in compressed form.
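How much buffer pool memory the compressed pages of each size are using can be inspected through the INFORMATION_SCHEMA; a sketch:

```sql
-- One row per (page size, buffer pool instance); pages_used counts
-- compressed pages currently resident in the buffer pool.
SELECT page_size, buffer_pool_instance, pages_used, pages_free
FROM INFORMATION_SCHEMA.INNODB_CMPMEM;
```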

Compression and the InnoDB Redo Log Files

Before a compressed page is written to a data file, MySQL writes a copy of the page to the redo log (if it has been recompressed since the last time it was written to the database). This is done to ensure that redo logs are usable for crash recovery, even in the unlikely case that the zlib library is upgraded and that change introduces a compatibility problem with the compressed data. Therefore, some increase in the size of log files, or a need for more frequent checkpoints, can be expected when using compression. The amount of increase in the log file size or checkpoint frequency depends on the number of times compressed pages are modified in a way that requires reorganization and recompression.
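The extra redo volume can be observed over time with a server status counter; a sketch:

```sql
-- Sample this counter before and after a workload runs; the difference
-- is the number of bytes written to the redo log in that interval.
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
```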

Compressed tables require the Barracuda file format. To create a compressed table in a file-per-table tablespace, innodb_file_per_table must be enabled and innodb_file_format must be set to Barracuda. There is no dependence on the innodb_file_format setting when creating a compressed table in a general tablespace. For more information, see Section 14.6.3.3, "General Tablespaces". The MySQL Enterprise Backup product supports the Barracuda file format.
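A minimal sketch of these prerequisites; both settings are the defaults in MySQL 5.7, and the table name t1 is hypothetical:

```sql
SET GLOBAL innodb_file_per_table = ON;
SET GLOBAL innodb_file_format = 'Barracuda';

CREATE TABLE t1 (c1 INT PRIMARY KEY)
  ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
```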