This section describes some internal implementation details of InnoDB table compression. The information presented here may be helpful in tuning for performance, but is not necessary to know for basic use of compression.
Compression Algorithms
Some operating systems implement compression at the file system level. Files are typically divided into fixed-size blocks that are compressed into variable-size blocks, which easily leads to fragmentation. Every time something inside a block is modified, the whole block is recompressed before it is written to disk. These properties make this compression technique unsuitable for use in an update-intensive database system.
MySQL implements compression with the help of the well-known zlib library, which implements the LZ77 compression algorithm. This compression algorithm is mature, robust, and efficient in both CPU utilization and in reduction of data size. The algorithm is “lossless”, so that the original uncompressed data can always be reconstructed from the compressed form. LZ77 compression works by finding sequences of data that are repeated within the data to be compressed. The patterns of values in your data determine how well it compresses, but typical user data often compresses by 50% or more.
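The lossless round trip and the typical degree of size reduction can be illustrated with Python's zlib module (the same compression library family InnoDB links against); the sample payload below is an invented stand-in for typical user data:

```python
import zlib

# Invented, repetitive payload standing in for typical user data;
# real-world compression ratios vary with the data.
original = b"customer_id=1042,status=active,region=EMEA;" * 200

compressed = zlib.compress(original)

# DEFLATE (LZ77 plus Huffman coding) is lossless: the original bytes
# are always recoverable from the compressed form.
assert zlib.decompress(compressed) == original

print(f"{len(original)} -> {len(compressed)} bytes")
```

On repetitive data like this the output is a small fraction of the input; a 50% or better reduction is common for ordinary text-like data.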
Prior to MySQL 5.7.24, InnoDB supported zlib library versions up to 1.2.3. As of MySQL 5.7.24, InnoDB supports zlib library versions up to 1.2.11.
Unlike compression performed by an application, or compression features of some other database management systems, InnoDB compression applies both to user data and to indexes. In many cases, indexes can constitute 40-50% or more of the total database size, so this difference is significant. When compression is working well for a data set, the size of the InnoDB data files (file-per-table tablespace or general tablespace .ibd files) is 25% to 50% of the uncompressed size or possibly smaller. Depending on the workload, this smaller database can in turn lead to a reduction in I/O and an increase in throughput, at a modest cost in terms of increased CPU utilization. You can adjust the balance between compression level and CPU overhead by modifying the innodb_compression_level configuration option.
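As an analogy for what innodb_compression_level trades off, zlib's own compression levels (1 = fastest, 9 = smallest output) can be compared directly in Python; this sketch uses plain zlib, not InnoDB's actual code path:

```python
import zlib

# Invented sample payload; real results depend on your data.
data = b"orderline:sku=ABC-123;qty=4;price=19.99\n" * 1000

# innodb_compression_level accepts the same 0-9 range that zlib defines;
# higher levels spend more CPU to squeeze out a smaller result.
for level in (1, 6, 9):
    out = zlib.compress(data, level)
    assert zlib.decompress(out) == data   # lossless at every level
    print(f"level {level}: {len(out)} bytes")
```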
InnoDB Data Storage and Compression
All user data in InnoDB tables is stored in pages comprising a B-tree index (the clustered index). In some other database systems, this type of index is known as an “index-organized table”. Each row in the index node contains the values of the (user-specified or system-generated) primary key and all the other columns of the table.
Secondary indexes in InnoDB tables are also B-trees, containing pairs of values: the index key and a pointer to a row in the clustered index. The pointer is in fact the value of the primary key of the table, which is used to access the clustered index if columns other than the index key and primary key are required. Secondary index records must always fit on a single B-tree page.
The compression of B-tree nodes (of both clustered and secondary indexes) is handled differently from compression of overflow pages used to store long VARCHAR, BLOB, or TEXT columns, as explained in the following sections.
Compression of B-Tree Pages
Because they are frequently updated, B-tree pages require special treatment. It is important to minimize the number of times B-tree nodes are split, as well as to minimize the need to uncompress and recompress their content.
One technique MySQL uses is to maintain some system information in the B-tree node in uncompressed form, thus facilitating certain in-place updates. For example, this allows rows to be delete-marked and deleted without any compression operation.
In addition, MySQL attempts to avoid unnecessary uncompression and recompression of index pages when they are changed. Within each B-tree page, the system keeps an uncompressed “modification log” to record changes made to the page. Updates and inserts of small records may be written to this modification log without requiring the entire page to be completely reconstructed.
When the space for the modification log runs out, InnoDB uncompresses the page, applies the changes, and recompresses the page. If recompression fails (a situation known as a compression failure), the B-tree nodes are split and the process is repeated until the update or insert succeeds.
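The page life cycle just described (log small changes, recompress when the log fills, split on compression failure) can be modeled with a toy Python class; every name, size, and threshold here is invented for illustration and does not mirror InnoDB's real page format:

```python
import zlib

PAGE_SIZE = 8 * 1024     # compressed page size, e.g. a KEY_BLOCK_SIZE=8 analogue
MOD_LOG_LIMIT = 512      # bytes reserved for the uncompressed modification log

class CompressedPage:
    """Toy model of a compressed B-tree page with a modification log."""

    def __init__(self, records):
        self.records = list(records)
        self.mod_log = []                              # kept uncompressed
        self.image = zlib.compress(b"".join(self.records))

    def insert(self, rec):
        logged = sum(len(r) for r in self.mod_log)
        if logged + len(rec) <= MOD_LOG_LIMIT:
            self.mod_log.append(rec)                   # cheap path: no recompression
            return "logged"
        # Modification log is full: uncompress, apply changes, recompress.
        self.records += self.mod_log + [rec]
        self.mod_log = []
        self.image = zlib.compress(b"".join(self.records))
        if len(self.image) + MOD_LOG_LIMIT > PAGE_SIZE:
            return "compression failure: split the page"
        return "recompressed"

page = CompressedPage([b"row%03d|" % i for i in range(100)])
print(page.insert(b"rowNEW|"))  # → logged
```

Repeated inserts eventually overflow the log and trigger a recompression; only when the recompressed image no longer fits does the model report a split.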
To avoid frequent compression failures in write-intensive
workloads, such as for OLTP
applications, MySQL sometimes reserves some empty space
(padding) in the page, so that the modification log fills up
sooner and the page is recompressed while there is still enough
room to avoid splitting it. The amount of padding space left in
each page varies as the system keeps track of the frequency of
page splits. On a busy server doing frequent writes to
compressed tables, you can adjust the innodb_compression_failure_threshold_pct and innodb_compression_pad_pct_max configuration options to fine-tune this mechanism.
Generally, MySQL requires that each B-tree page in an InnoDB table can accommodate at least two records. For compressed tables, this requirement has been relaxed. Leaf pages of B-tree nodes (whether of the primary key or secondary indexes) only need to accommodate one record, but that record must fit, in uncompressed form, in the per-page modification log. If innodb_strict_mode is ON, MySQL checks the maximum row size during CREATE TABLE or CREATE INDEX. If the row does not fit, the following error message is issued: ERROR HY000: Too big row.
If you create a table when innodb_strict_mode is OFF, and a subsequent INSERT or UPDATE statement attempts to create an index entry that does not fit in the size of the compressed page, the operation fails with ERROR 42000: Row size too large. (This error message does not name the index for which the record is too large, or mention the length of the index record or the maximum record size on that particular index page.) To solve this problem, rebuild the table with ALTER TABLE and select a larger compressed page size (KEY_BLOCK_SIZE), shorten any column prefix indexes, or disable compression entirely with ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPACT.
innodb_strict_mode is not applicable to general tablespaces, which also support compressed tables. Tablespace management rules for general tablespaces are strictly enforced independently of innodb_strict_mode. For more information, see Section 13.1.19, “CREATE TABLESPACE Statement”.
Compressing BLOB, VARCHAR, and TEXT Columns
In an InnoDB table, BLOB, VARCHAR, and TEXT columns that are not part of the primary key may be stored on separately allocated overflow pages. We refer to these columns as off-page columns. Their values are stored on singly-linked lists of overflow pages.
For tables created in ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED, the values of BLOB, TEXT, or VARCHAR columns may be stored
fully off-page, depending on their length and the length of the
entire row. For columns that are stored off-page, the clustered
index record only contains 20-byte pointers to the overflow
pages, one per column. Whether any columns are stored off-page
depends on the page size and the total size of the row. When the
row is too long to fit entirely within the page of the clustered
index, MySQL chooses the longest columns for off-page storage
until the row fits on the clustered index page. As noted above,
if a row does not fit by itself on a compressed page, an error
occurs.
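The selection rule just described (move the longest columns off-page until the row fits) can be sketched as a greedy loop; the 20-byte pointer size comes from the text above, while the function name, column names, and sizes are invented for illustration:

```python
POINTER_SIZE = 20  # bytes left in the clustered index record per off-page column

def choose_off_page(columns, max_row_size):
    """Greedily move the longest columns off-page until the row fits.

    `columns` maps column name -> in-line size in bytes. This is a rough
    sketch of the behavior described above, not InnoDB's actual algorithm.
    """
    inline = dict(columns)
    off_page = []
    while sum(inline.values()) > max_row_size:
        longest = max(inline, key=inline.get)
        if inline[longest] <= POINTER_SIZE:
            break  # moving it off-page would not shrink the row further
        inline[longest] = POINTER_SIZE       # replaced by a 20-byte pointer
        off_page.append(longest)
    return off_page

cols = {"id": 8, "title": 300, "body": 60000, "thumbnail": 12000}
print(choose_off_page(cols, max_row_size=8000))  # → ['body', 'thumbnail']
```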
For tables created in ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED, TEXT and BLOB columns that are less than or equal to 40 bytes are always stored in-line.
Tables created in older versions of MySQL use the Antelope file format, which supports only ROW_FORMAT=REDUNDANT and ROW_FORMAT=COMPACT. In these formats, MySQL stores the first 768 bytes of BLOB, VARCHAR, and TEXT columns in the clustered index record along with the primary key. The 768-byte prefix is followed by a 20-byte pointer to the overflow pages that contain the rest of the column value.
When a table is in COMPRESSED format, all data written to overflow pages is compressed “as is”; that is, MySQL applies the zlib compression algorithm to the entire data item. Other than the data, compressed overflow pages contain an uncompressed header and trailer comprising a page checksum and a link to the next overflow page, among other things. Therefore, very significant storage savings can be obtained for longer BLOB, TEXT, or VARCHAR columns if the data is highly compressible, as is often the case with text data. Image data, such as JPEG, is typically already compressed and so does not benefit much from being stored in a compressed table; the double compression can waste CPU cycles for little or no space savings.
The overflow pages are of the same size as other pages. A row containing ten columns stored off-page occupies ten overflow pages, even if the total length of the columns is only 8K bytes. In an uncompressed table, ten uncompressed overflow pages occupy 160K bytes. In a compressed table with an 8K page size, they occupy only 80K bytes. Thus, it is often more efficient to use compressed table format for tables with long column values.
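The arithmetic in this example, under the one-overflow-page-per-column minimum stated above, works out as follows (the helper function is purely illustrative):

```python
def overflow_bytes(num_off_page_columns, page_size):
    # Each off-page column occupies at least one whole overflow page,
    # no matter how short its value is.
    return num_off_page_columns * page_size

# Ten off-page columns whose values total only 8K bytes:
print(overflow_bytes(10, 16 * 1024) // 1024)  # uncompressed 16K pages → 160
print(overflow_bytes(10, 8 * 1024) // 1024)   # compressed 8K pages → 80
```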
For file-per-table tablespaces, using a 16K compressed page size can reduce storage and I/O costs for BLOB, VARCHAR, or TEXT columns, because such data often compress well, and might therefore require fewer overflow pages, even though the B-tree nodes themselves take as many pages as in the uncompressed form. General tablespaces do not support a 16K compressed page size (KEY_BLOCK_SIZE). For more information, see Section 14.6.3.3, “General Tablespaces”.
Compression and the InnoDB Buffer Pool
In a compressed InnoDB table, every compressed page (whether 1K, 2K, 4K or 8K) corresponds to an uncompressed page of 16K bytes (or a smaller size if innodb_page_size is set). To access the data in a page, MySQL reads the compressed page from disk if it is not already in the buffer pool, then uncompresses the page to its original form. This section describes how InnoDB manages the buffer pool with respect to pages of compressed tables.
To minimize I/O and to reduce the need to uncompress a page, at times the buffer pool contains both the compressed and uncompressed form of a database page. To make room for other required database pages, MySQL can evict from the buffer pool an uncompressed page, while leaving the compressed page in memory. Or, if a page has not been accessed in a while, the compressed form of the page might be written to disk, to free space for other data. Thus, at any given time, the buffer pool might contain both the compressed and uncompressed forms of the page, or only the compressed form of the page, or neither.
MySQL keeps track of which pages to keep in memory and which to evict using a least-recently-used (LRU) list, so that hot (frequently accessed) data tends to stay in memory. When compressed tables are accessed, MySQL uses an adaptive LRU algorithm to achieve an appropriate balance of compressed and uncompressed pages in memory. This adaptive algorithm is sensitive to whether the system is running in an I/O-bound or CPU-bound manner. The goal is to avoid spending too much processing time uncompressing pages when the CPU is busy, and to avoid doing excess I/O when the CPU has spare cycles that can be used for uncompressing compressed pages (that may already be in memory). When the system is I/O-bound, the algorithm prefers to evict the uncompressed copy of a page rather than both copies, to make more room for other disk pages to become memory resident. When the system is CPU-bound, MySQL prefers to evict both the compressed and uncompressed page, so that more memory can be used for “hot” pages, reducing the need to uncompress data that exists in memory only in compressed form.
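The eviction preference described above can be sketched as a toy buffer-pool entry in Python; the class and its policy flag are invented for illustration, and InnoDB's real adaptive LRU is considerably more sophisticated:

```python
import zlib

class BufferPoolPage:
    """Toy model of a buffer-pool entry that may hold a page in
    compressed form, uncompressed form, or both."""

    def __init__(self, data):
        self.compressed = zlib.compress(data)
        self.uncompressed = None

    def access(self):
        # Reading the page requires the uncompressed form; rebuild it
        # on demand from the compressed copy kept in memory.
        if self.uncompressed is None:
            self.uncompressed = zlib.decompress(self.compressed)
        return self.uncompressed

    def evict(self, io_bound):
        if io_bound:
            self.uncompressed = None   # keep the compressed copy resident
        else:
            self.uncompressed = None   # CPU-bound: release both forms,
            self.compressed = None     # freeing the most memory for hot pages

page = BufferPoolPage(b"row data " * 100)
page.access()
page.evict(io_bound=True)
print(page.compressed is not None, page.uncompressed is None)  # → True True
```

Under I/O-bound load the compressed copy stays resident so the page can be rebuilt without a disk read; under CPU-bound load both copies are released.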
Compression and the InnoDB Redo Log Files
Before a compressed page is written to a
data file, MySQL writes a
copy of the page to the redo log (if it has been recompressed
since the last time it was written to the database). This is
done to ensure that redo logs are usable for
crash recovery, even
in the unlikely case that the zlib
library is
upgraded and that change introduces a compatibility problem with
the compressed data. Therefore, some increase in the size of
log files, or a need for
more frequent
checkpoints, can be
expected when using compression. The amount of increase in the
log file size or checkpoint frequency depends on the number of
times compressed pages are modified in a way that requires
reorganization and recompression.
Compressed tables require the Barracuda file format. To create a compressed table in a file-per-table tablespace, innodb_file_per_table must be enabled and innodb_file_format must be set to Barracuda. There is no dependence on the innodb_file_format setting when creating a compressed table in a general tablespace. For more information, see Section 14.6.3.3, “General Tablespaces”. The MySQL Enterprise Backup product supports the Barracuda file format.