8.4.1 优化数据大小_MySQL 5.7 参考手册

MySQL 8.0 参考手册 / 第8章优化 / 8.4 优化数据库结构 / 8.4.1 优化数据大小

8.4.1 优化数据大小

设计您的表以最小化它们在磁盘上的空间。通过减少写入磁盘和从磁盘读取的数据量，可以带来巨大的改进。较小的表通常需要较少的主内存，而它们的内容在查询执行期间被主动处理。表数据的任何空间减少也会导致可以更快处理的更小索引。

MySQL 支持许多不同的存储引擎（表类型）和行格式。对于每个表，您可以决定使用哪种存储和索引方法。为您的应用程序选择合适的表格格式可以大大提高性能。请参阅第 14 章，InnoDB 存储引擎和第 15 章，替代存储引擎。

通过使用此处列出的技术，您可以获得更好的表性能并最大限度地减少存储空间：

表列

尽可能使用最有效（最小）的数据类型。MySQL 有许多专门的类型可以节省磁盘空间和内存。例如，尽可能使用较小的整数类型来获得较小的表。 MEDIUMINT通常是一个更好的选择，INT因为 MEDIUMINT列使用的空间少了 25%。
NOT NULL如果可能，将列声明为。通过更好地使用索引并消除测试每个值是否为NULL. 您还可以节省一些存储空间，每列一位。如果您确实需要NULL表中的值，请使用它们。只需避免允许 NULL每列中的值的默认设置。

行格式

InnoDB默认情况下，表是使用 DYNAMIC行格式创建的。要使用以外的行格式，请在or语句中DYNAMIC配置 innodb_default_row_format或明确指定ROW_FORMAT选项。CREATE TABLEALTER TABLE
The compact family of row formats, which includes COMPACT, DYNAMIC, and COMPRESSED, decreases row storage space at the cost of increasing CPU use for some operations. If your workload is a typical one that is limited by cache hit rates and disk speed it is likely to be faster. If it is a rare case that is limited by CPU speed, it might be slower.
The compact family of row formats also optimizes CHAR column storage when using a variable-length character set such as utf8mb3 or utf8mb4. With ROW_FORMAT=REDUNDANT, CHAR(N) occupies N × the maximum byte length of the character set. Many languages can be written primarily using single-byte utf8 characters, so a fixed storage length often wastes space. With the compact family of rows formats, InnoDB allocates a variable amount of storage in the range of N to N × the maximum byte length of the character set for these columns by stripping trailing spaces. The minimum storage length is N bytes to facilitate in-place updates in typical cases. For more information, see Section 14.11, “InnoDB Row Formats”.
To minimize space even further by storing table data in compressed form, specify ROW_FORMAT=COMPRESSED when creating InnoDB tables, or run the myisampack command on an existing MyISAM table. (InnoDB compressed tables are readable and writable, while MyISAM compressed tables are read-only.)
For MyISAM tables, if you do not have any variable-length columns (VARCHAR, TEXT, or BLOB columns), a fixed-size row format is used. This is faster but may waste some space. See Section 15.2.3, “MyISAM Table Storage Formats”. You can hint that you want to have fixed length rows even if you have VARCHAR columns with the CREATE TABLE option ROW_FORMAT=FIXED.

Indexes

The primary index of a table should be as short as possible. This makes identification of each row easy and efficient. For InnoDB tables, the primary key columns are duplicated in each secondary index entry, so a short primary key saves considerable space if you have many secondary indexes.
Create only the indexes that you need to improve query performance. Indexes are good for retrieval, but slow down insert and update operations. If you access a table mostly by searching on a combination of columns, create a single composite index on them rather than a separate index for each column. The first part of the index should be the column most used. If you always use many columns when selecting from the table, the first column in the index should be the one with the most duplicates, to obtain better compression of the index.
If it is very likely that a long string column has a unique prefix on the first number of characters, it is better to index only this prefix, using MySQL's support for creating an index on the leftmost part of the column (see Section 13.1.14, “CREATE INDEX Statement”). Shorter indexes are faster, not only because they require less disk space, but because they also give you more hits in the index cache, and thus fewer disk seeks. See Section 5.1.1, “Configuring the Server”.

Joins

In some circumstances, it can be beneficial to split into two a table that is scanned very often. This is especially true if it is a dynamic-format table and it is possible to use a smaller static format table that can be used to find the relevant rows when scanning the table.
Declare columns with identical information in different tables with identical data types, to speed up joins based on the corresponding columns.
Keep column names simple, so that you can use the same name across different tables and simplify join queries. For example, in a table named customer, use a column name of name instead of customer_name. To make your names portable to other SQL servers, consider keeping them shorter than 18 characters.

Normalization

Normally, try to keep all data nonredundant (observing what is referred to in database theory as third normal form). Instead of repeating lengthy values such as names and addresses, assign them unique IDs, repeat these IDs as needed across multiple smaller tables, and join the tables in queries by referencing the IDs in the join clause.
如果速度比磁盘空间和保留多个数据副本的维护成本更重要，例如在分析大表中所有数据的商业智能场景中，您可以放宽规范化规则，复制信息或创建汇总表以获得更快的速度。