10.10.7.1 The cp932 Character Set (MySQL 5.7 Reference Manual)



Why is cp932 needed?

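In MySQL, the sjis character set corresponds to the Shift_JIS character set defined by IANA, which supports JIS X0201 and JIS X0208 characters. (See http://www.iana.org/assignments/character-sets.)

However, the meaning of “SHIFT JIS” as a descriptive term has become ambiguous, and it often includes the extensions to Shift_JIS that are defined by various vendors.

For example, the “SHIFT JIS” used in Japanese Windows environments is a Microsoft extension of Shift_JIS, and its exact name is Microsoft Windows Codepage : 932, or cp932. In addition to the characters supported by Shift_JIS, cp932 supports extension characters such as NEC special characters, NEC selected IBM extended characters, and IBM selected characters.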
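The difference in coverage can be seen with any extension character. The sketch below uses Python's built-in `shift_jis` and `cp932` codecs as stand-ins for MySQL's sjis and cp932 (an assumption: they are separate implementations, used here only for illustration), and the helper `try_encode` is a hypothetical name. It tries to encode ① (U+2460), an NEC special character stored at cp932 code 8740.

```python
from typing import Optional

# Python's codecs stand in for MySQL's sjis and cp932 here (an assumption:
# they are separate implementations, chosen only to illustrate the idea).
CIRCLED_DIGIT_ONE = "\u2460"  # "①", an NEC special character


def try_encode(text: str, codec: str) -> Optional[bytes]:
    """Return text encoded with codec, or None if the codec cannot represent it."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


print(try_encode(CIRCLED_DIGIT_ONE, "cp932"))      # b'\x87@'  (code 8740)
print(try_encode(CIRCLED_DIGIT_ONE, "shift_jis"))  # None
```

Plain Shift_JIS has no code for the character at all, which is why converting such text through sjis loses it.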

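Many Japanese users have experienced problems using these extension characters. These problems stem from the following factors:

- MySQL automatically converts character sets.
- Character sets are converted using Unicode (ucs2).
- The sjis character set does not support the conversion of these extension characters.
- There are several conversion rules from so-called “SHIFT JIS” to Unicode, and some characters are converted to Unicode differently depending on the rule. MySQL supports only one of these rules (described later).

The MySQL cp932 character set is designed to solve these problems.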

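Because MySQL supports character set conversion, it is important to keep IANA Shift_JIS and cp932 as two separate character sets: they follow different conversion rules.

How does cp932 differ from sjis?

The cp932 character set differs from sjis in the following ways: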

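- cp932 supports NEC special characters, NEC selected IBM extended characters, and IBM selected characters.
- Some cp932 characters have two different code points, both of which convert to the same Unicode code point. When converting from Unicode back to cp932, one of the code points must be selected. For this “round trip conversion,” the rule recommended by Microsoft is used. (See http://support.microsoft.com/kb/170559/EN-US/.)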
The conversion rule works like this:
- If the character is in both JIS X 0208 and NEC special characters, use the code point of JIS X 0208.
- If the character is in both NEC special characters and IBM selected characters, use the code point of NEC special characters.
- If the character is in both IBM selected characters and NEC selected IBM extended characters, use the code point of the IBM selected characters.
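The three pairwise rules can be read as a single ranking of the four character groups. The Python sketch below assumes they combine transitively (JIS X 0208, then NEC special, then IBM selected, then NEC selected IBM extended); this is consistent with the ucs2 to cp932 table later in this section, which maps FFE2 (¬) to 81CA rather than to a duplicate. Groups are identified only by lead byte, and the names `category`, `choose_code_point`, and `RANK` are hypothetical, not MySQL internals.

```python
from typing import Iterable

# Hypothetical labels for the four groups of cp932 double-byte codes.
JIS_X_0208 = "JIS X 0208"
NEC_SPECIAL = "NEC special"
NEC_SELECTED_IBM_EXT = "NEC selected IBM extended"
IBM_SELECTED = "IBM selected"

# Lower rank wins. Assumption: the pairwise rules chain into one ordering.
RANK = {JIS_X_0208: 0, NEC_SPECIAL: 1, IBM_SELECTED: 2, NEC_SELECTED_IBM_EXT: 3}


def category(code: int) -> str:
    """Classify a double-byte cp932 code by its lead byte."""
    lead = code >> 8
    if lead == 0x87:
        return NEC_SPECIAL
    if lead in (0xED, 0xEE):
        return NEC_SELECTED_IBM_EXT
    if 0xFA <= lead <= 0xFC:
        return IBM_SELECTED
    if 0x81 <= lead <= 0x84 or 0x88 <= lead <= 0x9F or 0xE0 <= lead <= 0xEA:
        return JIS_X_0208
    raise ValueError(f"lead byte {lead:#04x} is outside the groups covered here")


def choose_code_point(candidates: Iterable[int]) -> int:
    """Pick the cp932 code to emit when several codes share one Unicode value."""
    return min(candidates, key=lambda code: RANK[category(code)])


# Duplicate cp932 encodings of the same Unicode character:
print(hex(choose_code_point([0x81CA, 0xEEF9, 0xFA54])))  # U+FFE2 "¬" -> 0x81ca
print(hex(choose_code_point([0x8754, 0xFA4A])))          # U+2160 "Ⅰ" -> 0x8754
print(hex(choose_code_point([0xEEEF, 0xFA40])))          # U+2170 "ⅰ" -> 0xfa40
```

Rejecting unknown lead bytes, instead of defaulting them to some group, keeps the sketch from silently ranking codes it knows nothing about.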
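The table shown at https://msdn.microsoft.com/en-us/goglobal/cc305152.aspx provides information about the Unicode values of cp932 characters. For table entries that show a four-digit number under the character, the number is the corresponding Unicode (ucs2) encoding. For table entries that show an underlined two-digit value, there is a range of cp932 character values beginning with those two digits; clicking such an entry takes you to a page that displays the Unicode value of each cp932 character beginning with those digits.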
The following links are of special interest. They correspond to the encodings for the following sets of characters:
- NEC special characters (lead byte 0x87):
```
https://msdn.microsoft.com/en-us/goglobal/gg674964
```
- NEC selected IBM extended characters (lead bytes 0xED and 0xEE):
```
https://msdn.microsoft.com/en-us/goglobal/gg671837
https://msdn.microsoft.com/en-us/goglobal/gg671838
```
- IBM selected characters (lead bytes 0xFA, 0xFB, and 0xFC):
```
https://msdn.microsoft.com/en-us/goglobal/gg671839
https://msdn.microsoft.com/en-us/goglobal/gg671840
https://msdn.microsoft.com/en-us/goglobal/gg671841
```
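Instead of browsing the tables, the Unicode value of a single cp932 code can also be checked programmatically. The sketch below assumes Python's built-in `cp932` codec follows Microsoft's table for these codes; it is not MySQL's converter, so results for other codes should be checked against the tables above. The helper `unicode_of` is a hypothetical name. The sample codes show pairs of duplicates from the NEC special, NEC selected IBM extended, and IBM selected ranges.

```python
# Uses Python's built-in "cp932" codec as a stand-in for Microsoft's table.
def unicode_of(cp932_hex: str) -> str:
    """Return the Unicode code point (as U+XXXX) of one cp932 character."""
    char = bytes.fromhex(cp932_hex).decode("cp932")
    return f"U+{ord(char):04X}"


for code in ["8754", "FA4A", "EEEF", "FA40"]:
    print(code, unicode_of(code))
# 8754 U+2160
# FA4A U+2160
# EEEF U+2170
# FA40 U+2170
```

Each pair decodes to the same Unicode value, which is exactly why the round-trip rule above is needed when converting back.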
cp932 supports conversion of user-defined characters in combination with eucjpms, and solves the problems with sjis/ujis conversion. For details, please refer to http://www.sljfaq.org/afaq/encodings.html.

For some characters, conversion to and from ucs2 is different for sjis and cp932. The following tables illustrate these differences.

Conversion to ucs2:

| `sjis`/`cp932` Value | `sjis` -> `ucs2` Conversion | `cp932` -> `ucs2` Conversion |
|---|---|---|
| 5C | 005C | 005C |
| 7E | 007E | 007E |
| 815C | 2015 | 2015 |
| 815F | 005C | FF3C |
| 8160 | 301C | FF5E |
| 8161 | 2016 | 2225 |
| 817C | 2212 | FF0D |
| 8191 | 00A2 | FFE0 |
| 8192 | 00A3 | FFE1 |
| 81CA | 00AC | FFE2 |
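The table above can be read as a lookup from a code value to a pair of ucs2 results. A minimal Python sketch that copies the table rows verbatim and lists the code values for which sjis and cp932 disagree; `TO_UCS2` and `differing_codes` are hypothetical names, not part of MySQL.

```python
# Rows of the "Conversion to ucs2" table: code -> (sjis result, cp932 result).
TO_UCS2 = {
    0x5C: (0x005C, 0x005C),
    0x7E: (0x007E, 0x007E),
    0x815C: (0x2015, 0x2015),
    0x815F: (0x005C, 0xFF3C),
    0x8160: (0x301C, 0xFF5E),
    0x8161: (0x2016, 0x2225),
    0x817C: (0x2212, 0xFF0D),
    0x8191: (0x00A2, 0xFFE0),
    0x8192: (0x00A3, 0xFFE1),
    0x81CA: (0x00AC, 0xFFE2),
}


def differing_codes(table: dict[int, tuple[int, int]]) -> list[int]:
    """Codes whose ucs2 result depends on whether sjis or cp932 is used."""
    return [code for code, (sjis, cp932) in table.items() if sjis != cp932]


for code in differing_codes(TO_UCS2):
    sjis, cp932 = TO_UCS2[code]
    print(f"{code:X}: sjis -> U+{sjis:04X}, cp932 -> U+{cp932:04X}")
```

Seven of the ten rows differ; only 5C, 7E, and 815C map to the same ucs2 value in both character sets.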

Conversion from ucs2:

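| `ucs2` Value | `ucs2` -> `sjis` Conversion | `ucs2` -> `cp932` Conversion |
|---|---|---|
| 005C | 815F | 5C |
| 007E | 7E | 7E |
| 00A2 | 8191 | 3F |
| 00A3 | 8192 | 3F |
| 00AC | 81CA | 3F |
| 2015 | 815C | 815C |
| 2016 | 8161 | 3F |
| 2212 | 817C | 3F |
| 2225 | 3F | 8161 |
| 301C | 8160 | 3F |
| FF0D | 3F | 817C |
| FF3C | 3F | 815F |
| FF5E | 3F | 8160 |
| FFE0 | 3F | 8191 |
| FFE1 | 3F | 8192 |
| FFE2 | 3F | 81CA |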
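In this table, 3F is the ASCII code for `?`, which appears where a character set has no mapping for the ucs2 value. A short Python sketch that copies the table and lists, for each character set, which ucs2 values it turns into `?`; `FROM_UCS2`, `SJIS`, `CP932`, and `replaced_by_question_mark` are hypothetical names. For example, U+301C (WAVE DASH) survives only sjis, while U+FF5E (FULLWIDTH TILDE) survives only cp932.

```python
SJIS, CP932 = 0, 1  # column indexes into the tuples below
QUESTION_MARK = 0x3F  # "?" appears where there is no mapping

# Rows of the "Conversion from ucs2" table: ucs2 -> (sjis code, cp932 code).
FROM_UCS2 = {
    0x005C: (0x815F, 0x5C),
    0x007E: (0x7E, 0x7E),
    0x00A2: (0x8191, 0x3F),
    0x00A3: (0x8192, 0x3F),
    0x00AC: (0x81CA, 0x3F),
    0x2015: (0x815C, 0x815C),
    0x2016: (0x8161, 0x3F),
    0x2212: (0x817C, 0x3F),
    0x2225: (0x3F, 0x8161),
    0x301C: (0x8160, 0x3F),
    0xFF0D: (0x3F, 0x817C),
    0xFF3C: (0x3F, 0x815F),
    0xFF5E: (0x3F, 0x8160),
    0xFFE0: (0x3F, 0x8191),
    0xFFE1: (0x3F, 0x8192),
    0xFFE2: (0x3F, 0x81CA),
}


def replaced_by_question_mark(column: int) -> list[int]:
    """ucs2 values from the table that the chosen character set turns into '?'."""
    return [u for u, row in FROM_UCS2.items() if row[column] == QUESTION_MARK]


print([f"{u:04X}" for u in replaced_by_question_mark(SJIS)])
# ['2225', 'FF0D', 'FF3C', 'FF5E', 'FFE0', 'FFE1', 'FFE2']
print([f"{u:04X}" for u in replaced_by_question_mark(CP932)])
# ['00A2', '00A3', '00AC', '2016', '2212', '301C']
```

The two lists do not overlap: each character set keeps exactly the look-alike characters the other one replaces, so text written through one and read through the other can silently gain `?` characters.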

Users of any Japanese character sets should be aware that using --character-set-client-handshake (or --skip-character-set-client-handshake) has an important effect. See Section 5.1.6, “Server Command Options”.