Generate GB18030 mappings from the Unicode Consortium's UCM file

Previously we built the .map files for GB18030 (version 2000) from an
XML file. The 2022 version of this encoding is only available as a
Unicode Character Mapping (UCM) file, so as preparatory refactoring
switch to this format as the source for building version 2000.
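
For readers unfamiliar with the UCM format, the following is a minimal
sketch of how mapping lines in such a file can be read. It is written in
Python for illustration only and is not the code in UCS_to_GB18030.pl;
the function name parse_ucm, the sample lines, and the choice to keep
only lines flagged |0 (or unflagged) are assumptions.

```python
import re

# A UCM mapping line looks like:  <U00A4> \xA1\xE8 |0
# Header lines, comments, and CHARMAP markers do not match this pattern.
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*(?:\|(\d))?"
)


def parse_ucm(lines):
    """Return (unicode_code_point, encoded_bytes_as_int) pairs.

    Hypothetical helper: keeps only mappings with precision flag 0
    or with no flag at all (an assumption of this sketch).
    """
    pairs = []
    for line in lines:
        m = UCM_LINE.match(line)
        if m is None:
            continue  # not a mapping line
        flag = m.group(3)
        if flag not in (None, "0"):
            continue  # skip fallback and other non-roundtrip entries
        ucs = int(m.group(1), 16)
        # Join the \xNN escapes into one big-endian integer.
        code = int(m.group(2).replace("\\x", ""), 16)
        pairs.append((ucs, code))
    return pairs


if __name__ == "__main__":
    sample = [
        "# illustrative input, not the real file",
        '<code_set_name> "example"',
        "CHARMAP",
        r"<U00A4> \xA1\xE8 |0",
        r"<U0080> \x81\x30\x81\x30 |0",
        "END CHARMAP",
    ]
    for ucs, code in parse_ucm(sample):
        print(f"U+{ucs:04X} -> 0x{code:X}")
```

Each pair holds the Unicode code point and the encoded byte sequence as
one integer, which is roughly the shape of data a mapping generator
needs before it emits lookup tables.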

As we do with most input files for the conversion mappings, download
the file on demand. In order to generate the same mappings we have now,
we must download from a previous upstream commit rather than the head,
since the latter contains a correction not present in our current .map
files.

The XML file is still used by EUC_CN, so we cannot delete it from our
repository. GB18030 is a superset of EUC_CN, so it may be possible to
build EUC_CN from the same UCM file, but that is left for future work.

Author: Chao Li <l...@highgo.com>
Discussion: https://postgr.es/m/966d9fc.169.198741fe60b.Coremail.jiaoshuntian%40highgo.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/cfa6cd29271e67c43c1040e3420c1145fdcdceb7

Modified Files
--------------
src/backend/utils/mb/Unicode/Makefile              |  5 +++-
src/backend/utils/mb/Unicode/UCS_to_GB18030.pl     | 28 +++++++++++++++-------
.../utf8_and_gb18030/utf8_and_gb18030.c            |  7 +++++-
3 files changed, 29 insertions(+), 11 deletions(-)