Source: poppler
Version: 22.12.0-2
Severity: normal
Tags: patch upstream
Forwarded: https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/1444

For \simeq, TeX generates /similarequal instead of Adobe's
/asymptoticallyequal; so similarequal needs to be supported too.

In TeX Live 2023:

texmf-dist/fonts/map/glyphlist/glyphlist.txt (Adobe Glyph List)
contains

  asymptoticallyequal;2243

but texmf-dist/fonts/map/glyphlist/texglyphlist.txt (Extensions to
the Adobe Glyph List for TeX fonts and encodings) contains

  similarequal;2243

As a consequence, texmf-dist/tex/generic/pdftex/glyphtounicode.tex
contains both

  \pdfglyphtounicode{asymptoticallyequal}{2243}
  \pdfglyphtounicode{similarequal}{2243}

NameToUnicodeTable.h already has

  { 0x2243, "asymptoticallyequal" }

so one just needs to add the missing

  { 0x2243, "similarequal" }

To reproduce the issue, consider the following simeq.tex file:

  \documentclass{article}
  \usepackage[T1]{fontenc}
  \begin{document}
  \thispagestyle{empty}
  $\simeq\approx$
  \end{document}

In the PDF file generated by pdflatex, after uncompressing it
with "qpdf --stream-data=uncompress":

  /F32 9.9626 Tf 148.712 707.125 Td [('\031)]TJ

and

  dup 25 /approxequal put
  dup 39 /similarequal put

i.e. /similarequal is generated for \simeq, and pdftotext gives

  '≈

(the apostrophe ', code 39, corresponds to /similarequal, but
appears as an apostrophe since /similarequal is not supported;
and \031, i.e. 25 in decimal, corresponds to /approxequal, which
appears correctly because /approxequal is supported).

With the attached patch, pdftotext gives

  ≃≈

as wanted.

I've created a merge request upstream.

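To illustrate the mechanism described above, here is a minimal C++ sketch. It is not poppler's actual code: NameTable, toUtf8, decode, kEncoding, unpatchedTable and patchedTable are hypothetical names made up for this example, the table holds only the entries relevant here, and the fallback to the raw byte for an unknown glyph name is an assumption chosen to mimic the observed pdftotext output.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical miniature of a glyph-name -> Unicode table; the real table in
// poppler/NameToUnicodeTable.h is far larger and laid out differently.
using NameTable = std::map<std::string, std::uint32_t>;

// Encode a code point from the Basic Multilingual Plane as UTF-8
// (sufficient for U+2243 and U+2248 used in this sketch).
std::string toUtf8(std::uint32_t cp) {
    assert(cp <= 0xFFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Font encoding taken from the uncompressed PDF in this report:
// code 25 -> /approxequal, code 39 -> /similarequal.
const std::map<unsigned char, std::string> kEncoding = {
    {25, "approxequal"}, {39, "similarequal"}};

// Table without the entry added by the patch.
NameTable unpatchedTable() {
    return {{"asymptoticallyequal", 0x2243}, {"approxequal", 0x2248}};
}

// Same table with the additional alias for U+2243.
NameTable patchedTable() {
    NameTable t = unpatchedTable();
    t["similarequal"] = 0x2243;
    return t;
}

// Map each character code to its glyph name, then to Unicode. A code whose
// glyph name is absent from the table is passed through as the raw byte
// (assumption made for this sketch).
std::string decode(const std::string &codes,
                   const std::map<unsigned char, std::string> &encoding,
                   const NameTable &table) {
    std::string out;
    for (unsigned char c : codes) {
        auto enc = encoding.find(c);
        auto hit = (enc != encoding.end()) ? table.find(enc->second) : table.end();
        if (hit != table.end()) {
            out += toUtf8(hit->second);
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

With this sketch, decode("'\031", kEncoding, unpatchedTable()) leaves code 39 as an apostrophe followed by U+2248, whereas the same call with patchedTable() yields U+2243 followed by U+2248. Two names mapping to the same code point is harmless here because the lookup only goes from name to Unicode.
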
-- System Information:
Debian Release: trixie/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'stable-updates'), (500, 'stable-security'), (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
merged-usr: no
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.4.0-3-amd64 (SMP w/12 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=POSIX, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

Description: add ToUnicode support for similarequal.
 For \simeq, TeX generates /similarequal instead of Adobe's
 /asymptoticallyequal; so similarequal needs to be supported too.
 .
 In TeX Live 2023:
 .
 texmf-dist/fonts/map/glyphlist/glyphlist.txt (Adobe Glyph List)
 contains
 .
   asymptoticallyequal;2243
 .
 but texmf-dist/fonts/map/glyphlist/texglyphlist.txt (Extensions to
 the Adobe Glyph List for TeX fonts and encodings) contains
 .
   similarequal;2243
 .
 As a consequence, texmf-dist/tex/generic/pdftex/glyphtounicode.tex
 contains both
 .
   \pdfglyphtounicode{asymptoticallyequal}{2243}
   \pdfglyphtounicode{similarequal}{2243}
 .
 NameToUnicodeTable.h already has
 .
   { 0x2243, "asymptoticallyequal" }
 .
 so one just needs to add the missing
 .
   { 0x2243, "similarequal" }
Merge-Request: https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/1444
Author: Vincent Lefevre <vinc...@vinc17.net>
Last-Update: 2023-08-30

diff --git a/poppler/NameToUnicodeTable.h b/poppler/NameToUnicodeTable.h
index c7749f00..36bb5bb7 100644
--- a/poppler/NameToUnicodeTable.h
+++ b/poppler/NameToUnicodeTable.h
@@ -3518,6 +3518,7 @@ static const struct NameToUnicodeTab nameToUnicodeTextTab[] = { { 0x0021, "!" },
 { 0x05bd, "siluqhebrew" },
 { 0x05bd, "siluqlefthebrew" },
 { 0x223c, "similar" },
+{ 0x2243, "similarequal" },
 { 0x05c2, "sindothebrew" },
 { 0x3274, "siosacirclekorean" },
 { 0x3214, "siosaparenkorean" },