Source: poppler
Version: 22.12.0-2
Severity: normal
Tags: patch upstream
Forwarded: https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/1444

For \simeq, TeX generates /similarequal instead of Adobe's
/asymptoticallyequal; so similarequal needs to be supported too.

In TeX Live 2023:
texmf-dist/fonts/map/glyphlist/glyphlist.txt (Adobe Glyph List) contains

  asymptoticallyequal;2243

but texmf-dist/fonts/map/glyphlist/texglyphlist.txt (Extensions to the
Adobe Glyph List for TeX fonts and encodings) contains

  similarequal;2243

As a consequence, texmf-dist/tex/generic/pdftex/glyphtounicode.tex
contains both

  \pdfglyphtounicode{asymptoticallyequal}{2243}
  \pdfglyphtounicode{similarequal}{2243}

NameToUnicodeTable.h already has

  { 0x2243, "asymptoticallyequal" }

so one just needs to add the missing

  { 0x2243, "similarequal" }
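
To illustrate why a second entry for the same code point is needed, here is a minimal Python sketch of a name-to-Unicode lookup. It is not poppler's actual code: the table is a hypothetical three-entry miniature and `lookup` is an illustrative helper. Only the glyph names and code points come from the glyph lists.

```python
# Hypothetical miniature of a name-to-Unicode table: (code point, glyph name).
# The real NameToUnicodeTable.h is far larger; this is only an illustration.
NAME_TO_UNICODE = [
    (0x223C, "similar"),
    (0x2243, "asymptoticallyequal"),
    (0x2248, "approxequal"),
]


def lookup(table, glyph_name):
    """Return the character for glyph_name, or None if the name is unknown."""
    for code_point, name in table:
        if name == glyph_name:
            return chr(code_point)
    return None


# Without the extra entry, the TeX glyph name is simply not found.
before = lookup(NAME_TO_UNICODE, "similarequal")

# Adding an alias for the same code point makes both names resolve to U+2243.
patched = NAME_TO_UNICODE + [(0x2243, "similarequal")]
after = lookup(patched, "similarequal")
```

Here `before` is None while `after` is U+2243; the existing asymptoticallyequal entry is unaffected, since several names may map to the same code point.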

To reproduce the issue, consider the following simeq.tex file:

\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
\thispagestyle{empty}
$\simeq\approx$
\end{document}

In the PDF file generated by pdflatex, after uncompressing it with
"qpdf --stream-data=uncompress":

/F32 9.9626 Tf 148.712 707.125 Td [('\031)]TJ

and

dup 25 /approxequal put
dup 39 /similarequal put

i.e. /similarequal is generated for \simeq, and pdftotext gives

'≈

(the apostrophe ', code 39, corresponds to /similarequal, but appears
as an apostrophe since /similarequal is not supported; and \031, i.e.
25 in decimal, corresponds to /approxequal, which appears correctly
because /approxequal is supported).
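
The behaviour described above can be mimicked with a short Python sketch. The encoding entries and the string bytes are taken from the uncompressed PDF; the `extract_text` helper, the two small name tables, and the fallback to the raw character code are assumptions made for illustration, not poppler's actual implementation.

```python
# Encoding entries taken from the uncompressed PDF shown above.
ENCODING = {25: "approxequal", 39: "similarequal"}

# Hypothetical subsets of the glyph names known before and after the patch.
KNOWN_BEFORE = {"approxequal": 0x2248, "asymptoticallyequal": 0x2243}
KNOWN_AFTER = {**KNOWN_BEFORE, "similarequal": 0x2243}


def extract_text(codes, encoding, known):
    """Map character codes to text via glyph names (illustrative only)."""
    out = []
    for code in codes:
        name = encoding.get(code)
        # Assumed fallback: an unknown glyph name yields the raw code itself,
        # which is why code 39 shows up as an apostrophe.
        out.append(chr(known.get(name, code)))
    return "".join(out)


# The TJ string: an apostrophe (code 39) followed by octal 031 (code 25).
codes = b"'\031"

unpatched = extract_text(codes, ENCODING, KNOWN_BEFORE)
patched = extract_text(codes, ENCODING, KNOWN_AFTER)
```

Under these assumptions, `unpatched` is an apostrophe followed by U+2248, and `patched` is U+2243 followed by U+2248.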

With the attached patch, pdftotext gives

≃≈

as expected.

I've created a merge request upstream.

-- System Information:
Debian Release: trixie/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'stable-updates'), (500, 'stable-security'), (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
merged-usr: no
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.4.0-3-amd64 (SMP w/12 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=POSIX, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Description: add ToUnicode support for similarequal.
 For \simeq, TeX generates /similarequal instead of Adobe's
 /asymptoticallyequal; so similarequal needs to be supported too.
 In TeX Live 2023:
 texmf-dist/fonts/map/glyphlist/glyphlist.txt (Adobe Glyph List) contains
   asymptoticallyequal;2243
 but texmf-dist/fonts/map/glyphlist/texglyphlist.txt (Extensions to the
 Adobe Glyph List for TeX fonts and encodings) contains
   similarequal;2243
 As a consequence, texmf-dist/tex/generic/pdftex/glyphtounicode.tex
 contains both
   \pdfglyphtounicode{asymptoticallyequal}{2243}
   \pdfglyphtounicode{similarequal}{2243}
 NameToUnicodeTable.h already has
   { 0x2243, "asymptoticallyequal" }
 so one just needs to add the missing
   { 0x2243, "similarequal" }
Merge-Request: https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/1444
Author: Vincent Lefevre <vinc...@vinc17.net>
Last-Update: 2023-08-30

diff --git a/poppler/NameToUnicodeTable.h b/poppler/NameToUnicodeTable.h
index c7749f00..36bb5bb7 100644
--- a/poppler/NameToUnicodeTable.h
+++ b/poppler/NameToUnicodeTable.h
@@ -3518,6 +3518,7 @@ static const struct NameToUnicodeTab nameToUnicodeTextTab[] = { { 0x0021, "!" },
                                                                 { 0x05bd, "siluqhebrew" },
                                                                 { 0x05bd, "siluqlefthebrew" },
                                                                 { 0x223c, "similar" },
+                                                                { 0x2243, "similarequal" },
                                                                 { 0x05c2, "sindothebrew" },
                                                                 { 0x3274, "siosacirclekorean" },
                                                                 { 0x3214, "siosaparenkorean" },
