Hi

On 18.02.2012 14:40, Hamed Iravanchi wrote:
Hi again,

Regarding the CID-coded glyph/character mapping, I have some more
findings that I want to share. Maybe one of you can point out
something that helps me get there faster.

Using Adobe Acrobat, I was able to dig deep in the PDF file structure,
and see how the data is being read by PDFBox.

There are two utilities in the "options" menu of Adobe Acrobat "Preflight" tool:
* "Browse Internal PDF Structure"
PDFBox also provides a tool (PDFDebugger) to browse the internal structure of a pdf.

* "Browse Internal Structure of All Document Fonts"

In the first one, I could find the "ToUnicode" mapping that I talked
about before in the font resources. The font is a Type 0 font, which
has a "CIDFontType2" descendant font. The "awtFont" used to draw
characters on the graphics object is read from the "FontFile2" stream
inside this object in the PDF.

There are no CID mappings in this font. CIDToGIDMap is "Identity". I'll
include a screenshot of this in the email.

On the other hand, the second option ("Browse Internal Structure of
All Document Fonts") contains glyph details, and ALSO correct CID
mappings. It's in the following path:
Font > Internal Structure > Data Tables > Character to Glyph Mapping ('cmap')

For each character, the data contains both correct UNICODE value
(either original or representation) and correct Glyph code.

In PDFBox, if I map the CID to the correct UNICODE value from this
table, it should work fine. But I could not find anywhere in the
PDFBox code where such mappings are read from the PDF file, and I have
no idea where in the PDF file such information is stored.
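
To illustrate the idea, here is a rough sketch of deriving a CID-to-Unicode
lookup by inverting the font's 'cmap' table. It assumes CIDToGIDMap is
"Identity" (so CID equals glyph ID) and that the 'cmap' entries are already
available as a plain map. The class and method names are made up for this
example and are not PDFBox API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not PDFBox code.
class CmapInversionSketch {

    // unicodeToGid: entries of the font's 'cmap' table (code point -> glyph ID).
    // Returns glyph ID -> code point. With an Identity CIDToGIDMap the key
    // can be used directly as the CID.
    static Map<Integer, Integer> invert(Map<Integer, Integer> unicodeToGid) {
        Map<Integer, Integer> gidToUnicode = new TreeMap<Integer, Integer>();
        // Iterate in ascending code point order so the result is deterministic.
        for (Map.Entry<Integer, Integer> e
                : new TreeMap<Integer, Integer>(unicodeToGid).entrySet()) {
            // Several code points may share one glyph; keep the lowest one.
            gidToUnicode.putIfAbsent(e.getValue(), e.getKey());
        }
        return gidToUnicode;
    }

    // Turns a sequence of CIDs into text; unmapped CIDs become U+FFFD.
    static String cidsToString(int[] cids, Map<Integer, Integer> gidToUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int cid : cids) {
            sb.appendCodePoint(gidToUnicode.getOrDefault(cid, 0xFFFD));
        }
        return sb.toString();
    }
}
```

Keeping the lowest code point when glyphs are shared is just one possible
tie-break; a real implementation might need a different rule.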

If anyone has an idea, please let me know.
I guess I've cracked the nut. :-)

- PDFBox renders the same strings that it uses for text extraction.
- For CID-encoded fonts, the ToUnicode mapping is used to get readable
  strings, but these strings can't be used to draw the text.
- For CID-encoded fonts, we have to use the font-internal ID to address
  the glyphs.
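
A minimal sketch of that distinction, not the actual PDFBox code: it assumes
2-byte character codes that equal the CID (as with an Identity-H encoding,
which is my assumption here) and an "Identity" CIDToGIDMap. All names are
hypothetical.

```java
import java.util.Map;

// Hypothetical sketch, not PDFBox code.
class CidStringSketch {

    // Splits the raw string bytes into 2-byte big-endian character codes.
    // Assumption: code == CID (Identity-H style encoding).
    static int[] decodeCodes(byte[] raw) {
        int[] codes = new int[raw.length / 2];
        for (int i = 0; i < codes.length; i++) {
            codes[i] = ((raw[2 * i] & 0xFF) << 8) | (raw[2 * i + 1] & 0xFF);
        }
        return codes;
    }

    // Text extraction path: CID -> Unicode via the ToUnicode mapping.
    // Unmapped CIDs become U+FFFD.
    static String toText(int[] cids, Map<Integer, String> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int cid : cids) {
            sb.append(toUnicode.getOrDefault(cid, "\uFFFD"));
        }
        return sb.toString();
    }

    // Rendering path: CID -> glyph ID. With CIDToGIDMap "Identity" the
    // glyph ID is the CID itself, so the Unicode string is never involved.
    static int[] toGlyphIds(int[] cids) {
        return cids.clone();
    }
}
```

The point is that toText() and toGlyphIds() both start from the same codes,
but only the second result is suitable for addressing glyphs in the font.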

I have to clean up the code and run some tests before checking in the code.

Thanks a lot,
Hamed
We have to thank you. Your detailed analysis helped me find out which piece of code is still missing.

-- Original Message:

Hi,

On 16.02.2012 05:40, Hesham G. wrote:

Hamed,

Nice effort. Thanks for sharing the information. I hope you
will be able to overcome this and share your solution.

I have to agree, thanks for the details. I also dug deeper into that
part of the code more than once. The issue is the CID-coded
glyph/character mapping. Maybe I'm able to crack that nut with your
information.

Best regards, Hesham

--------------------------------------------- Included message :
<SNIP>

BR
Andreas Lehmkühler
