I have written an application that generates an .epub document from user
input.
I am now trying to use PDFBox to add PDF output from the same source text,
but I have run into problems when rendering bold or italic text:
- In the italic font, the characters u and i in the word "quick" are
overlapped.
- In the word-pair "brown fox" (where "brown" is in plain font and "fox"
is italic) there is no space between the words but there is an extra
space between the f and o in "fox".
- In the phrase "dog and ran" (which is bold) the single space between
"and" and "ran" is too wide, and there is no space between "ran" and
the next word.
And yet, the same string is rendered with correct spacing when output as
plain text (no font changes).
See the output files at:
https://www.dropbox.com/s/ox4arbrfiv5jqfu/withNoHtmlTags.pdf?dl=0
https://www.dropbox.com/s/wgj029hm4wre1x5/withItalicsAndBoldFonts.pdf?dl=0
As a newbie to both PDF and PDFBox, I started with a tutorial I found at
http://www.coderanch.com/t/659953/Wiki/PDFBox. Once I verified that I
had entered the tutorial correctly by running it and viewing the output,
I began experimenting by displaying a simple test string that is long
enough to require word wrapping. When I got that to work, I tried adding
bold and italic HTML tags to the string (since the end goal is to create
PDF from .epub source).
Here is my test code:
https://www.dropbox.com/s/k9d22s0xsgg8tz8/TestBed.java?dl=0
In TestBed.java, doTutorial() is the unmodified tutorial.
The method doMyCode() displays the test string by breaking it into
individual whole words. If I mark words with <i> and <b> tags, they are
correctly rendered with bold and italic fonts. But this limits font
changes to whole words only, which rules out a font change in the middle
of a string of characters. To handle that I need to output individual
characters, not words.
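To illustrate what I mean by a font change in the middle of a word, here
is a minimal sketch (not the code in TestBed.java) of one alternative to
going character by character: splitting the text into runs that each
carry a single style, so that a run can end mid-word. The names
StyledRuns, Run, and split are made up for this illustration, and it
only recognizes <b>, </b>, <i>, and </i>; anything else is kept as
literal text.

```java
import java.util.ArrayList;
import java.util.List;

final class StyledRuns {

    /** One stretch of text that shares a single bold/italic state. */
    static final class Run {
        final String text;
        final boolean bold;
        final boolean italic;

        Run(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }
    }

    /** True only for the four tags this sketch understands. */
    private static boolean isKnownTag(String tag) {
        return tag.equals("b") || tag.equals("/b")
            || tag.equals("i") || tag.equals("/i");
    }

    /** Splits input into runs, switching style at each known tag. */
    static List<Run> split(String input) {
        List<Run> runs = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        boolean bold = false;
        boolean italic = false;
        int i = 0;
        while (i < input.length()) {
            // Look for "<...>" starting at the current position.
            int close = input.charAt(i) == '<' ? input.indexOf('>', i) : -1;
            String tag = close > 0 ? input.substring(i + 1, close) : null;
            if (tag != null && isKnownTag(tag)) {
                // Flush the text collected so far under the old style.
                if (buf.length() > 0) {
                    runs.add(new Run(buf.toString(), bold, italic));
                    buf.setLength(0);
                }
                switch (tag) {
                    case "b":  bold = true;    break;
                    case "/b": bold = false;   break;
                    case "i":  italic = true;  break;
                    default:   italic = false; break; // "/i"
                }
                i = close + 1;
            } else {
                // Ordinary character (or an unrecognized '<'): keep it.
                buf.append(input.charAt(i));
                i++;
            }
        }
        if (buf.length() > 0) {
            runs.add(new Run(buf.toString(), bold, italic));
        }
        return runs;
    }
}
```

For example, split("qu<i>ick</i> brown") yields three runs: "qu"
(plain), "ick" (italic), and " brown" (plain). Each run could then be
drawn with a single font, without falling back to single characters.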
The method doMyCode2() displays the test string word by word, except that
a word containing an HTML tag is rendered character by character.
If the test string contains no tags, it renders correctly.
See the sample file withNoHtmlTags.pdf.
When <i> and <b> tags are encountered, fonts get changed to
PDType1Font.TIMES_BOLD or PDType1Font.TIMES_ITALIC as required, and the
string is rendered, but the character spacing is mangled.
See the sample file withItalicsAndBoldFonts.pdf.
Both of these files were generated by the same code, the doMyCode2()
method, with the only change being the addition or removal of <i> and
<b> tags in the string paraText.
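My understanding, which may well be where I am going wrong, is that when
text is placed one character at a time, each character must start where
the previous one ended, using the width reported by the font that
actually drew it. A minimal sketch of that bookkeeping follows. It is
not taken from TestBed.java, the names CharAdvance, toPoints, and
offsets are made up, and the width lookup is a stand-in that assumes
widths are given in 1/1000 of the font size, which I believe is the unit
used by PDFBox's getStringWidth().

```java
import java.util.function.IntToDoubleFunction;

final class CharAdvance {

    /** Converts a width in 1/1000 units to points at the given font size. */
    static double toPoints(double widthThousandths, double fontSize) {
        return widthThousandths / 1000.0 * fontSize;
    }

    /**
     * Returns the x position at which each character starts.
     * The final element is the x position just past the last character.
     * widthOf is a hypothetical lookup standing in for the current font.
     */
    static double[] offsets(String text, double startX, double fontSize,
                            IntToDoubleFunction widthOf) {
        double[] xs = new double[text.length() + 1];
        double x = startX;
        for (int i = 0; i < text.length(); i++) {
            xs[i] = x;
            // Advance by this character's own width, scaled to points.
            x += toPoints(widthOf.applyAsDouble(text.charAt(i)), fontSize);
        }
        xs[text.length()] = x;
        return xs;
    }
}
```

If the widths came from a different font than the one used to draw the
characters, I would expect the positions to drift, but I have not
confirmed that this is what is happening in my output.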
It does not appear to be a font problem, but rather a rendering problem.
I get nearly the same results with both Times and Helvetica. The only
difference is the positioning of the u and i characters in the word
"quick": they still overlap, but in the Helvetica rendering the i is in
the middle of the u, while in the Times rendering the i overlaps the
last stroke of the u, so that it looks like a u with a dot over its
tail.
What can I do to fix this?
Thanks.
Jerry