Mario,

It would be great if you could provide us with a sample PDF on JIRA [1]. Are you using the TextExtractor, or did you write a program of your own using PDFBox?

Greetings,
Erik

[1] https://issues.apache.org/jira/browse/PDFBOX
--

My blog: http://blog.elitecoderz.net

Mario Sangiorgio wrote:
Hi,
I am writing this e-mail because I am having issues parsing PDF documents
with PDFBox.

For example, I am trying to parse the PDF of a paper, but its title comes out
screwed up, as shown below.

An
Asp
e
ct-Orien
ted
F
ramew
o
rk
for
S
ervice
A
d
aptation

As you can see, I get newlines rather than spaces and, even worse, there are
other newlines in the middle of the words.

If it helps, feel free to ask me for any clarification or to run any test.

Mario

