Re: Issues getting text from a document

Erik Scholtz, ArgonSoft GmbH Fri, 12 Mar 2010 00:41:12 -0800

Mario,

it would be great if you could provide us with a sample PDF on JIRA[1]. Are you using the TextExtractor or did you write a program of your own using PDFBox?


Greetings,
Erik

[1] https://issues.apache.org/jira/browse/PDFBOX
--

My blog: http://blog.elitecoderz.net

Mario Sangiorgio wrote:

Hi,
I am writing this e-mail because I am having issues parsing PDF documents
with PDFBox.

For example, I am trying to parse the PDF of a paper, but I get its title
screwed up as in the following example.

An
Asp
e
ct-Orien
ted
F
ramew
o
rk
for
S
ervice
A
d
aptation

As you can see, I get newlines rather than spaces and, even worse, there are
other newlines in the middle of words.

If it helps, feel free to ask me for any clarification or to run any test.

Mario
