On 29.07.2016 at 13:19, Shyam Sundar wrote:
Thanks, Kovi, for the quick response.

But why does it fail only for one particular file? A replica of the same
file generated using another PDF library works perfectly fine with
PDFTextStripper. Isn't that strange, and doesn't it look like a bug?

I hope you checked the shared Sample.zip; it has both the working and the
non-working file.

The "working" file has lines with one space, that is why.

That is what I'd expected. If you want a perfectly formatted text, why not use the PDF? Text extraction is usually for searching.

You can also use the PrintTextLocations.java example, which shows the coordinates of every character. The DrawPrintTextLocations example shows that as well, plus the visual location of the glyphs in an image rendering.

What you could also try is setParagraphStart("\n") and/or setParagraphEnd("\n").
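
Once per-line coordinates are available (for instance, collected from the kind of output PrintTextLocations prints), blank lines could in principle be restored from the vertical gaps between lines. The following is only a rough sketch of that idea in plain Java. It does not use PDFBox, and the names LineGapSketch, Line and rebuild, as well as the pitch and factor values, are made up for illustration. It assumes the y value grows towards the bottom of the page.

```java
import java.util.ArrayList;
import java.util.List;

public class LineGapSketch {

    /** One extracted text line: baseline position plus text (hypothetical holder type). */
    public static final class Line {
        public final double y;      // baseline, assumed to grow towards the bottom of the page
        public final String text;

        public Line(double y, String text) {
            this.y = y;
            this.text = text;
        }
    }

    /**
     * Joins the lines with '\n' and adds one extra blank line wherever the
     * vertical gap to the previous line exceeds gapFactor * normalPitch.
     */
    public static String rebuild(List<Line> lines, double normalPitch, double gapFactor) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            if (i > 0) {
                double gap = lines.get(i).y - lines.get(i - 1).y;
                out.append('\n');
                if (gap > normalPitch * gapFactor) {
                    out.append('\n'); // larger gap: treat it as a paragraph break
                }
            }
            out.append(lines.get(i).text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Line> lines = new ArrayList<>();
        lines.add(new Line(100.0, "First paragraph, line 1"));
        lines.add(new Line(112.0, "First paragraph, line 2"));
        lines.add(new Line(148.0, "Second paragraph, line 1"));
        // Assumed values: a 12 pt line pitch, paragraph break above 1.5 times that.
        System.out.println(rebuild(lines, 12.0, 1.5));
    }
}
```

A fixed threshold like this is fragile for documents with mixed font sizes, so the factor would need tuning per document.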

Tilman


Regards.

On Fri, Jul 29, 2016 at 4:30 PM, Gregor Kovač <[email protected]> wrote:

Hi!

The API docs for PDFTextStripper
(http://pdfbox.apache.org/docs/2.0.2/javadocs/org/apache/pdfbox/text/PDFTextStripper.html)
state that "This class will take a pdf document and strip out all of the
text and ignore the formatting and such". Please note that you can
call setAddMoreFormatting
(http://pdfbox.apache.org/docs/2.0.2/javadocs/org/apache/pdfbox/text/PDFTextStripper.html#setAddMoreFormatting(boolean))
with true and it will add a bit more formatting, but in my experience this
does not compare to using "pdftotext -layout" from the Xpdf project. pdftotext
does a much better job of preserving layout.
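
To illustrate what "preserving layout" means in a plain-text result, here is a toy sketch that places text fragments on a character grid according to their x position. This is not how pdftotext or PDFBox actually work. The class LayoutGridSketch, the method layoutLine and the fixed character width are assumptions made purely for the example.

```java
public class LayoutGridSketch {

    /**
     * Places each fragment at column round(x / charWidth), padding with spaces.
     * Fragments are expected to be sorted by x. A fragment that would start at
     * or before the end of the text already placed is separated by one space.
     */
    public static String layoutLine(double[] xs, String[] fragments, double charWidth) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fragments.length; i++) {
            int column = (int) Math.round(xs[i] / charWidth);
            if (line.length() > 0 && column <= line.length()) {
                line.append(' '); // target column already taken: keep a minimal gap
            }
            while (line.length() < column) {
                line.append(' '); // pad up to the target column
            }
            line.append(fragments[i]);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Assumed input: two fragments at x = 0 and x = 60, with 6 units per character.
        double[] xs = {0.0, 60.0};
        String[] fragments = {"Name", "Total"};
        System.out.println(layoutLine(xs, fragments, 6.0));
    }
}
```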

Best regards,
     Kovi

2016-07-29 12:44 GMT+02:00 Shyam Sundar <[email protected]>:

Hi,

When converting a particular PDF to text, the spacing between lines and
paragraphs is not retained; the output is just flat text.

Sample file : ftp://PfXxyEhxh:[email protected]/Sample.zip

It looks like a file-specific issue. Can you please check?

Thanks.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



--
-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~
|  In A World Without Fences Who Needs Gates?  |
|              Experience Linux.               |
-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~


