On 29.07.2016 at 13:19, Shyam Sundar wrote:
Thanks, Kovi, for the quick response.
But why does it fail only for one particular file? A replica of the same file
generated with another PDF library works perfectly fine with
PDFTextStripper. Isn't that strange, and doesn't it look like a bug?
I hope you checked the shared Sample.zip; it contains both the working and
non-working files.
The "working" file has lines with one space; that is why.
That is what I'd expected. If you want perfectly formatted text, why
not use the PDF itself? Text extraction is usually meant for searching.
You can also use the PrintTextLocations.java example, which shows the
coordinates of every character. The DrawPrintTextLocations example
shows that too, along with the visual location of the glyphs in an image
rendering.
What you could also try is setParagraphStart("\n") and/or
setParagraphEnd("\n").
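
A minimal sketch of that suggestion, assuming PDFBox 2.0.x (the version
linked in this thread) is on the classpath. The class name
ParagraphAwareExtract and the command-line argument are illustrative
only; whether the extra newlines actually appear depends on the stripper
detecting paragraph boundaries in the file at hand.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper class; written against the PDFBox 2.0.x API.
final class ParagraphAwareExtract {

    // Extracts the text and writes "\n" at every paragraph start and end
    // that PDFTextStripper detects, in addition to its normal line separators.
    static String extract(PDDocument document) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        stripper.setParagraphStart("\n");
        stripper.setParagraphEnd("\n");
        return stripper.getText(document);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the PDF to convert (an assumption of this sketch).
        // PDDocument.load(File) is the 2.0.x way of opening a document.
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            System.out.print(extract(document));
        }
    }
}
```

Run it with the PDFBox 2.0.x jars on the classpath and the PDF path as the
only argument, then compare the output for the working and non-working
files.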
Tilman
Regards.
On Fri, Jul 29, 2016 at 4:30 PM, Gregor Kovač <[email protected]> wrote:
Hi!
The API docs for PDFTextStripper (
http://pdfbox.apache.org/docs/2.0.2/javadocs/org/apache/pdfbox/text/PDFTextStripper.html
)
state that "This class will take a pdf document and strip out all of the
text and ignore the formatting and such". Please note that you can
call setAddMoreFormatting (
http://pdfbox.apache.org/docs/2.0.2/javadocs/org/apache/pdfbox/text/PDFTextStripper.html#setAddMoreFormatting(boolean)
)
with true and it will add a bit more formatting, but in my experience this
does not compare to using "pdftotext -layout" from the Xpdf project. pdftotext
does a much better job of preserving layout.
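
For anyone who wants to call it from Java, here is a rough sketch that
shells out to "pdftotext -layout". It assumes the pdftotext binary is
installed and on the PATH; the class name PdfToTextLayout and the two
path arguments are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the external pdftotext tool.
final class PdfToTextLayout {

    // Builds the command line: pdftotext -layout <input.pdf> <output.txt>
    static List<String> command(Path pdf, Path txt) {
        return Arrays.asList("pdftotext", "-layout", pdf.toString(), txt.toString());
    }

    // Runs pdftotext and returns its exit code (0 means no error).
    static int run(Path pdf, Path txt) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(pdf, txt))
                .inheritIO() // show the tool's own messages on our console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: input PDF, args[1]: output text file (assumptions of this sketch)
        System.exit(run(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```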
Best regards,
Kovi
2016-07-29 12:44 GMT+02:00 Shyam Sundar <[email protected]>:
Hi,
When converting a particular PDF to text, the spacing between lines and
paragraphs is not retained; the output is just flat text.
Sample file : ftp://PfXxyEhxh:[email protected]/Sample.zip
It looks like a file-specific issue. Can you please check?
Thanks.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
--
-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~
| In A World Without Fences Who Needs Gates? |
| Experience Linux. |
-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~