Thanks, Kovi, for the quick response.

But why does it fail only for this particular file? A replica of the same
file generated with another PDF library works perfectly fine with
PDFTextStripper. Isn't that strange, and doesn't it look like a bug?

I hope you checked the shared Sample.zip; it contains both the working and
the non-working files.

Regards.

On Fri, Jul 29, 2016 at 4:30 PM, Gregor Kovač <[email protected]> wrote:

> Hi!
>
> The API docs for PDFTextStripper (
>
> http://pdfbox.apache.org/docs/2.0.2/javadocs/org/apache/pdfbox/text/PDFTextStripper.html
> )
> state that "This class will take a pdf document and strip out all of the
> text and ignore the formatting and such". Please note that you can
> call setAddMoreFormatting (
>
> http://pdfbox.apache.org/docs/2.0.2/javadocs/org/apache/pdfbox/text/PDFTextStripper.html#setAddMoreFormatting(boolean)
> )
> with true and it will add a bit more formatting, but in my experience this
> does not compare to using "pdftotext -layout" from the Xpdf project. pdftotext
> does a much better job of preserving layout.
>
> Best regards,
>     Kovi
>
> 2016-07-29 12:44 GMT+02:00 Shyam Sundar <[email protected]>:
>
> > Hi,
> >
> > While converting a particular PDF to text, the spacing between lines and
> > paragraphs is not retained; the output is just flat text.
> >
> > Sample file : ftp://PfXxyEhxh:[email protected]/Sample.zip
> >
> > This looks like a file-specific issue. Could you please check?
> >
> > Thanks.
> >
> >
>
>
>
> --
> -~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~
> |  In A World Without Fences Who Needs Gates?  |
> |              Experience Linux.               |
> -~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~
>
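
For anyone who wants to try the "pdftotext -layout" route mentioned in the
quoted reply from Java, here is a minimal sketch. It assumes the Xpdf
pdftotext binary is installed and on the PATH. The class and method names
(PdfToTextLayout, buildCommand, convert) are made up for illustration and
are not part of PDFBox or Xpdf.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: shells out to the external pdftotext tool.
class PdfToTextLayout {

    // Builds the argument list for: pdftotext -layout <pdf> <txt>
    static List<String> buildCommand(Path pdf, Path txt) {
        return Arrays.asList("pdftotext", "-layout", pdf.toString(), txt.toString());
    }

    // Runs pdftotext and returns its exit code (0 means success).
    // Assumes pdftotext can be found on the PATH.
    static int convert(Path pdf, Path txt) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf, txt))
                .inheritIO() // forward pdftotext's own messages to this console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: PdfToTextLayout <input.pdf> <output.txt>");
            return;
        }
        int exitCode = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("pdftotext exited with code " + exitCode);
    }
}
```

Running this on both the working and the non-working file from Sample.zip
would give a layout-preserving baseline to compare against the
PDFTextStripper output.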
