On 29.09.2016 at 21:11, Harrington, Ferdinand B wrote:
I found PDFText2HTML.java. Is there an example of how to call it?

Yes, see TestPDFText2HTML.java

I doubt that it can do indents.

Tilman

Outlook distorted my message. The data is indented like this, as bullets:

Abc
Def
      Xyz
      Ghi
           123
           456

Thank you.
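
For what it's worth, if the extracted text keeps its leading spaces as in the sample above, the indentation level can be derived by ranking the distinct indent widths. This is only a minimal sketch in plain Java under that assumption; the class and method names (IndentLevels, leadingSpaces, levels) are made up for illustration and are not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not a PDFBox class.
class IndentLevels {
    /** Counts the leading space characters of a line. */
    static int leadingSpaces(String line) {
        int n = 0;
        while (n < line.length() && line.charAt(n) == ' ') {
            n++;
        }
        return n;
    }

    /**
     * Returns one 0-based level per non-blank line. The smallest indent
     * width seen becomes level 0, the next larger one level 1, and so on.
     */
    static List<Integer> levels(List<String> lines) {
        // Collect the distinct indent widths in ascending order.
        TreeSet<Integer> widths = new TreeSet<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                widths.add(leadingSpaces(line));
            }
        }
        List<Integer> sorted = new ArrayList<>(widths);

        // The level of a line is the rank of its indent width.
        List<Integer> result = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                result.add(sorted.indexOf(leadingSpaces(line)));
            }
        }
        return result;
    }
}
```

Ranking the widths rather than dividing by a fixed step avoids having to guess how many spaces make up one level, which matters here because the sample uses 6 and 11 spaces.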

-----Original Message-----
From: Tilman Hausherr [mailto:[email protected]]
Sent: Thursday, September 29, 2016 2:44 PM
To: [email protected]
Subject: Re: extract bullet points from a PDF

On 29.09.2016 at 15:08, win harrington wrote:
I would like to extract all the lists of bullet points from a PDF file and put 
them into an XML format.
The items are indented. I want the text and the indentation level.
The input is like this:
     - abc
     - def

     - xyz
     - ghi

     - 123
     - 456


Can I convert that to:
abc
def
    xyz
    ghi
        123
        456
The last step will be to add tags. I have code to do this:
<abc></abc>
<def></def>
    <xyz></xyz>
    <ghi></ghi>
        <123></123>
        <456></456>
This sounds like an ordinary Java question, i.e. parsing some text. PDFBox
does have some rudimentary paragraph detection, but I don't know whether it
works. Try the PDFText2HTML tool in the source download.
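
As a rough illustration of the tagging step as a plain Java text problem, the sketch below renders each bullet with its level. It assumes the bullet texts and their 0-based levels have already been obtained somehow; BulletXml, escape and toXml are hypothetical names, not PDFBox API. It deliberately writes <item level="..."> elements instead of using the bullet text as the tag name, because an XML element name may not start with a digit, so <123> would not be well-formed.

```java
import java.util.List;

// Hypothetical helper, not a PDFBox class.
class BulletXml {
    /** Escapes the characters that are special in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Renders one <item> element per bullet, indented four spaces per level.
     * levels.get(i) is the 0-based indentation level of texts.get(i).
     */
    static String toXml(List<String> texts, List<Integer> levels) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < texts.size(); i++) {
            int level = levels.get(i);
            // Visual indentation only; the level is also kept as an attribute.
            for (int k = 0; k < level; k++) {
                sb.append("    ");
            }
            sb.append("<item level=\"").append(level).append("\">")
              .append(escape(texts.get(i).trim()))
              .append("</item>\n");
        }
        return sb.toString();
    }
}
```

Calling BulletXml.toXml with the texts abc, xyz, 123 and the levels 0, 1, 2 yields three lines, each one indented four spaces deeper than the previous one.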

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

