> On 29 Sep 2016, at 12:11, Harrington, Ferdinand B 
> <[email protected]> wrote:
> 
> I found PDFText2HTML.java. Is there an example of how to call it?
> Outlook distorted my message. The data is indented like this,
> as bullets:
> 
> Abc
> Def
>     Xyz
>     Ghi
>          123
>          456

Text in a PDF is just placed using (x, y) coordinates; it’s not like HTML,
where there is markup which describes the nesting, e.g. <li>, <ul>.

If you want to figure out the nesting from the placement, you’ll have to
write some code which does that.
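
A minimal sketch of what such code could look like, assuming you have
already collected the starting x coordinate of each line. With PDFBox, one
way to get those would be to subclass PDFTextStripper, override
writeString(String, List<TextPosition>) and read getXDirAdj() from the first
TextPosition; that part is not shown here. The class name IndentLevels, the
2-point tolerance and the sample coordinates are made up for illustration,
and a real document may need a smarter grouping of the x values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IndentLevels {

    /**
     * Maps each line's starting x coordinate to a nesting level (0 = outermost).
     * An x value within `tolerance` of an existing indent stop shares its level.
     */
    static int[] levels(double[] xs, double tolerance) {
        double[] sorted = xs.clone();
        Arrays.sort(sorted);

        // Build the list of distinct indent stops, from left to right.
        List<Double> stops = new ArrayList<>();
        for (double x : sorted) {
            if (stops.isEmpty() || x - stops.get(stops.size() - 1) > tolerance) {
                stops.add(x);
            }
        }

        // A line's level is the index of the right-most stop at or left of its x.
        int[] result = new int[xs.length];
        for (int i = 0; i < xs.length; i++) {
            int level = 0;
            for (int s = 0; s < stops.size(); s++) {
                if (stops.get(s) <= xs[i]) {
                    level = s;
                }
            }
            result[i] = level;
        }
        return result;
    }

    /** Renders the lines as plain text, indenting four spaces per level. */
    static String render(String[] texts, int[] levels) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < texts.length; i++) {
            for (int k = 0; k < levels[i]; k++) {
                out.append("    ");
            }
            out.append(texts[i]).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up coordinates standing in for values read from a PDF.
        double[] xs = {72, 72, 90, 90, 108, 108};
        String[] texts = {"abc", "def", "xyz", "ghi", "123", "456"};
        System.out.print(render(texts, levels(xs, 2.0)));
    }
}
```

Once you have the level for each line, wrapping the text in tags is the
ordinary Java part. Note that names such as <123> are not valid XML element
names, because they start with a digit, so something like
<item level="2">123</item> may be easier to work with.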

— John

> Thank you.
> 
> -----Original Message-----
> From: Tilman Hausherr [mailto:[email protected]]
> Sent: Thursday, September 29, 2016 2:44 PM
> To: [email protected]
> Subject: Re: extract bullet points from a PDF
> 
> On 29.09.2016 at 15:08, win harrington wrote:
>> I would like to extract all the lists of bullet points from a PDF file and
>> put them into an XML format.
>> The items are indented. I want the text and the indentation level.
>> The input is like this:
>>    - abc
>>    - def
>> 
>>    - xyz
>>    - ghi
>> 
>>    - 123
>>    - 456
>> 
>> 
>> Can I convert that to:
>> abc
>> def
>>     xyz
>>     ghi
>>         123
>>         456
>> The last step will be to add tags. I have code to do this:
>> <abc></abc>
>> <def></def>
>>     <xyz></xyz>
>>     <ghi></ghi>
>>         <123></123>
>>         <456></456>
> 
> This sounds like an ordinary Java question, i.e. parsing some text. PDFBox
> does have some rudimentary paragraph detection, but I don't know if it
> works. Try the PDFText2HTML tool in the source download.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 
> 
> 

