> On 29 Sep 2016, at 12:11, Harrington, Ferdinand B
> <[email protected]> wrote:
>
> I found PDFText2HTML.java. Is there an example of how to call it?
> Outlook distorted my message. The data is indented like this
> As bullets:
>
> Abc
> Def
> Xyz
> Ghi
> 123
> 456

Text in PDF is just placed using (x, y) coordinates; it’s not like HTML, where there is markup that describes the nesting, e.g. <li>, <ul>. If you want to figure out the nesting from the placement, you’ll have to write some code that does that.

-- John

> Thank you.
>
> -----Original Message-----
> From: Tilman Hausherr [mailto:[email protected]]
> Sent: Thursday, September 29, 2016 2:44 PM
> To: [email protected]
> Subject: Re: extract bullet points from a PDF
>
> On 29.09.2016 at 15:08, win harrington wrote:
>> I would like to extract all the lists of bullet points from a PDF file and
>> put them into an xml format.
>> The items are indented. I want the text and the indentation level.
>> The input is like this:
>> - abc
>> - def
>>
>> - xyz
>> - ghi
>>
>> - 123
>> - 456
>>
>>
>> Can I convert that to: abc def xyz ghi 123 456
>> The last step will be to add tags. I have code to do this:
>> <abc></abc><def></def> <xyz></xyz> <ghi></ghi> <123></123>
>> <456></456>
>
> This sounds like an ordinary java question, i.e. parse some text. PDFBox
> does have some rudimentary paragraph detection, I don't know if it
> works. Try the PDFText2HTML tool in the source download.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
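John's point, that the nesting has to be recovered from where each line is placed, can be sketched as follows. This is a minimal sketch, not PDFBox's own method. It assumes the left x-coordinate of each extracted line is already known and that the lines arrive in reading order.

In PDFBox 2.0, a `PDFTextStripper` subclass that overrides `writeString(String, List<TextPosition>)` sees the glyph positions, and `TextPosition.getXDirAdj()` gives a glyph's x position. Collecting the first glyph of each line that way is assumed and not shown here.

The following are all hypothetical: the class and method names (`BulletNesting`, `indentLevels`, `toXml`), the 2-unit tolerance, and the `<list>`/`<item>` element names. The item text is kept as element content rather than used as the tag name, because a name such as `<123>` is not a legal XML element name.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: recovers bullet nesting from the x-offset at which
 * each extracted line starts, and renders it as nested XML.
 */
class BulletNesting {

    /** One extracted line: left x-coordinate of its first glyph, plus its text. */
    static final class Line {
        final float x;
        final String text;

        Line(float x, String text) { this.x = x; this.text = text; }
    }

    /**
     * Distinct left offsets (merged when within {@code tolerance} units of an
     * offset already seen) are sorted; a line's level is the rank of its offset.
     */
    static int[] indentLevels(List<Line> lines, float tolerance) {
        List<Float> offsets = new ArrayList<>();
        for (Line line : lines) {
            boolean known = false;
            for (float o : offsets) {
                if (Math.abs(o - line.x) <= tolerance) { known = true; break; }
            }
            if (!known) offsets.add(line.x);
        }
        offsets.sort(null); // ascending: the leftmost offset becomes level 0
        int[] levels = new int[lines.size()];
        for (int i = 0; i < lines.size(); i++) {
            for (int j = 0; j < offsets.size(); j++) {
                if (Math.abs(offsets.get(j) - lines.get(i).x) <= tolerance) {
                    levels[i] = j;
                    break;
                }
            }
        }
        return levels;
    }

    /** Escapes the characters that may not appear verbatim in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders the lines as nested <list>/<item> elements, item text as content. */
    static String toXml(List<Line> lines, float tolerance) {
        int[] levels = indentLevels(lines, tolerance);
        StringBuilder sb = new StringBuilder("<list>");
        int depth = 0;            // number of nested <list> elements open below the root
        boolean itemOpen = false; // whether an <item> at the current depth is still open
        for (int i = 0; i < lines.size(); i++) {
            // A line can nest at most one level deeper, and only under an open item.
            int target = Math.min(levels[i], itemOpen ? depth + 1 : depth);
            if (target > depth) {
                sb.append("<list>");
                depth++;
            } else {
                if (itemOpen) sb.append("</item>");
                while (depth > target) {
                    sb.append("</list></item>"); // close a nested list and its parent item
                    depth--;
                }
            }
            sb.append("<item>").append(escape(lines.get(i).text));
            itemOpen = true;
        }
        if (itemOpen) sb.append("</item>");
        while (depth > 0) {
            sb.append("</list></item>");
            depth--;
        }
        return sb.append("</list>").toString();
    }

    public static void main(String[] args) {
        List<Line> sample = new ArrayList<>();
        sample.add(new Line(72f, "abc"));
        sample.add(new Line(90f, "def"));
        sample.add(new Line(72f, "xyz"));
        sample.add(new Line(90f, "ghi"));
        sample.add(new Line(108f, "123"));
        sample.add(new Line(72f, "456"));
        System.out.println(toXml(sample, 2f));
    }
}
```

The sample in `main` gives the six items from the thread made-up offsets of 72, 90, 72, 90, 108 and 72. With those offsets it prints `<list><item>abc<list><item>def</item></list></item><item>xyz<list><item>ghi<list><item>123</item></list></item></list></item><item>456</item></list>`.

The tolerance exists because glyphs at the "same" indent rarely share an exact x value in a real PDF. The sketch does not handle bullet glyphs at the start of the text, wrapped continuation lines, or multi-column layouts.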

