Thanks a lot, Adam. I haven't had any luck finding helpful info so far, but will hunker down to search the archives this week.
On Wed, Mar 16, 2011 at 12:51 PM, <[email protected]> wrote:
> I know you can extract text based on a region, and I also remember seeing
> many e-mails about improvements in preserving spacing in text extraction.
> If you haven't already, search the mailing list archives and see if any of
> those e-mails help you. I haven't done any text extraction myself, but
> hopefully someone else on the list will be able to point you in the right
> direction.
>
> ----
> Thanks,
> Adam
>
> From: Kevin Brown <[email protected]>
> To: [email protected]
> Date: 03/16/2011 08:23
> Subject: OFF TOPIC -- Extracting PDF tables by selecting them?
>
> Sorry, I understand pdfbox probably won't be able to do this.... but
> perhaps it can? :)
>
> We use this software from BCL called Jade that allowed you to select a
> 'zone' on a PDF page and extract it to text in such a way that the spacing
> and line breaking was preserved. It did (and does!) a better job of this
> than any other tool we have ever tried. But they no longer make or support
> it! Just wondering if any of you PDF mavens have found a tool or method for
> doing this which works really well? It seems impossible to do
> programmatically unless you know the parameters of the text -- one needs to
> select it manually. For example, we use this a lot for odd tables.

