We parse electronic documents all the time here, and we use Witango to do
it. If your headers and footers are similar from page to page (ours, for
example, are identical except for the page number, department number and
date), you can find them easily, take any information you want out of them
(our reports always prefix the department number with "Dept:"), and then
remove them completely. For each page, tokenize on CRLF to get all the
lines for that page. After that, transpose the array, go through each line
in a rows statement, and tokenize on blanks to pull out specific columns.
Since most reports are column oriented, and most have the same data in the
same columns on every page, it's then easy to build a database from the
extracted data.
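
For anyone who wants to see the steps spelled out, here is a rough sketch
in Python rather than Witango (not our production code). The page layout
is an assumption: a header line that contains "Dept:" and a footer line
that starts with "Page". The name parse_page and the sample text are made
up for illustration.

```python
import re

# Assumption: the department number follows the literal prefix "Dept:".
DEPT_RE = re.compile(r"Dept:\s*(\S+)")


def parse_page(page_text):
    """Return (dept, rows) for one page of a column-oriented report."""
    dept = None
    rows = []
    # Tokenize on CRLF to get every line on the page.
    for line in page_text.split("\r\n"):
        match = DEPT_RE.search(line)
        if match:
            # Header line: keep the department number, drop the line.
            dept = match.group(1)
            continue
        if line.startswith("Page") or not line.strip():
            # Footer line (assumed to start with "Page") or blank line.
            continue
        # Tokenize on blanks to split the line into columns.
        rows.append(line.split())
    return dept, rows


# Hypothetical one-page report.
sample = (
    "SALES REPORT  Dept: 0412  02/27/2004\r\n"
    "WIDGET 10 4.50\r\n"
    "GADGET 3 12.00\r\n"
    "Page 1\r\n"
)
dept, rows = parse_page(sample)
```

Splitting on blanks only works when no column value contains a space; for
fixed-width reports with embedded spaces you would slice each line by
character position instead.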

Do this for every page and you have all of your report in a searchable
database. We've been doing this for over a year with an electronic report
from corporate and it works quite well.
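
As a rough illustration of the "searchable database" part, here is a small
Python sketch using the standard sqlite3 module. This is not what we run
(we do it in Witango); the table name report_lines, the column names, and
the sample rows are all hypothetical.

```python
import sqlite3


def load_rows(conn, dept, rows):
    """Insert the column values extracted from one page into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS report_lines "
        "(dept TEXT, item TEXT, qty INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO report_lines VALUES (?, ?, ?, ?)",
        [(dept, item, int(qty), float(amount)) for item, qty, amount in rows],
    )
    conn.commit()


# Hypothetical data: one page's worth of already-tokenized lines.
conn = sqlite3.connect(":memory:")
load_rows(conn, "0412", [["WIDGET", "10", "4.50"], ["GADGET", "3", "12.00"]])

# Once every page is loaded, the report can be queried like any table.
total_qty = conn.execute(
    "SELECT SUM(qty) FROM report_lines WHERE dept = ?", ("0412",)
).fetchone()[0]
```

Calling load_rows once per page, with the department number taken from
that page's header, gives you the whole report in one table.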

-----Original Message-----
From: Chuck Lockwood [mailto:[EMAIL PROTECTED]
Sent: Friday, February 27, 2004 10:17 AM
To: [EMAIL PROTECTED]
Subject: RE: Witango-Talk: [OT] Extract data from report


I highly recommend a product called Monarch whenever you need to extract
data from text files.  It makes it quick and easy.  Combine it with data
from other sources as well.

http://www.datawatch.com

________________________________________________________________________
TO UNSUBSCRIBE: Go to http://www.witango.com/developer/maillist.taf
