Re: Heuristically processing documents

2009-03-20 Thread Hendrik van Rooyen
"MRAB" wrote:

> BJörn Lindqvist wrote:
8< ---
>> For example, to find the email you can use a simple regexp. If there
>> is a match you can be certain that that is the author's email. But what
>> algorithms can you use to figure out the other information?
>>
> Tricky! :-)
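Here is a minimal Python sketch of the "simple regexp" approach for the email address. The pattern `EMAIL_RE` and the helper `find_emails` are illustrative assumptions, not code posted in the thread. The pattern is intentionally loose and is not a full RFC 5322 validator.

```python
import re

# Deliberately simple email pattern (an assumption, not RFC 5322-complete):
# a local part, "@", then a domain containing at least one dot.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def find_emails(text):
    """Return every email-like substring in text, in order of appearance."""
    return EMAIL_RE.findall(text)


sample = "Written by Jane Doe <jane.doe@example.com>, tel. +46 8 123 456"
print(find_emails(sample))  # prints ['jane.doe@example.com']
```

Because the pattern is loose, a match is likely but not guaranteed to be the author's address. A document that also lists another contact address would produce several matches, and you would still need to decide which one belongs to the author.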

Re: Heuristically processing documents

2009-03-19 Thread MRAB
BJörn Lindqvist wrote: I have a large set of documents in various text formats. I know that each document contains its author's name, email and phone number. Sometimes it also contains the author's home address. The task is to find out the name, email and phone number for as many documents as possible. Si

Heuristically processing documents

2009-03-19 Thread BJörn Lindqvist
I have a large set of documents in various text formats. I know that each document contains its author's name, email and phone number. Sometimes it also contains the author's home address. The task is to find out the name, email and phone number for as many documents as possible. Since the documents are not
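For the other fields, one possible heuristic (an assumption, not something proposed in the thread) is to pull phone-like digit runs with a second regexp and to derive a candidate name from the email's local part, accepting that name only if it actually appears in the text. The names `PHONE_RE`, `find_phones`, `guess_name` and `extract` are hypothetical, as is the 7-digit minimum for a phone number.

```python
import re

# Same loose email pattern as a plain regexp approach would use (assumption).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Hypothetical phone heuristic: optional "+", then digits mixed with spaces,
# dashes, dots or parentheses, starting and ending on a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{5,}\d")


def find_phones(text, min_digits=7):
    """Return phone-like substrings that contain at least min_digits digits."""
    phones = []
    for match in PHONE_RE.finditer(text):
        candidate = match.group().strip()
        if sum(ch.isdigit() for ch in candidate) >= min_digits:
            phones.append(candidate)
    return phones


def guess_name(text, email):
    """Turn e.g. 'jane.doe@...' into 'Jane Doe'; keep it only if it occurs in text."""
    local = email.split("@", 1)[0]
    parts = [p for p in re.split(r"[._-]+", local) if p.isalpha()]
    if len(parts) < 2:
        return None  # e.g. 'jdoe' gives no usable first/last name split
    candidate = " ".join(p.capitalize() for p in parts)
    return candidate if candidate.lower() in text.lower() else None


def extract(text):
    """Best-effort guess of the author's email, phone and name (None if not found)."""
    emails = EMAIL_RE.findall(text)
    email = emails[0] if emails else None
    return {
        "email": email,
        "phone": (find_phones(text) or [None])[0],
        "name": guess_name(text, email) if email else None,
    }


sample = "Written by Jane Doe <jane.doe@example.com>, tel. +46 8 123 456"
print(extract(sample))
```

Both heuristics fail in predictable ways: a date such as 2009-03-19 also looks like a phone number under this pattern, and an address like jdoe@example.com yields no name at all. One option is to run several such heuristics per field and keep the most plausible candidate.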