Parsing Word Docs

2003-03-05 Thread Eric Anderson
I'm interested in using the textmining/textextraction utilities built on Apache POI that Ryan was discussing. However, I'm having some difficulty determining what the insertion point would be to replace the default parser with the Word parser. Any assistance would be appreciated. LanRx Netw

my experiences - Re: Parsing Word Docs

2003-03-05 Thread David Spencer
FYI I tried the textmining.org/poi combo on a collection of 350 Word docs people have developed here over the years, and it failed on 33% of them with exceptions being thrown about the formats being invalid. I tried "antiword" ( http://www.winfield.demon.nl/ ), a native & free *.exe, and it wo

Re: my experiences - Re: Parsing Word Docs

2003-03-05 Thread Eric Anderson
Ok. Thanks for the tip. I downloaded and compiled Antiword, and would like to now add it to my indexing class. However, I'm not sure how the application would be called, and from where it would be called. How will I have the class parse the document through Antiword to create the keyword index

Re: my experiences - Re: Parsing Word Docs

2003-03-06 Thread Ryan Ackley
Subject: my experiences - Re: Parsing Word Docs > FYI I tried the textmining.org/poi combo and on a collection of 350 word > docs people have developed here over the years, and it failed on 33% of them > with exceptions being thrown about the formats being invalid. > > I tried "

Re: my experiences - Re: Parsing Word Docs

2003-03-06 Thread Ryan Ackley
"Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, March 05, 2003 7:14 PM Subject: Re: my experiences - Re: Parsing Word Docs > Ok. Thanks for the tip. > > I downloaded and compiled Antiword, and would like to now add it to my indexing > class. However, I'm not s

Re: my experiences - Re: Parsing Word Docs

2003-03-06 Thread Eric Anderson
> To: "Lucene Users List" <[EMAIL PROTECTED]> > Sent: Wednesday, March 05, 2003 7:14 PM > Subject: Re: my experiences - Re: Parsing Word Docs > > > > Ok. Thanks for the tip. > > > > I downloaded and compiled Antiword, and would like to no

AW: my experiences - Re: Parsing Word Docs

2003-03-06 Thread Borkenhagen, Michael (ofd-ko zdfin)
rses these characters correctly). Do you have any hints for me? Michael -Original Message- From: Ryan Ackley [mailto:[EMAIL PROTECTED] Sent: Thursday, March 6, 2003 13:13 To: Lucene Users List Subject: Re: my experiences - Re: Parsing Word Docs David, The textmining.org

Re: my experiences - Re: Parsing Word Docs

2003-03-06 Thread Mario Ivankovits
--- Original Message - From: "Borkenhagen, Michael (ofd-ko zdfin)" <[EMAIL PROTECTED]> To: "'Lucene Users List'" <[EMAIL PROTECTED]> Sent: Thursday, March 06, 2003 1:39 PM Subject: AW: my experiences - Re: Parsing Word Docs Ryan, I tried to use textmining

AW: my experiences - Re: Parsing Word Docs

2003-03-06 Thread Borkenhagen, Michael (ofd-ko zdfin)
thx a lot :) I'll try it -Original Message- From: Mario Ivankovits [mailto:[EMAIL PROTECTED] Sent: Thursday, March 6, 2003 14:00 To: Lucene Users List Subject: Re: my experiences - Re: Parsing Word Docs The problems with German umlauts should be fixed. I have posted t

Re: my experiences - Re: Parsing Word Docs

2003-03-06 Thread David Spencer
Eric Anderson wrote: Ok. Thanks for the tip. I downloaded and compiled Antiword, and would like to now add it to my indexing class. However, I'm not sure how the application would be called, How? You exec it, passing the file name, and it prints the ASCII text to stdout. This method takes the file
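
For readers following along, the exec-and-read-stdout approach described above could be sketched roughly as follows. This is not the code from the original post (which is cut off in this archive), only a minimal illustration. The class name AntiwordExtractor, both method names, the assumption that an "antiword" binary is on the PATH, and the choice of the platform default charset for decoding are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs an external text extractor and captures its stdout.
class AntiwordExtractor {

    // Runs the given command and returns everything it wrote to stdout.
    // Throws IOException if the process exits with a non-zero code.
    static String runAndCapture(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Send the child's stderr to our own stderr so it cannot fill a pipe and block.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();
        process.getOutputStream().close(); // the child gets no input on stdin

        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        try (InputStream stdout = process.getInputStream()) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = stdout.read(buffer)) != -1) {
                captured.write(buffer, 0, read);
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command " + command + " exited with code " + exitCode);
        }
        // Assumption: the tool writes text in the platform default charset.
        return new String(captured.toByteArray(), Charset.defaultCharset());
    }

    // Passes only the file name to the extractor, as described in the thread.
    static String extractWordText(String antiwordPath, String docPath)
            throws IOException, InterruptedException {
        return runAndCapture(Arrays.asList(antiwordPath, docPath));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java AntiwordExtractor <file.doc>");
            return;
        }
        // Assumption: "antiword" is installed and reachable through the PATH.
        System.out.print(extractWordText("antiword", args[0]));
    }
}
```

The returned string is plain text, so an indexing class could treat it the same way it treats the contents of any text file. Reading stdout to the end before calling waitFor() avoids a stall when a large document produces more output than the pipe buffer holds.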

Re: my experiences - Re: Parsing Word Docs

2003-03-06 Thread David Spencer
[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, March 05, 2003 7:14 PM Subject: Re: my experiences - Re: Parsing Word Docs Ok. Thanks for the tip. I downloaded and compiled Antiword, and would like to now add it to my indexing class. However, I

Re: my experiences - Re: Parsing Word Docs

2003-03-06 Thread David Spencer
[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, March 05, 2003 6:24 PM Subject: my experiences - Re: Parsing Word Docs FYI I tried the textmining.org/poi combo and on a collection of 350 word docs people have developed here over the years, and it failed o