UN_TOKENIZED));
/*
* Add title
*/
doc.add(new Field("kcmititle",
title,
Field.Store.YES,
Field.Index.UN_TOKENIZED));
/*
* return the document
*/
return doc;
}
}
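The title above is indexed with `Field.Index.UN_TOKENIZED`, while the body text in Donna's reply further down uses `Field.Index.TOKENIZED`. In the Lucene 2.x API an untokenized field is indexed as one single term without running the analyzer, so a query only matches the exact full value. A tokenized field is split into individual terms. What follows is a toy model of that difference using only the Java standard library, not Lucene's API. The lowercase-and-split tokenizer is an assumed, crude stand-in for a real analyzer.

```java
import java.util.*;

/**
 * Toy model (NOT Lucene's API) of untokenized vs tokenized indexing.
 * Untokenized: the whole value becomes one term.
 * Tokenized: the value is split into lowercase word terms.
 */
class ToyFieldIndexing {

    // Untokenized: the entire string is indexed as a single, unchanged term.
    static List<String> untokenizedTerms(String value) {
        return List.of(value);
    }

    // Tokenized: crude analyzer stand-in; lowercase, then split on non-alphanumerics.
    static List<String> tokenizedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // A term query matches only if exactly that term was indexed.
    static boolean matches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String title = "Indexing MSword Documents";
        System.out.println(untokenizedTerms(title));                    // [Indexing MSword Documents]
        System.out.println(tokenizedTerms(title));                      // [indexing, msword, documents]
        System.out.println(matches(untokenizedTerms(title), "msword")); // false
        System.out.println(matches(tokenizedTerms(title), "msword"));   // true
    }
}
```

Untokenized indexing suits identifiers or exact-match values. Tokenized indexing is what makes individual words in a title or body findable.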
- Original Message -
From: "Wayne Graham" <[EMAIL PROTECTED]>
To:
Sent: Fr
> <[EMAIL PROTECTED]>
> To:
> Sent: Friday, June 08, 2007 12:48 PM
> Subject: Re: Indexing MSword Documents
>
>
> Why not use Document?
> http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/document/Document.htm
taking the time to reply
jim s
- Original Message -
From: "Mathieu Lecarme" <[EMAIL PROTECTED]>
To:
Sent: Friday, June 08, 2007 12:48 PM
Subject: Re: Indexing MSword Documents
Why not use Document?
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightl
Many thanks, I will try that. Thanks again!
jim s
- Original Message -
From: "Donna L Gresh" <[EMAIL PROTECTED]>
To:
Sent: Friday, June 08, 2007 12:52 PM
Subject: Re: Indexing MSword Documents
I do this exact thing. "text" (the second argument to the Field constructor)
is the MSWord text that I've extracted from the Word document:
textField = new org.apache.lucene.document.Field(textFieldName, text,
        org.apache.lucene.document.Field.Store.NO,
        org.apache.lucene.document.Field.Index.TOKENIZED);
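This approach avoids the temporary .txt file. The extracted string is handed straight to the field. With `Field.Store.NO` plus `Field.Index.TOKENIZED`, the text is searchable but is not stored, so it cannot be read back from a search hit. Below is a toy sketch of that flow using only the Java standard library, not Lucene's API. The `extractor` function, the field names `path` and `contents`, and the word-splitting search are hypothetical stand-ins.

```java
import java.util.*;
import java.util.function.Function;

/**
 * Toy model (NOT Lucene's API): extracted Word text goes straight into a
 * field in memory, with no intermediate .txt file.
 */
class InMemoryWordIndexing {

    // One field: name, value, and whether the raw value is kept for retrieval.
    static final class ToyField {
        final String name;
        final String value;
        final boolean stored;

        ToyField(String name, String value, boolean stored) {
            this.name = name;
            this.value = value;
            this.stored = stored;
        }
    }

    // 'extractor' is a hypothetical placeholder for whatever Word text
    // extractor is in use; its result is used directly, nothing is written to disk.
    static List<ToyField> buildDocument(String path, Function<String, String> extractor) {
        String text = extractor.apply(path);
        List<ToyField> doc = new ArrayList<>();
        doc.add(new ToyField("path", path, true));      // analogous to Store.YES
        doc.add(new ToyField("contents", text, false)); // analogous to Store.NO
        return doc;
    }

    // Every field is searchable, stored or not; 'word' is expected in lowercase.
    static boolean search(List<ToyField> doc, String word) {
        for (ToyField f : doc) {
            for (String t : f.value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (t.equals(word)) {
                    return true;
                }
            }
        }
        return false;
    }

    // What a search hit can hand back: only the stored fields.
    static Map<String, String> storedView(List<ToyField> doc) {
        Map<String, String> out = new LinkedHashMap<>();
        for (ToyField f : doc) {
            if (f.stored) {
                out.put(f.name, f.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<ToyField> doc = buildDocument("report.doc", p -> "Quarterly results");
        System.out.println(search(doc, "quarterly")); // true
        System.out.println(storedView(doc));          // {path=report.doc}
    }
}
```

Not storing the body keeps the index smaller. The trade-off is that displaying the text later means re-extracting it from the original file.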
Why not use Document?
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/document/Document.html
HTMLDocument handles HTML-specific details such as encoding, headers, and
so on.
Nutch uses dedicated Word tools (http://lucene.apache.org/nutch/apidocs/org/ap
Hi,
I am trying to index MS Word documents. I've got things working, but I do not
think I am doing things properly.
To index MS Word docs, I use an extractor to extract the text. Then I write
the text to a .txt file and index that using an HTMLDocument object. It seems
to me that since I have the te