Hi Mark,
If I follow you, I should list the key terms in my incoming document, then
select the queries which contain these key terms, and then run those queries
on my index? If this is correct, there are two things I don't understand:
-how do I know which term is a key term in my document?
-how
Hi,
Is there a way to find the matched part of the query string in the Hit object?
Lucene's Highlighter module does part of the job, highlighting the matched
word in the result document; however, it doesn't give the effective keyword
in the query string.
For example, suppose I have a query: "lorem OR elit
Hi Michael,
if I understand your questions correctly - feels like I must have missed
something - here is what you can do to achieve what you want:
index these fields:
to
from
content
subject
all (includes text from all the above 4 fields)
and use "all" as your default search field. Then when you
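
The catch-all field described above can be sketched in plain Java. This is only an illustration of how the "all" text could be assembled before indexing, not Lucene API code; the class and method names (CatchAllField, withAllField) are made up for the example, and it assumes none of the four values is null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the text for each field to be indexed.
final class CatchAllField {

    /** Returns the four source fields plus an "all" entry that joins their text. */
    static Map<String, String> withAllField(String to, String from,
                                            String content, String subject) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("to", to);
        fields.put("from", from);
        fields.put("content", content);
        fields.put("subject", subject);
        // Join with a space so the last token of one field does not
        // fuse with the first token of the next one.
        fields.put("all", String.join(" ", to, from, content, subject));
        return fields;
    }
}
```

Each map entry would then become one indexed field of the document, with "all" used as the default search field.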
Xiong,
You have made an excellent point!
It's a choice determined by how you use Sort.
If you need the most suitable results, pass in:
SortField.FIELD_SCORE
first...
Otherwise, generate all your scores and convert them
to sortable Strings at index time on your "votes" field.
Then, use this for se
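
One way to make numeric values sortable as Strings is to zero-pad them to a fixed width, so that lexicographic order agrees with numeric order. The sketch below is plain Java rather than Lucene API code; the class name SortableVotes and the width of 10 digits are assumptions made for the example, and it only handles non-negative ints.

```java
import java.util.Locale;

// Hypothetical helper for turning vote counts into sortable Strings.
final class SortableVotes {

    // Assumed width: 10 digits is enough for any non-negative int.
    static final int WIDTH = 10;

    /** Zero-pads the value so that String order matches numeric order. */
    static String encode(int votes) {
        if (votes < 0) {
            throw new IllegalArgumentException("votes must be non-negative");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", votes);
    }

    /** Reverses encode(); leading zeros are accepted by parseInt. */
    static int decode(String encoded) {
        return Integer.parseInt(encoded);
    }
}
```

Without padding, "9" sorts after "10" as a String; with padding, the encoded forms compare in numeric order. The encoded value is what would be stored in the "votes" field at index time.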
I am trying to link the Nutch index and the index generated from my database
using Lucene. So at the time of indexing my database, I want to pull the
indexes in from Nutch and link the content from the URL in the database and
the URL that Nutch hit. Can anyone tell me if they have done this and if
As the author of both Word POI and textmining.org, I recommend using
textmining.org. POI is for general purpose manipulation of Word
documents. textmining's only purpose is extracting text.
Also, people recommend using POI for text extraction but the only
place I've seen an actual how-to on this
Can anyone make a comparison between the two, namely POI API and the one
from textmining.org?
On 3/24/07, Ryan Ackley <[EMAIL PROTECTED]> wrote:
The site is down but you can download the word extractor library direct
here:
http://www.textmining.org/textmining.zip
Going to fix the site this we
The site is down but you can download the word extractor library direct here:
http://www.textmining.org/textmining.zip
Going to fix the site this weekend.
On 3/24/07, Sami Siren <[EMAIL PROTECTED]> wrote:
Antony Bowesman wrote:
>> Are there other solutions?
There's also antiword [1] which c
I would also point out that contrib/benchmark in the source has a nice
framework for experimenting with different values for mergeFactor
and maxBufferedDocs. It is quite easy to set it up for a new
collection (i.e. yours) and run experiments that alter these two values.
Below is a sample "a