RE: Reverse search

2007-03-24 Thread Melanie Langlois
Hi Mark, If I follow you, I should list the key terms in my incoming document, then select the queries which contain these key terms, and then run those queries on my index? If this is correct, there are two things I don't understand: -how do I know which term is a key term in my document? -how
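
The selection step described in this message (collect the document's terms, then keep only the stored queries that mention one of them) could be sketched roughly as below. This is plain Java with no Lucene calls. The class and method names are hypothetical, stored queries are assumed to be plain space-separated terms rather than full query syntax, and the regex tokenizer is only a stand-in for a real analyzer.

```java
import java.util.*;

public class ReverseSearchSketch {
    // Split text into lowercase terms; a stand-in for a real analyzer (assumption).
    static Set<String> terms(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Return the names of stored queries that share at least one term with the document.
    static List<String> candidateQueries(Map<String, String> storedQueries, String document) {
        Set<String> docTerms = terms(document);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : storedQueries.entrySet()) {
            Set<String> shared = terms(e.getValue());
            shared.retainAll(docTerms); // keep only terms present in the document
            if (!shared.isEmpty()) hits.add(e.getKey());
        }
        return hits;
    }
}
```

Only the queries returned by candidateQueries would then need to be run. How to decide which terms count as "key" terms is the open question in the message and is not addressed here; the sketch treats every term as a candidate.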

Matched Query Part in Hit Object

2007-03-24 Thread Mohsen Saboorian
Hi, Is there a way to find the matched part of query string in the Hit object? Lucene's Highlighter module does part of the job, highlighting the matched word in the result document, however it doesn't give the effective keyword in query string. For example, suppose I have a query: "lorem OR elit

Re: Search Design Question

2007-03-24 Thread Xiaocheng Luan
Hi Michael, if I understand your questions correctly - feels like I must have missed something - here is what you can do to achieve what you want: index these fields: to from content subject all (includes text from all the above 4 fields) and use "all" as your default search field. Then when you
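
The catch-all "all" field suggested here might be assembled along these lines before the document is handed to the indexer. This is a sketch in plain Java using a Map in place of a Lucene document. The class and method names are hypothetical, and joining the values with a single space is an assumption.

```java
import java.util.*;

public class CatchAllFieldSketch {
    // Copy the given fields and add an "all" field holding the text of the four source fields.
    static Map<String, String> withAllField(Map<String, String> fields) {
        Map<String, String> doc = new LinkedHashMap<>(fields);
        StringBuilder all = new StringBuilder();
        for (String name : new String[] {"to", "from", "subject", "content"}) {
            String value = fields.get(name);
            if (value != null && !value.isEmpty()) {
                if (all.length() > 0) all.append(' '); // separator choice is an assumption
                all.append(value);
            }
        }
        doc.put("all", all.toString());
        return doc;
    }
}
```

With a document built this way, an unqualified query can be pointed at "all", while the four original fields remain available for field-specific searches.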

Re: How to customize scoring using user feedback?

2007-03-24 Thread Peter W.
Xiong, You have made an excellent point! It's a choice determined by how you use Sort, if you need most suitable results pass in: SortField.FIELD_SCORE first... Otherwise, generate all your scores and convert them to sortable Strings at index time on your "votes" field. Then, use this for se
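
One way to turn vote counts into "sortable Strings" at index time is zero-padding to a fixed width, so that lexicographic order matches numeric order. The message does not say which encoding is meant, so the padding scheme, the width of 10, and the names below are all assumptions.

```java
public class SortableVotesSketch {
    // Encode a non-negative vote count as a fixed-width, zero-padded string.
    // Width 10 covers every non-negative int (max 2147483647).
    static String toSortable(int votes) {
        if (votes < 0) {
            throw new IllegalArgumentException("votes must be non-negative");
        }
        return String.format("%010d", votes);
    }
}
```

Without padding, "10" would sort before "9" as a string; with it, "0000000009" sorts before "0000000010".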

Linking two different indexes

2007-03-24 Thread Yakn
I am trying to link the nutch index and the index generated from my database using Lucene. So at the time of indexing my database, I want to pull the indexes in from nutch and link the content from the url in the database and the url that nutch hit. Can anyone tell me if they have done this and if

Re: index word files ( doc )

2007-03-24 Thread Ryan Ackley
As the author of both Word POI and textmining.org, I recommend using textmining.org. POI is for general purpose manipulation of Word documents. textmining's only purpose is extracting text. Also, people recommend using POI for text extraction but the only place I've seen an actual how-to on this

Re: index word files ( doc )

2007-03-24 Thread jafarim
Can anyone make a comparison between the two, namely POI API and the one from textmining.org? On 3/24/07, Ryan Ackley <[EMAIL PROTECTED]> wrote: The site is down but you can download the word extractor library direct here: http://www.textmining.org/textmining.zip Going to fix the site this we

Re: index word files ( doc )

2007-03-24 Thread Ryan Ackley
The site is down but you can download the word extractor library direct here: http://www.textmining.org/textmining.zip Going to fix the site this weekend. On 3/24/07, Sami Siren <[EMAIL PROTECTED]> wrote: Antony Bowesman wrote: >> Are there other solutions? There's also antiword [1] which c

Re: MergeFactor and MaxBufferedDocs value should ...?

2007-03-24 Thread Grant Ingersoll
I would also suggest that contrib/benchmark in the source has a nice framework for experimenting with different factors for mergeFactor and maxBufferedDocs. It is quite easy to set it up for a new collection (i.e. yours) and run experiments that alter these two values. Below is a sample "a