How to delete partial index

2006-12-12 Thread spinergywmy
Hi, I have asked this question before, but maybe the question wasn't clear. How can I delete a particular index that I want to and keep the rest? For instance, I have indexed document Id, date, user Id and contents; my question is, will that particular contents be deleted if I just

Re: Re: Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-12 Thread Karl Koch
Hello Doron (and all the others who read here):), thank you for your effort and your time. I really appreciate it. :) I understand why normalisation is done in general. Mainly, to normalise the bias of oversized documents. In the literature I have read so far, there is usually a high effort on

Re: Lucene id generation

2006-12-12 Thread Waheed Mohammed
Thanks for the instant reply. I see that what Rajesh advises is something like what MultiReader does. That would be my last approach because of the complexities it will introduce in developing the business case I have. Anything other than that would be an appreciated pointer. On Monday 11 December

Lucene scoring: Term frequency normalisation

2006-12-12 Thread Karl Koch
Hi, I have a question about the current Lucene scoring algorithm. In this scoring algorithm, the term frequency is calculated by using the square root of the number of occurring terms as described in http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_tf

Lucene scoring: coord_q_d factor

2006-12-12 Thread Karl Koch
Hello group, The coord(q,d) normalisation is a score factor based on how many of the query terms are found in the specified document, and is described here: http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord Does this have a theoretical base? On what

Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-12 Thread Soeren Pekrul
Hello Karl, I’m very interested in the details of Lucene’s scoring as well. Karl Koch wrote: For this reason, I do not understand why Lucene (in version 1.2) normalises the query(!) with norm_q : sqrt(sum_t((tf_q*idf_t)^2)) which is also called cosine normalisation. This is a technique that
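As a reading aid for the snippet above, the quoted cosine normalisation norm_q = sqrt(sum_t((tf_q*idf_t)^2)) can be sketched in plain Java. The class name, method name, and sample weights are hypothetical, and this is an illustration of the formula as quoted, not Lucene's own code.

```java
public class QueryNormSketch {
    // norm_q = sqrt( sum over query terms t of (tf_q * idf_t)^2 ),
    // as quoted in the thread snippet. Names here are illustrative only.
    static double normQ(double[] tfQ, double[] idf) {
        double sumOfSquares = 0.0;
        for (int i = 0; i < tfQ.length; i++) {
            double weight = tfQ[i] * idf[i]; // per-term query weight
            sumOfSquares += weight * weight;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        // Hypothetical two-term query: each term occurs once, idf values 3 and 4.
        double[] tfQ = {1.0, 1.0};
        double[] idf = {3.0, 4.0};
        System.out.println(normQ(tfQ, idf)); // sqrt(9 + 16)
    }
}
```

Dividing each term weight by this value gives a query vector of unit length, which is the usual purpose of cosine normalisation.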

Complex query filtering

2006-12-12 Thread Maxim Patramanskij
I need to apply a set of custom filters to my query. One of the filters, which optionally can be applied, is a filter by date range. For the moment I'm using a BooleanQuery approach for this. I know that it is not the best from either a score accuracy or a performance point of view, and I want to change

Indexing large files

2006-12-12 Thread abdul aleem
Hi There, I have been working with the Lucene API for the past day. We are in the process of building a log viewer tool; this is how the log file looks: [2006-12-11 01:52:40.179] [lon0571xus] [DEBUG] [TIE heartbeat monitor (monitor.heartbeat.fxstreamrates)] [unknown] [] [] ActiveRateServerIdList -

java requirements for lucene

2006-12-12 Thread Miles Efron
i have successfully compiled, installed, and run lucene-based applications on several machines, but i am currently trying to get lucene to run on a server that i do not administer and am having an odd problem... perhaps someone can decipher it? if i try, for instance, to run the basic lucene

Re: Lucene scoring: Term frequency normalisation

2006-12-12 Thread Marvin Humphrey
On Dec 12, 2006, at 2:23 AM, Karl Koch wrote: However, what exactly is the advantage of using square root instead of log? Speaking anecdotally, I wouldn't say there's an advantage. There's a predictable effect: very long documents are rewarded, since the damping factor is not as strong.
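The difference in damping strength mentioned in the reply above can be seen by tabulating both functions. This is a minimal sketch in plain Java; the class and method names are hypothetical, and 1 + ln(freq) is assumed here as one common log-based variant, since the thread snippet does not say which log form was meant.

```java
public class TfDampingSketch {
    // Square-root damping of the raw term frequency.
    static double sqrtTf(int freq) {
        return Math.sqrt(freq);
    }

    // Assumed log variant: 1 + ln(freq) for freq > 0, else 0.
    static double logTf(int freq) {
        return freq > 0 ? 1.0 + Math.log(freq) : 0.0;
    }

    public static void main(String[] args) {
        // Print both damped values side by side for growing frequencies.
        int[] freqs = {1, 10, 100, 1000};
        for (int f : freqs) {
            System.out.println(f + "\t" + sqrtTf(f) + "\t" + logTf(f));
        }
    }
}
```

For large frequencies the square root keeps growing much faster than the logarithm, so documents that repeat a term many times gain more under square-root damping.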

lucene search

2006-12-12 Thread Bloem, E.J.W. van (Erik, Student CS)
Hi, I am building a portal where users are able to maintain a personal doc, placed in a database or dir on server. I want the users to be able to search all other users' docs for keywords. Like give me a top ten of documents containing the word bike. Is Lucene useful for this? Or do you

Re: lucene search

2006-12-12 Thread Erick Erickson
Well, searching documents for text is what Lucene is *made* for <G>. So, yes, this would be a fine thing to use Lucene for. You'll have to deal with coordinating between when a document is added to the directory and when it's added to the Lucene index. Also, give some thought to what form your

RE: search by field, not field value

2006-12-12 Thread Koji Sekiguchi
Erick, Sorry for replying to a somewhat old topic. TermDocs.seek(new Term("specific_field", "")); Note that the "" as the value of the term gets all the terms. Then use TermDocs.next until it returns false. At each point, TermDocs.doc() will give you the Lucene ID of a document containing that term.

Re: Lucene scoring: coord_q_d factor

2006-12-12 Thread Steven Rowe
Karl Koch wrote: The coord(q,d) normalisation is a score factor based on how many of the query terms are found in the specified document, and is described here: http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord Does this have a theoretical base? On

Re: search by field, not field value

2006-12-12 Thread Erick Erickson
Try this. It returns Found a term Austria in doc 7 Found a term Botswana in doc 6 Found a term New in doc 3 Found a term Tennessee, in doc 1 Found a term US in doc 0 Found a term US in doc 1 Found a term US in doc 3 Found a term US in doc 4 Found a term Utah, in doc 4 Found a term Virginia, in

Re: Indexing large files

2006-12-12 Thread Otis Gospodnetic
Hi, Yes, you can't get to the stored field content in your Hits because you are using a (File)Reader. Otis - Original Message From: abdul aleem [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Tuesday, December 12, 2006 8:22:54 AM Subject: Indexing large files Hi There, I

Re: How to delete partial index

2006-12-12 Thread Doron Cohen
spinergywmy [EMAIL PROTECTED] wrote: Hi, I have asked this question before, but maybe the question wasn't clear. How can I delete a particular index that I want to and keep the rest? For instance, I have indexed document Id, date, user Id and contents; my question is, will that

Re: java requirements for lucene

2006-12-12 Thread Chris Hostetter
it appears that you may have multiple copies of the lucene code base in your class path. : $ java org.apache.lucene.demo.IndexFiles ../data/medline/docs/ : Indexing to directory 'index'... : adding ../data/medline/docs/1.txt : Exception in thread main java.lang.IncompatibleClassChangeError:

Re: Lucene scoring: coord_q_d factor

2006-12-12 Thread Steven Rowe
Karl Koch wrote: Is there any other paper that actually shows the benefit of doing this particular normalisation with coord_q_d? I am not suggesting here that it is not useful, I am just looking for evidence how the idea developed. I think it's a mischaracterization to call coordination a

RE: de-boosting fields

2006-12-12 Thread Scott Smith
I've implemented the zero boost solution and it seems to be doing what I want. Thanks to everyone who had suggestions. -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Monday, December 11, 2006 11:45 AM To: java-user@lucene.apache.org Subject: Re: de-boosting

Re: How to delete partial index

2006-12-12 Thread spinergywmy
Hi, I'm just wondering, is there any unique key that I can use to delete a particular document? How can I check the position of a particular document inside the index file? Is there any example that I can refer to on how to delete documents by a term? For the second scenario, the reason why I'm doing

Re: How to delete partial index

2006-12-12 Thread spinergywmy
Hi, When I delete a document based on the Id, is the Id the unique key, and by deleting based on the Id, will all the related info be deleted as well? If so, how can I know the document Id? Thanks. regards, Wooi Meng

Re: How to delete partial index

2006-12-12 Thread Erick Erickson
You have to search against something known. You simply (as has been mentioned many times) cannot rely on the document IDs. So, I'd store the full path (untokenized) of the file. When you move a file, search for the path in the appropriate field in your index that the file was originally stored

Re: How to delete partial index

2006-12-12 Thread spinergywmy
Hi, I managed to delete the document based on a term, but that is just one part. I wonder whether Lucene supports pulling out the info that I have indexed and placing it into another index file. Is the only way to use IndexWriter to perform indexing again with all the necessary

Re: Re: Re: Questions about Lucene scoring (was: Lucene 1.2 - scoring formula needed)

2006-12-12 Thread Doron Cohen
Karl Koch [EMAIL PROTECTED] wrote: For the documents Lucene employs its norm_d_t which is explained as: norm_d_t : square root of number of tokens in d in the same field as t Actually (by default) it is: 1 / sqrt(#tokens in d with same field as t) basically just the square root of the
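The default length normalisation stated in the reply above, 1 / sqrt(#tokens in d with same field as t), can be written out as a short plain-Java sketch. The class and method names are hypothetical and the token counts are made-up sample values; this mirrors the formula as quoted rather than reproducing Lucene's source.

```java
public class LengthNormSketch {
    // norm_d_t = 1 / sqrt(number of tokens in the field of document d),
    // following the formula quoted in the thread snippet.
    static double lengthNorm(int numTokens) {
        return 1.0 / Math.sqrt(numTokens);
    }

    public static void main(String[] args) {
        // Hypothetical field lengths: a short field and a long one.
        System.out.println(lengthNorm(4));
        System.out.println(lengthNorm(100));
    }
}
```

Because the factor shrinks as the field grows, a match in a short field contributes more to the score than the same match in a long field.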

Advice on 3NF Data Structures and Lucene Please

2006-12-12 Thread Andrew Hughes
Hey All, I am very interested in indexing a 3NF Data Structure. Is there any advice that someone can provide with this? From what I have seen, Lucene is typically a flat First Normal Form (Flat) data structure. The only way I can see to combine the relational links between multiple indexes