Hi,
I have asked this question before, but maybe the question wasn't clear.
How can I delete a particular index that I want to and keep the rest? For
instance, I have indexed document Id, date, user Id and contents; my
question is, will that particular contents be deleted if I just
Hello Doron (and all the others who read here):),
thank you for your effort and your time. I really appreciate it. :)
I understand why normalisation is done in general: mainly to normalise the
bias of oversized documents. In the literature I have read so far, there is
usually a high effort on
Thanks for the instant reply,
I see that what rajesh advises is something like what MultiReader does.
That would be my last approach because of the complexities it will introduce
in developing the business case I have.
Anything other than that would be an appreciated pointer
On Monday 11 December
Hi,
I have a question about the current Lucene scoring algorithm. In this scoring
algorithm, the term frequency is calculated by using the square root of the
number of occurring terms as described in
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_tf
Hello group,
The coord(q,d) normalisation is a score factor based on how many of the query
terms are found in the specified document, and is described here:
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord
Does this have a theoretical base? On what
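For readers following the coord(q,d) question, here is a minimal sketch of the factor as a plain fraction: the number of query terms found in the document divided by the number of terms in the query. The class and method names are made up for illustration, and the sketch assumes the default behaviour; a custom Similarity could compute it differently.

```java
final class CoordFactor {
    // overlap: how many query terms occur in the document.
    // maxOverlap: how many terms the query has in total.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // A document matching 2 of 4 query terms gets half the factor
        // of a document matching all 4.
        System.out.println(coord(2, 4));
        System.out.println(coord(4, 4));
    }
}
```

The factor only rescales a score that has already been summed over the matching terms, so documents matching more of the query are ranked higher.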
Hello Karl,
I’m very interested in the details of Lucene’s scoring as well.
Karl Koch wrote:
For this reason, I do not understand why Lucene (in version 1.2) normalises the query(!) with
norm_q : sqrt(sum_t((tf_q*idf_t)^2))
which is also called cosine normalisation. This is a technique that
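To make the norm_q formula quoted above concrete, here is a small sketch that computes sqrt(sum_t((tf_q*idf_t)^2)) from two parallel arrays. The class name and the array-based input are assumptions for illustration only; this is not how Lucene stores query weights internally.

```java
final class QueryNorm {
    // Computes the Euclidean length of the query's weight vector,
    // where each weight is tf_q * idf_t for one query term t.
    static double normQ(double[] tfQ, double[] idf) {
        double sumOfSquares = 0.0;
        for (int t = 0; t < tfQ.length; t++) {
            double weight = tfQ[t] * idf[t];
            sumOfSquares += weight * weight;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        // Two query terms, each occurring once, with idf 3 and 4.
        System.out.println(normQ(new double[] {1, 1}, new double[] {3, 4}));
    }
}
```

Dividing every term weight by this value gives the query vector unit length, which is what "cosine normalisation" refers to.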
I need to apply a set of custom filters to my query.
One of the filters, which optionally can be applied, is a filter by
date range.
For the moment I'm using a BooleanQuery approach for this.
I know that it is not the best from either the score accuracy or the performance
point of view and I want to change
Hi There,
I have been working with the Lucene API for the past day.
We are in the process of building a log viewer tool;
this is how the log file looks:
[2006-12-11 01:52:40.179] [lon0571xus] [DEBUG] [TIE
heartbeat monitor (monitor.heartbeat.fxstreamrates)]
[unknown] [] [] ActiveRateServerIdList -
i have successfully compiled, installed, and run lucene-based applications
on several machines, but i am currently trying to get lucene to run on a
server that i do not administer and am having an odd problem... perhaps
someone can decipher it?
if i try, for instance, to run the basic lucene
On Dec 12, 2006, at 2:23 AM, Karl Koch wrote:
However, what exactly is the advantage of using square root instead
of log?
Speaking anecdotally, I wouldn't say there's an advantage. There's a
predictable effect: very long documents are rewarded, since the
damping factor is not as strong.
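A quick numeric comparison shows the effect described above. This is only a sketch: sqrt(freq) is the damping discussed in the thread, while 1 + ln(freq) is one common logarithmic variant picked here for comparison, and the class and method names are hypothetical.

```java
final class TfDamping {
    // Square-root damping of the raw term frequency.
    static double sqrtTf(int freq) {
        return Math.sqrt(freq);
    }

    // A logarithmic alternative, 1 + ln(freq); assumed for comparison only.
    static double logTf(int freq) {
        return freq > 0 ? 1.0 + Math.log(freq) : 0.0;
    }

    public static void main(String[] args) {
        // The gap between the two widens as the raw frequency grows,
        // which is why long documents benefit more under sqrt.
        for (int freq : new int[] {1, 10, 100, 1000}) {
            System.out.printf("freq=%d sqrt=%.2f log=%.2f%n",
                    freq, sqrtTf(freq), logTf(freq));
        }
    }
}
```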
Hi,
I am building a portal where users are able to maintain a personal doc, placed
in a database or dir on server.
I want the users to be able to search all other users' docs for keywords.
For example, give me a top ten of documents containing the word bike.
Is Lucene useful for this? Or do you
Well, searching documents for text is what Lucene is *made* for <G>. So,
yes, this would be a fine thing to use Lucene for. You'll have to deal with
coordinating between when a document is added to the directory and when it's
added to the Lucene index.
Also, give some thought to what form your
Erick,
Sorry for replying to a somewhat old topic.
TermDocs.seek(new Term("specific_field", ""));
Note that the "" as the value of the term gets all the terms. Then use
TermDocs.next until it returns false. At each point, TermDocs.doc() will
give you the Lucene ID of a document containing that term.
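The loop described above (position on a field's terms, then step through the documents for each term) can be pictured with a toy model. The sketch below uses plain Java collections as a stand-in for a field's postings; it does not call Lucene's TermDocs or TermEnum classes, and the class name and data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ToyTermWalk {
    // postings: term -> document IDs containing that term (toy stand-in).
    static List<String> walk(TreeMap<String, List<Integer>> postings) {
        List<String> lines = new ArrayList<>();
        // A TreeMap iterates its keys in sorted order, much like a term
        // enumeration that starts before the first term of a field.
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                lines.add("Found a term " + entry.getKey() + " in doc " + doc);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        TreeMap<String, List<Integer>> postings = new TreeMap<>();
        postings.put("US", Arrays.asList(0, 1, 3, 4));
        postings.put("Austria", Arrays.asList(7));
        postings.put("Botswana", Arrays.asList(6));
        for (String line : walk(postings)) {
            System.out.println(line);
        }
    }
}
```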
Karl Koch wrote:
The coord(q,d) normalisation is a score factor based on how many of
the query terms are found in the specified document, and is described
here:
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord
Does this have a theoretical base? On
Try this. It returns
Found a term Austria in doc 7
Found a term Botswana in doc 6
Found a term New in doc 3
Found a term Tennessee, in doc 1
Found a term US in doc 0
Found a term US in doc 1
Found a term US in doc 3
Found a term US in doc 4
Found a term Utah, in doc 4
Found a term Virginia, in
Hi,
Yes, you can't get to the stored field content in your Hits because you are
using a (File)Reader.
Otis
- Original Message
From: abdul aleem [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Tuesday, December 12, 2006 8:22:54 AM
Subject: Indexing large files
Hi There,
I
spinergywmy [EMAIL PROTECTED] wrote:
Hi,
I have asked this question before, but maybe the question wasn't clear.
How can I delete a particular index that I want to and keep the rest?
For
instance, I have indexed document Id, date, user Id and contents; my
question is, will that
it appears that you may have multiple copies of the lucene code base in
your class path.
: $ java org.apache.lucene.demo.IndexFiles ../data/medline/docs/
: Indexing to directory 'index'...
: adding ../data/medline/docs/1.txt
: Exception in thread main java.lang.IncompatibleClassChangeError:
Karl Koch wrote:
Is there any other paper that actually shows the benefit of doing
this particular normalisation with coord_q_d? I am not suggesting
here that it is not useful, I am just looking for evidence how the
idea developed.
I think it's a mischaracterization to call coordination a
I've implemented the zero boost solution and it seems to be doing what I
want. Thanks to everyone who had suggestions.
-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Monday, December 11, 2006 11:45 AM
To: java-user@lucene.apache.org
Subject: Re: de-boosting
Hi,
I'm just wondering, is there any unique key that I can use to delete a
particular document? How can I check the position of a particular document
inside the index file? Is there any example that I can refer to on how to delete
documents by a term?
For the second scenario, the reason why I'm doing
Hi,
When I delete a document based on the Id, is the Id the unique key,
and by deleting based on the Id, will all the related
info be deleted as well? If so, how can I know the document Id? Thanks.
regards,
Wooi Meng
You have to search against something known. You simply (as has been
mentioned many times) cannot rely on the document IDs.
So, I'd store the full path (untokenized) of the file. When you move a file,
search for the path in the appropriate field in your index that the file was
originally stored
Hi,
I managed to delete the document based on a term, but that is just one part. I
wonder whether Lucene supports pulling out the info that I have
indexed and placing it into another index file. Is the only way
to use IndexWriter to perform the indexing again with all the necessary
Karl Koch [EMAIL PROTECTED] wrote:
For the documents Lucene employs
its norm_d_t which is explained as:
norm_d_t : square root of number of tokens in d in the same field as t
Actually (by default) it is:
1 / sqrt(#tokens in d with same field as t)
basically just the square root of the
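The correction above is easy to check numerically. This sketch computes 1 / sqrt(number of tokens) for a few field lengths; the class and method names are hypothetical, and it ignores any boosts or the lossy encoding an index may apply when storing norms.

```java
final class LengthNorm {
    // Default-style length normalisation: 1 / sqrt(#tokens in the field).
    static double lengthNorm(int numTokens) {
        return 1.0 / Math.sqrt(numTokens);
    }

    public static void main(String[] args) {
        // Longer fields get a smaller factor, so a match in a short
        // field counts for more than the same match in a long one.
        for (int n : new int[] {1, 4, 16, 100}) {
            System.out.println(n + " tokens -> " + lengthNorm(n));
        }
    }
}
```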
Hey All,
I am very interested in indexing a 3NF Data Structure. Is there any
advice that someone can provide with this? From what I have seen Lucene
is typically a flat First Normal Form (Flat) data structure. The
only way I can see to combine the relational links between multiple
indexes