Re: sorting by per doc hit count

2006-12-19 Thread Mark Miller
Could I use another Similarity that returns 1 for most of the scoring factors but keeps the actual term frequency (rig the equation)? Could I then alternate between DefaultSimilarity and HitsPerDocSimilarity per search? LIA mentioned something about needing to rebuild the index if you change Similarities.

Re: sorting by per doc hit count

2006-12-19 Thread Doron Cohen
Mark Miller [EMAIL PROTECTED] wrote on 19/12/2006 09:21:00: LIA mentioned something about needing to rebuild the index if you change Similarities. That does not make sense to me yet. It would seem you could alternate them. What does scoring have to do with indexing? For this part of your

Re: Lucene id generation

2006-12-19 Thread Steven Rowe
Antonio Bruno wrote: But using the docId directly would make searches much faster and more efficient. Consider the possibility of applying a first CachingWrapperFilter F1 to one index and a second CachingWrapperFilter F2 to another index, and afterwards computing (F1 AND F2) and to

Re: Lucene 2.0.1 release date

2006-12-19 Thread Doug Cutting
Steven Rowe wrote: 2.1 is much more likely to be the label used for the next release than 2.0.1. The roadmap in Jira shows 21 issues scheduled for 2.0.1. If there is in fact no intent to merge these into the 2.0 branch, these should probably be retargeted for 2.1.0, and the 2.0.1 version

how to define default fields

2006-12-19 Thread John Song
Hi: How to define default fields? Is it done during index time or during search time? Strangely, I can't find any information on how default fields are defined. thanks, john

Re: Lucene scoring: coord_q_d factor

2006-12-19 Thread Doug Cutting
Karl Koch wrote: Are there any other papers that regard the combination of coordination level matching and TFxIDF as advantageous? We independently developed coordination-level matching combined with TFxIDF when I worked at Apple. This is documented in:

Re: sorting by per doc hit count

2006-12-19 Thread Mark Miller
Thanks for the tip, Doron. What if I replace the decode static method in Similarity so that it always returns 1 for the HitsPerDocSimilarity? This would not require a re-index, right? Doron Cohen wrote: Mark Miller [EMAIL PROTECTED] wrote on 19/12/2006 09:21:00: LIA mentioned something

Re: I: Lucene id generation

2006-12-19 Thread Erick Erickson
I see your point, but I have to ask whether this is a practical or a theoretical problem? If it's a practical one, perhaps you'd be willing to talk about the issue you're actually trying to solve and maybe we can come up with a solution within the current framework. I know others on the list have

Re: sorting by per doc hit count

2006-12-19 Thread Mark Miller
Foolish me...override a static method...silly silly. Still, I think there must be some way. I don't care about the field normalization...there must be some way to make it return a constant 1 when using a new Similarity class. Doron Cohen wrote: Mark Miller [EMAIL PROTECTED] wrote on

Re: how to define default fields

2006-12-19 Thread Yonik Seeley
On 12/19/06, John Song [EMAIL PROTECTED] wrote: How to define default fields? Is it done during index time or during search time? Strangely, I can't find out any information on how default fields are defined? default field is simply a QueryParser concept (see its constructors). It does
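A toy sketch of the point made in this reply, in plain Java with no Lucene dependency: the default field is applied when the query string is parsed, at search time, not when documents are indexed. `ToyParser`, `FieldTerm`, and the whitespace/colon splitting rules are invented for illustration and are far simpler than Lucene's real QueryParser, which receives the default field name through its constructor.

```java
import java.util.ArrayList;
import java.util.List;

final class ToyParser {
    /** A (field, text) pair; a hypothetical stand-in for a parsed query term. */
    static final class FieldTerm {
        final String field;
        final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    private final String defaultField;

    /** The default field is supplied to the parser, not to the index. */
    ToyParser(String defaultField) {
        this.defaultField = defaultField;
    }

    /** Terms written as field:text keep their field; bare terms get the default. */
    List<FieldTerm> parse(String query) {
        List<FieldTerm> terms = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            int colon = token.indexOf(':');
            if (colon > 0 && colon < token.length() - 1) {
                terms.add(new FieldTerm(token.substring(0, colon), token.substring(colon + 1)));
            } else {
                terms.add(new FieldTerm(defaultField, token));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Prints [title:lucene, contents:scoring]
        System.out.println(new ToyParser("contents").parse("title:lucene scoring"));
    }
}
```

Two parsers built with different default fields can query the same data, which is why nothing about the default field needs to be decided while indexing.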

MultiFieldQueryParser doesn't properly filter out documents when the query string specifies to exclude certain terms

2006-12-19 Thread Scott Sellman
I am not sure if this is a problem with Lucene or if I am building my Query object improperly. It seems to me, when performing a search that should exclude certain terms, MultiFieldQueryParser doesn't filter out documents when it should. Consider the following example to clarify what I am

Re: sorting by per doc hit count

2006-12-19 Thread Chris Hostetter
: Foolish me...override a static method...silly silly. Still, I think : there must be some way. I don't care about the field : normalization...there must be some way to make it return a constant 1 : when using a new Similarity class. as discussed: norms are a value explicitly stored in your

Re: MultiFieldQueryParser doesn't properly filter out documents when the query string specifies to exclude certain terms

2006-12-19 Thread Daniel Naber
On Tuesday 19 December 2006 23:05, Scott Sellman wrote: new BooleanClause.Occur[]{BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD} Why do you explicitly specify these operators? q.add(keywordQuery, BooleanClause.Occur.MUST); //true, false); You seem to wrap a

Re: Extracting data from Lucene index files

2006-12-19 Thread Venkateshprasanna
Take a look at TermDocs and TermEnum. I need to get the frequency of each word in each of the documents I have indexed. This is what I could do with TermEnum and TermDocs. For each Term from TermEnum, I have instantiated a TermDocs and for each doc, I am trying to get the frequency of the
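As a rough illustration of what the nested loop described in this message computes (for each term, for each document containing it, a frequency), here is a plain Java sketch over in-memory strings. It is a stand-in, not Lucene code: `TermFrequencies`, `count`, and the tokenization on non-word characters are assumptions made for the example. In Lucene the outer loop would walk a TermEnum and the inner loop a TermDocs.

```java
import java.util.Map;
import java.util.TreeMap;

final class TermFrequencies {
    /**
     * Builds term -> (docId -> frequency) from raw document text.
     * The docId is simply the position of the document in the array.
     */
    static Map<String, Map<Integer, Integer>> count(String[] docs) {
        Map<String, Map<Integer, Integer>> freqs = new TreeMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            for (String token : docs[docId].toLowerCase().split("\\W+")) {
                if (token.isEmpty()) {
                    continue;
                }
                // Create the per-term map on first sight, then bump this doc's count.
                freqs.computeIfAbsent(token, t -> new TreeMap<>())
                     .merge(docId, 1, Integer::sum);
            }
        }
        return freqs;
    }

    public static void main(String[] args) {
        String[] docs = { "lucene index lucene", "index files" };
        // Outer loop: every term; inner loop: every document that contains it.
        count(docs).forEach((term, postings) ->
            postings.forEach((docId, freq) ->
                System.out.println(term + " doc=" + docId + " freq=" + freq)));
    }
}
```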

Re: sorting by per doc hit count

2006-12-19 Thread Mark Miller
I appreciate your help Hoss. That has cleared up some things for me. The problem remains that I would like to be able to switch between the hits per doc Similarity and the default Similarity on any given search. I was hoping that I could index with DefaultSimilarity and store the norms for
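The ordering this thread is after can be sketched in plain Java: if every scoring factor except term frequency is forced to 1, a document's score collapses to its raw number of hits. The class below is only an in-memory illustration of that target ordering, not a Lucene Similarity; `HitCountRanking`, `hitCount`, and `rank` are hypothetical names, and it sidesteps the stored norms that the thread identifies as the obstacle.

```java
import java.util.Arrays;

final class HitCountRanking {
    /** Total occurrences of any query term in the document text. */
    static int hitCount(String doc, String[] queryTerms) {
        int hits = 0;
        for (String token : doc.toLowerCase().split("\\W+")) {
            for (String q : queryTerms) {
                if (token.equals(q.toLowerCase())) {
                    hits++;
                }
            }
        }
        return hits;
    }

    /** Document indexes ordered by descending hit count; ties keep index order. */
    static Integer[] rank(String[] docs, String[] queryTerms) {
        int[] counts = new int[docs.length];
        Integer[] order = new Integer[docs.length];
        for (int i = 0; i < docs.length; i++) {
            counts[i] = hitCount(docs[i], queryTerms);
            order[i] = i;
        }
        // Arrays.sort on objects is stable, so equal counts stay in index order.
        Arrays.sort(order, (a, b) -> Integer.compare(counts[b], counts[a]));
        return order;
    }

    public static void main(String[] args) {
        String[] docs = { "apple banana", "apple apple pie", "banana" };
        // Prints [1, 0, 2]
        System.out.println(Arrays.toString(rank(docs, new String[] { "apple" })));
    }
}
```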

lucene nightly build after 11/20

2006-12-19 Thread Yonik Seeley
Anyone using a Lucene nightly build dated later than 11/20 will want to upgrade to the next (future) nightly build, which will be dated 12/21. See http://issues.apache.org/jira/browse/LUCENE-754 for details. Keep in mind that nightly builds are developer builds and not always stable (though we try our best) :-)