Re: Strange output

2009-02-15 Thread Abu Abdulla
I'm using this for searching (extracted and not full code): IndexSearcher indexSearcher = new IndexSearcher("index"); final QueryParser queryParser = new QueryParser("Line", myAnalyzer); queryParser.setAllowLeadingWildcard(true); final Query query = queryParser.parse(searchText); final BitSet

the efficiency of creating indexes

2009-02-15 Thread 治江 王
As far as I know, the time needed to create an index grows non-linearly with the size of the documents. For example, if the index size is 1 GB, indexing takes 2 hours; if the index size is 10 GB, it may take 30 hours. Can anyone tell me the reason? Any tips would be appreciated.

Boost factor in MultiFieldQueryParser

2009-02-15 Thread mitu2009
Hi, Can I boost different fields in MultiFieldQueryParser with different factors? Also, what is the maximum boost factor value I can assign to a field? Thanks a ton! Ed

Re: Upper limit on number of Fields

2009-02-15 Thread Mark Miller
In my experience, the main issue to be concerned about with tons of fields is norms. You'll likely have to turn them off for most of the fields unless you have plenty of RAM to burn. They are stored in byte arrays of size maxDoc for each field (i.e. non-sparse). Other than that, I don't think the
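To put a rough number on the norms cost described above, here is a minimal back-of-the-envelope sketch. It assumes exactly one norm byte per document per field with norms enabled, as the message states; the class and method names are hypothetical, and the figure of 10,000 fields is an illustrative assumption taken from the question's arithmetic (1 million documents, 10 fields per group of 1000), not a measured value.

```java
/**
 * Rough estimate of norms memory, assuming one byte per document
 * per field that has norms enabled (hypothetical helper, not a Lucene API).
 */
final class NormsMemoryEstimate {

    /** Returns the estimated number of bytes held by norms arrays. */
    static long normsBytes(long maxDoc, long fieldsWithNorms) {
        // multiplyExact throws instead of silently overflowing on huge inputs
        return Math.multiplyExact(maxDoc, fieldsWithNorms);
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000L;  // documents in the index (from the question)
        long fields = 10_000L;     // assumed number of fields with norms enabled
        long bytes = normsBytes(maxDoc, fields);
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        System.out.println(bytes + " bytes, about " + gib + " GiB");
    }
}
```

Under these assumptions the estimate comes to 10,000,000,000 bytes, roughly 9.3 GiB, which is why turning norms off for most fields matters at this scale.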

Re: Fragment Highlighter Phrase?

2009-02-15 Thread Ian Vink
Thanks Mark, I got the latest Contrib bits for Highlighter.net (Jan 28/2008, Version 2.3.2), but it looks similar to the older 2.0.0. There is only a QueryScorer. Any ideas? (Really important to me :) Ian On Sat, Feb 14, 2009 at 11:56 PM, Mark Miller wrote: > Sorry, I wasn't specific enough. I

Re: search(Query query, HitCollector results)

2009-02-15 Thread Michael McCandless
Mark Miller wrote: Michael McCandless wrote: Mark Miller wrote: So HitCollector#collect(int doc, float score) is not called in a special (default) order and must order the docs itself by score if one needs the hits sorted by relevance? Presumably there is no score ordering to the h

Re: Upper limit on number of Fields

2009-02-15 Thread Karl Wettin
On 15 Feb 2009, at 16:27, Joel Halbert wrote: Is there any practical limit on the number of fields that can be maintained on an index? My index looks something like this, 1 million documents. For each group of 1000 documents I might have 10 indexed fields. This would mean in total about 10,000 fields. Am I going to run into any issues her

Re: search(Query query, HitCollector results)

2009-02-15 Thread Mark Miller
Michael McCandless wrote: Mark Miller wrote: So HitCollector#collect(int doc, float score) is not called in a special (default) order and must order the docs itself by score if one needs the hits sorted by relevance? Presumably there is no score ordering to the hit id's lucene delivers

Re: search(Query query, HitCollector results)

2009-02-15 Thread Michael McCandless
Mark Miller wrote: So HitCollector#collect(int doc, float score) is not called in a special (default) order and must order the docs itself by score if one needs the hits sorted by relevance? Presumably there is no score ordering to the hit id's lucene delivers to a HitCollector? i.e.

Re: search(Query query, HitCollector results)

2009-02-15 Thread Mark Miller
So HitCollector#collect(int doc, float score) is not called in a special (default) order and must order the docs itself by score if one needs the hits sorted by relevance? Presumably there is no score ordering to the hit id's lucene delivers to a HitCollector? i.e. they are delivered in th

RE: search(Query query, HitCollector results)

2009-02-15 Thread spring
> The HitCollector used will determine how things are ordered. In 2.4, the TopDocCollector will order by relevancy and the TopFieldDocCollector can order by relevancy, index order, or by field. Lucene delivers the hit ids to the HitCollector and it can order as it pleases. So
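The point that the collector, not the search call, decides the ordering can be illustrated with a small standalone sketch. This is not Lucene's TopDocCollector; it is a hypothetical class whose collect(int doc, float score) method only mirrors the callback discussed in this thread. It assumes hits may arrive in any score order and keeps the N best in a min-heap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical top-N collector; a sketch, not a Lucene class. */
final class TopNCollectorSketch {

    /** One collected hit: document id plus its score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private static final Comparator<Hit> BY_SCORE =
            Comparator.comparingDouble((Hit h) -> h.score);

    private final int n;
    // Min-heap on score: the weakest retained hit sits at the head.
    private final PriorityQueue<Hit> heap = new PriorityQueue<>(BY_SCORE);

    TopNCollectorSketch(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    /** Called once per matching document, in whatever order hits are found. */
    void collect(int doc, float score) {
        if (heap.size() < n) {
            heap.add(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();                    // evict the current weakest hit
            heap.add(new Hit(doc, score));
        }
    }

    /** Returns the retained hits, best score first. */
    List<Hit> topHits() {
        List<Hit> out = new ArrayList<>(heap);
        out.sort(BY_SCORE.reversed());
        return out;
    }
}
```

Feeding it hits with scores 0.2, 0.9, 0.5, 0.1 for documents 0 to 3 and asking for the top 2 yields documents 1 and 2, regardless of arrival order. Hits with equal scores come back in no guaranteed order in this sketch.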

Re: search(Query query, HitCollector results)

2009-02-15 Thread Joel Halbert
Presumably there is no score ordering to the hit ids Lucene delivers to a HitCollector? I.e. they are delivered in the order they are found and the score is neither ascending nor descending, i.e. the next score could be higher or lower than the previous one? -Original Message- From: Mark Miller

Re: search(Query query, HitCollector results)

2009-02-15 Thread Mark Miller
spr...@gmx.eu wrote: Hi, in what order does search(Query query, HitCollector results) return the results? By relevance? Thank you. - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-

search(Query query, HitCollector results)

2009-02-15 Thread spring
Hi, in what order does search(Query query, HitCollector results) return the results? By relevance? Thank you.

Re: Term precedence

2009-02-15 Thread Yonik Seeley
On Sun, Feb 15, 2009 at 10:50 AM, Joel Halbert wrote: > When constructing a query, using a series of terms e.g. Term1=X, Term2=Y etc... does it make sense, as in SQL, to place the most restrictive term query first? i.e. if I know that the query will be mainly constrained by the valu

Term precedence

2009-02-15 Thread Joel Halbert
When constructing a query, using a series of terms e.g. Term1=X, Term2=Y etc... does it make sense, as in SQL, to place the most restrictive term query first? i.e. if I know that the query will be mainly constrained by the value of Term1, does having this as the first in the query make the exec

Upper limit on number of Fields

2009-02-15 Thread Joel Halbert
Hi, Is there any practical limit on the number of fields that can be maintained on an index? My index looks something like this, 1 million documents. For each group of 1000 documents I might have 10 indexed fields. This would mean in total about 10,000 fields. Am I going to run into any issues her

Re: Faceted search with OpenBitSet/SortedVIntList

2009-02-15 Thread Paul Elschot
Meanwhile the choice between SortedVIntList and OpenBitSet has been removed from the trunk (development version), which now uses OpenBitSet only: https://issues.apache.org/jira/browse/LUCENE-1296 In case there is a preference to have SortedVIntList used in the next Lucene version (i.e. in cases when

Re: Optimal Solution for Unique Field Values

2009-02-15 Thread Chris Lu
I think you would need to 1) collect all the matching IDs for Field2=x 2) loop through Field1, for each Term's doc, collect the term if the term doc is in the matching IDs from step 1. This should be the fastest approach, pretty similar to what you suggested. -- Chris Lu
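The two steps above can be sketched without any Lucene dependency. In this minimal illustration the postings of Field1 are modelled as a plain map from term to document ids, and the documents matching Field2=x as a java.util.BitSet; both stand-ins, and all class and method names, are assumptions for the example. In a real index the map would be replaced by iteration over the field's terms and their documents.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of "unique values of Field1 where Field2=x". */
final class UniqueFieldValuesSketch {

    /**
     * @param field1Postings toy inverted index: Field1 term -> ids of docs containing it
     * @param matchingDocs   step 1 result: ids of docs where Field2=x
     * @return every Field1 term that occurs in at least one matching doc
     */
    static Set<String> uniqueValues(Map<String, int[]> field1Postings, BitSet matchingDocs) {
        Set<String> result = new TreeSet<>();
        // Step 2: walk each term of Field1 and test its docs against the match set.
        for (Map.Entry<String, int[]> entry : field1Postings.entrySet()) {
            for (int doc : entry.getValue()) {
                if (matchingDocs.get(doc)) {
                    result.add(entry.getKey());
                    break; // one matching doc is enough for this term
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet matching = new BitSet();
        matching.set(0);
        matching.set(2);

        Map<String, int[]> postings = new LinkedHashMap<>();
        postings.put("red", new int[] {0, 1});
        postings.put("green", new int[] {1, 3});
        postings.put("blue", new int[] {2});

        System.out.println(uniqueValues(postings, matching)); // prints [blue, red]
    }
}
```

The early break means each term costs at most one pass over its own documents, so the total work is bounded by the number of postings in Field1.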

Optimal Solution for Unique Field Values

2009-02-15 Thread Joel Halbert
Hi, I'm looking for an optimal solution for extracting unique field values. The rub is that I want to be able to perform this for a unique subset of documents...as per the example: I have an index with Field1 and Field2. I want "all unique values of Field1 where Field2=X". Other than actually p