Re: TopDocCollector vs Hits inquiry

2009-02-05 Thread Jay Malaluan
Hi, thanks for pointing me to the API. I found the explanation I was looking for at: http://lucene.apache.org/java/2_4_0/api/core/index.html?org/apache/lucene/search/Hits.html There's an example of how to use the TopDocCollector instead of Hits. Regards, Jay Joel Malaluan Grant Ingersoll-6 w
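The idea behind collecting only the top N hits, rather than walking a Hits object, can be sketched in plain Java. This is a toy model under stated assumptions, not Lucene's TopDocCollector: the class TopNSketch and its helper type Scored are hypothetical names, and only the collect(int doc, float score) shape is borrowed from HitCollector. It keeps a bounded min-heap so that memory stays proportional to N however many documents match.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy stand-in for a top-N collector. NOT Lucene's TopDocCollector.
final class TopNSketch {
    /** One (document number, score) pair; hypothetical helper type. */
    static final class Scored {
        final int doc;
        final float score;
        Scored(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int limit;
    // Min-heap on score, so the weakest retained hit sits at the head.
    private final PriorityQueue<Scored> heap =
        new PriorityQueue<>(Comparator.comparingDouble((Scored s) -> s.score));

    TopNSketch(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
    }

    /** Same shape as HitCollector.collect(int doc, float score). */
    void collect(int doc, float score) {
        if (heap.size() < limit) {
            heap.add(new Scored(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();                       // evict the weakest retained hit
            heap.add(new Scored(doc, score));
        }
    }

    /** Retained hits, best score first. */
    List<Scored> topDocs() {
        List<Scored> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
        return out;
    }
}
```

In real Lucene 2.4 code the collector would be passed to Searcher.search(Query, HitCollector); the example in the Hits javadoc linked above shows the actual calls.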

Re: TermQuery search returns the same Document several times

2009-02-05 Thread Erick Erickson
Your coworker *might* have been talking about a Hits object when iterating over it for documents past the 100th or so. See the discussion list of the wiki for the messy details. Well, you can always sort by a field rather than by score, see SortField and associated. And you can always specify seco

Re: Poor QPS with highlighting

2009-02-05 Thread Jason Rutherglen
http://en.wikipedia.org/wiki/Google_platform Document server summarization On Thu, Feb 5, 2009 at 12:57 PM, Michael Stoppelman wrote: > On Thu, Feb 5, 2009 at 12:47 PM, Michael Stoppelman wrote: > > On Thu, Feb 5, 2009 at 9:05 AM, Jason Rutherglen <jason.rutherg...@gmail.com> wr

Re: Poor QPS with highlighting

2009-02-05 Thread Michael Stoppelman
On Thu, Feb 5, 2009 at 12:47 PM, Michael Stoppelman wrote: > On Thu, Feb 5, 2009 at 9:05 AM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote: >> Google uses dedicated highlighting servers. Maybe this architecture would work for you. > What's your reference? I used to work at

Re: Poor QPS with highlighting

2009-02-05 Thread Michael Stoppelman
On Thu, Feb 5, 2009 at 9:05 AM, Jason Rutherglen wrote: > Google uses dedicated highlighting servers. Maybe this architecture would work for you. What's your reference? I used to work at Google. > On Mon, Feb 2, 2009 at 11:24 PM, Michael Stoppelman wrote: > > Hi all, > > My sear

Re: TermQuery search returns the same Document several times

2009-02-05 Thread Lebiram
Sorry, I might have misunderstood what my coworker told me. If HitCollector only returns a document once, then he might be referring to an application ID that is assigned to a field and has been indexed twice or more under different document IDs. I'll clarify this with him. However, is there a
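If the duplicates do come from one application ID being indexed under several Lucene document numbers, one way to collapse them at collection time is to keep only the best-scoring document per application ID. The following is a plain-Java sketch under that assumption, not Lucene code: DedupByAppIdSketch is a hypothetical class, and the appIdByDoc map stands in for however the application would look up the ID field for a document number. Only the collect(int doc, float score) shape mirrors HitCollector.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy collector that keeps one document per application ID. NOT a Lucene class.
final class DedupByAppIdSketch {
    // Hypothetical lookup: document number -> value of the application ID field.
    private final Map<Integer, String> appIdByDoc;
    private final Map<String, Integer> bestDoc = new LinkedHashMap<>();
    private final Map<String, Float> bestScore = new LinkedHashMap<>();

    DedupByAppIdSketch(Map<Integer, String> appIdByDoc) {
        this.appIdByDoc = appIdByDoc;
    }

    /** Same shape as HitCollector.collect(int doc, float score). */
    void collect(int doc, float score) {
        String appId = appIdByDoc.get(doc);
        if (appId == null) {
            return; // documents without an application ID are skipped in this sketch
        }
        Float previous = bestScore.get(appId);
        if (previous == null || score > previous) {
            bestScore.put(appId, score);
            bestDoc.put(appId, doc);
        }
    }

    /** Application ID -> best-scoring document number, in first-seen order. */
    Map<String, Integer> results() {
        return bestDoc;
    }
}
```

The cleaner fix is usually at index time: if each application ID is meant to map to one document, deleting the old document before re-adding it avoids the duplicates altogether.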

Re: Poor QPS with highlighting

2009-02-05 Thread Jason Rutherglen
Google uses dedicated highlighting servers. Maybe this architecture would work for you. On Mon, Feb 2, 2009 at 11:24 PM, Michael Stoppelman wrote: > Hi all, > My search backends are only able to eke out 13-15 qps even with the entire index in memory (this makes it very expensive to scale). A

termDocs / termEnums performance increase for 2.4.0

2009-02-05 Thread Beard, Brian
Thought I would report a performance increase noticed in migrating from 2.3.2 to 2.4.0. Performing an iterated loop using termDocs & termEnums like below is about 30% faster. The example test set I'm running has about 70K documents to go through and process (on a dual-processor Windows machine) w

RE: Poor QPS with highlighting

2009-02-05 Thread Beard, Brian
A while ago someone posted a link to a project called XTF which does this: http://xtf.wiki.sourceforge.net/ The one problem with this approach still lurking for me (or maybe I don't understand how to get around it) is how to handle multiple terms which "must" appear in the query, but are in non-overl

Re: TermQuery search returns the same Document several times

2009-02-05 Thread Erick Erickson
I don't understand your question. From the API docs for HitCollector.collect: <<>> Can you ask your question another way? Because the only answer I can come up with is "HitCollector.collect only sees each document once by definition". Best Erick On Thu, Feb 5, 2009 at 7:17 AM, Lebiram wrote:

TermQuery search returns the same Document several times

2009-02-05 Thread Lebiram
Hi All, Is it possible to somehow ensure that a document will be returned only once when collecting from HitCollector?

Re: TopDocCollector vs Hits inquiry

2009-02-05 Thread Grant Ingersoll
http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/search/Searcher.html#search(org.apache.lucene.search.Query,%20org.apache.lucene.search.HitCollector) The TopDocCollector is a HitCollector. On Feb 4, 2009, at 10:34 PM, Jay Malaluan wrote: Hi, As I was reading the post "Re: TopDo

Re: Field.Store.YES Question

2009-02-05 Thread Amin Mohammed-Coleman
Thanks guys for your replies! It's helped a lot! Cheers Amin On Thu, Feb 5, 2009 at 9:28 AM, Ganesh wrote: > Field.Store.YES is to store the field data as it is, so that it can be retrieved to display results. > Field.Index.ANALYZED tokenizes the field and stores the tokenized content. > >

Re: Field.Store.YES Question

2009-02-05 Thread Ganesh
Field.Store.YES is to store the field data as it is, so that it can be retrieved to display results. Field.Index.ANALYZED tokenizes the field and stores the tokenized content. Regards Ganesh - Original Message - From: "Amin Mohammed-Coleman" To: Sent: Thursday, February 05, 2009
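The distinction between storing a field and indexing it can be illustrated with a small plain-Java model. This is a sketch under stated assumptions, not how Lucene is implemented: StoreVsIndexSketch and all of its methods are hypothetical, the "analyzer" is just lowercasing plus a split on non-word characters, and the two boolean flags only loosely stand in for Field.Store.YES and Field.Index.ANALYZED. Stored values live in a per-document map used for display, while tokens go into a separate term-to-documents map used for searching.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of "stored" versus "indexed" field data. NOT Lucene's implementation.
final class StoreVsIndexSketch {
    // Per-document stored values, used only to display results.
    private final List<Map<String, String>> stored = new ArrayList<>();
    // Inverted index: "field:token" -> document numbers containing that token.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Adds a one-field document and returns its document number. */
    int addDocument(String field, String value, boolean store, boolean analyze) {
        int docId = stored.size();
        Map<String, String> storedFields = new HashMap<>();
        if (store) {
            storedFields.put(field, value);   // kept verbatim for retrieval
        }
        stored.add(storedFields);
        if (analyze) {
            // Crude stand-in for an analyzer: lowercase, split on non-word characters.
            for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(field + ":" + token, k -> new TreeSet<>())
                            .add(docId);
                }
            }
        }
        return docId;
    }

    /** Document numbers whose field contains the given single term. */
    Set<Integer> search(String field, String term) {
        String key = field + ":" + term.toLowerCase(Locale.ROOT);
        return postings.getOrDefault(key, Collections.emptySet());
    }

    /** The stored value, or null if the field was not stored. */
    String storedValue(int docId, String field) {
        return stored.get(docId).get(field);
    }
}
```

A document added with analyze set but store unset is still found by search, yet storedValue returns null for it, which is the behaviour the replies in this thread describe for a field that is indexed but not stored.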

Re: Field.Store.YES Question

2009-02-05 Thread Karl Wettin
On 5 Feb 2009, at 09:30, Amin Mohammed-Coleman wrote: Is there a separate part in the Lucene document where the tokenised strings are stored and therefore Lucene knows where to look? Yes. Stored fields are metadata bound to a document, for instance the primary key of the object the Lucene do

Field.Store.YES Question

2009-02-05 Thread Amin Mohammed-Coleman
Hi, I'm probably going to get shot down for asking this simple question. Although I think I understand the basic concept of Field, I feel there is something that I am missing and I was wondering if someone might help to clarify. You can store a field value in an index using Field.Store.YES or if th