Re: Scoring filters

2008-06-11 Thread Paul Elschot
On Wednesday 11 June 2008 01:41:38, Karl Wettin wrote: > Each of my filters represents a single boosting term query. But when using the filter instead of the boosting term query I lose the score (not sure this is true) and payload boost (if any), both essential for the quality of my results.

RE: The performance of lucene searching(web entironment) test

2008-06-11 Thread Toke Eskildsen
On Wed, 2008-06-11 at 00:17 +0800, lutan wrote: > In my test case, I start LoadRunner and just test for 5 minutes, and the response grows slowly. The TPS (transactions per second) seems to stop at 10 in the end. That's without reusing the searcher, right? In that case the increased rate must be attribute

AW: retrieve all docs efficiently - just one field

2008-06-11 Thread Johannes Christen
That might be a solution in this case, but I have the same kind of problem in another case. We index documents from an NTFS source. One field is the URI of the document. After a query has been processed, we perform an access check on the hits to ensure the user has access rights to open the docu

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-06-11 Thread Michael McCandless
Grant Ingersoll wrote: Is more than one thread adding documents to the index? I don't believe so, but I am trying to reproduce. I've only seen it once, and don't have a lot of details, other than I noticed it was on a specific file (.fdt) and was wondering if that was a factor or not.

RE: The performance of lucene searching(web entironment) test

2008-06-11 Thread lutan
Thanks for your reply! > Date: Wed, 11 Jun 2008 09:19:46 +0200 > From: [EMAIL PROTECTED] > Subject: RE: The performance of lucene searching(web entironment) test > To: java-user@lucene.apache.org > > On Wed, 2008-06-11 at 00:17 +0800, lutan wrote: > > In my test case, I start LoadRunner and just test fo

Re: FileNotFoundException in ConcurrentMergeScheduler

2008-06-11 Thread Grant Ingersoll
On Jun 11, 2008, at 6:00 AM, Michael McCandless wrote: Grant Ingersoll wrote: Is more than one thread adding documents to the index? I don't believe so, but I am trying to reproduce. I've only seen it once, and don't have a lot of details, other than I noticed it was on a specific fil

Is it possible to get only one Field from a Document?

2008-06-11 Thread Marcelo Schneider
I have an environment where we have indexed a DB with about 6 million entries with Lucene, and each row has 25 columns. 20 columns hold integer codes used as filters (indexed/unstored), and the other 5 hold (very) large texts (also indexed/unstored). Currently the search I'm doing is like this: Hits hi

Re: Running Lucene in a Clustered Environment

2008-06-11 Thread Kalani Ruwanpathirana
Hi Shalin, I am not familiar with Solr. I just know that it is a search server. Can you please point me to some resources on how I can use Solr to solve the situation? Kalani On Tue, Jun 10, 2008 at 5:03 PM, Shalin Shekhar Mangar < [EMAIL PROTECTED]> wrote: > Hi Kalani, > > Are you aware of Ap

RE: Is it possible to get only one Field from a Document?

2008-06-11 Thread Daan de Wit
This is possible: you need to provide a FieldSelector to IndexReader#document(docId, selector). This won't work with Hits though, because Hits does not expose the document number, so you need to roll your own solution using TopDocs or HitCollector. For information, see the discussion in this is

RE: Is it possible to get only one Field from a Document?

2008-06-11 Thread Daan de Wit
But I doubt this will solve your memory issue because nonstored fields are not read when retrieving the document. -Original Message- From: Daan de Wit [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 11, 2008 13:44 To: java-user@lucene.apache.org Subject: RE: Is it possible to get only on

Re: Is it possible to get only one Field from a Document?

2008-06-11 Thread Grant Ingersoll
For the record, Hits.id(int i) returns the document number. Note, though, that Hits is now deprecated, as pointed out by the link to 1290, so going the TopDocs route is probably better anyway. -Grant On Jun 11, 2008, at 7:43 AM, Daan de Wit wrote: This is possible, you need to provide a F

Re: Is it possible to get only one Field from a Document?

2008-06-11 Thread Marcelo Schneider
Daan de Wit wrote: But I doubt this will solve your memory issue because nonstored fields are not read when retrieving the document. Thanks for the fast reply Daan! Just for clarification: if I had all the code fields (filters) stored, would it make any difference? -Original Mes

RE: Is it possible to get only one Field from a Document?

2008-06-11 Thread Daan de Wit
Yep, using a FieldSelector you can restrict which fields are loaded. You can also specify how each field is loaded: normally, lazily, or load the field and then stop loading the document (i.e. skip the remaining fields). -Original Message- From: Marcelo Schneider [mailto:[EMAIL PROTECTED]
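The loading modes described in this reply can be illustrated with a small stand-alone model. This is only a sketch of the idea, not Lucene code: the class SelectiveLoader, its Decision enum and the reads counter are all hypothetical names invented here, and the real FieldSelector API should be checked against the Lucene javadocs for your version.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Toy model of selective field loading. None of these names are Lucene classes.
final class SelectiveLoader {
    enum Decision { LOAD, LAZY_LOAD, LOAD_AND_BREAK, NO_LOAD }

    // Counts how often a stored value was actually read, to make laziness visible.
    static int reads = 0;

    static String read(String storedValue) {
        reads++;
        return storedValue;
    }

    // Walks the stored fields in order and applies the selector's decision to each one.
    static Map<String, Supplier<String>> load(Map<String, String> stored,
                                               Function<String, Decision> selector) {
        Map<String, Supplier<String>> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : stored.entrySet()) {
            String raw = field.getValue();
            Decision decision = selector.apply(field.getKey());
            if (decision == Decision.NO_LOAD) {
                continue;                                   // field is skipped entirely
            }
            if (decision == Decision.LAZY_LOAD) {
                doc.put(field.getKey(), () -> read(raw));   // read only on first access
            } else {
                String eager = read(raw);                   // read right away
                doc.put(field.getKey(), () -> eager);
            }
            if (decision == Decision.LOAD_AND_BREAK) {
                break;                                      // ignore all remaining fields
            }
        }
        return doc;
    }
}
```

With a selector that answers LOAD_AND_BREAK for a "uri" field and NO_LOAD for everything else, only that one value is read, which is the access-check scenario discussed elsewhere in this listing.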

Re: retrieve all docs efficiently - just one field

2008-06-11 Thread Karl Wettin
On 11 Jun 2008 at 09:38, Johannes Christen wrote: That might be a solution in this case, but I have the same kind of problem in another case. We index documents from an NTFS source. One field is the URI of the document. After a query has been processed, we perform an access check on the hits

Re: retrieve all docs efficiently - just one field

2008-06-11 Thread 1world1love
karl wettin-3 wrote: > I might be missing something here -- can't you just add the age field to the index and include that in your query? Thanks for the response Karl: I just used the age field as an example, but in reality the structured data is copious and complex relationshi

Re: Scoring filters

2008-06-11 Thread Karl Wettin
On 11 Jun 2008 at 09:14, Paul Elschot wrote: On Wednesday 11 June 2008 01:41:38, Karl Wettin wrote: Each of my filters represents a single boosting term query. But when using the filter instead of the boosting term query I lose the score (not sure this is true) and payload boost (if any), both ess

fieldNorm and fieldValueUniqueness

2008-06-11 Thread Cam Bazz
Hello, When you look at the fields of a document with Luke, there is a norm column. I have not been able to figure out what that is. The reason I am asking is that I am trying to build a uniqueness model. My Index is structured as follows: classID, textID, K, V classID is a given class. textID

Re: retrieve all docs efficiently - just one field

2008-06-11 Thread Erick Erickson
I infer from this that you're using a Hits object to get your IDs to insert in your temporary table. Here's the problem with Hits... It re-executes the query every 100 (200?) hits. So you can think of it as while (more hits) { if ((count % 100) == 0) execute the search and throw away the
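The cost pattern described in this message can be made concrete with a toy count. This is a model of the description above, not of Lucene's actual Hits implementation: the class name HitsCostModel is hypothetical, and the batch size is a parameter because the message itself is unsure whether the figure is 100 or 200.

```java
// Toy model of "re-run the search every N hits". Hypothetical, not Lucene code.
final class HitsCostModel {
    // Number of searches executed while iterating over hitsRead hits,
    // assuming a fresh search is run whenever count is a multiple of batch.
    static int searchesExecuted(int hitsRead, int batch) {
        int searches = 0;
        for (int count = 0; count < hitsRead; count++) {
            if (count % batch == 0) {
                searches++;   // the query is executed again at this point
            }
        }
        return searches;
    }
}
```

Under this model, reading 250 hits with a batch of 100 runs the query 3 times, and reading 1,000,000 hits runs it 10,000 times, whereas a single pass with a collector runs it once.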

Re: retrieve all docs efficiently - just one field

2008-06-11 Thread 1world1love
Thanks Erick. That is what I was assuming but couldn't confirm if it was worth going down those paths to achieve what I was hoping. Your essay was very informative about realistic expectations with the FieldSelector. I actually just got through reading the discussion on deprecating hits which ess

Re: fieldNorm and fieldValueUniqueness

2008-06-11 Thread Karl Wettin
On 11 Jun 2008 at 16:04, Cam Bazz wrote: When you look at the fields of a document with Luke, there is a norm column. I have not been able to figure out what that is. Norms are the 8-bit discretization of length normalization and field boost combined. See IndexReader#norms, Similarity#leng
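To show what an 8-bit discretization of a norm can look like, here is a stand-alone sketch. It makes two assumptions that are not stated in the message: that the length normalization is 1/sqrt(number of terms), and that the byte is produced by keeping the top bits of the float and rebasing the exponent. The class NormSketch and its methods are hypothetical names; check the Similarity javadocs for the encoding your Lucene version really uses.

```java
// Sketch of squeezing a float norm into one byte. Hypothetical, not Lucene code.
final class NormSketch {
    // Assumed length normalization: longer fields get smaller norms.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Keeps the top 11 bits of the float and shifts the exponent so it fits in a byte.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;
        int zero = (63 - 15) << 3;                  // smallest representable exponent
        if (small <= zero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;   // zero/negative, or underflow
        }
        if (small >= zero + 0x100) {
            return (byte) -1;                       // overflow: largest value
        }
        return (byte) (small - zero);
    }

    // Inverse of encode, up to the precision lost by truncation.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }
}
```

Under these assumptions a boost of 1.0 on a 10-term field gives 1/sqrt(10), about 0.316, which decodes back as 0.3125. Many different field lengths therefore share the same stored norm.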

RE: Is it possible to get only one Field from a Document?

2008-06-11 Thread Alex
If you have many terms across the fields, you might want to invoke IndexReader's setTermInfosIndexDivisor() method, which reduces the in-memory term infos used to look up idf, at the cost of a (slightly) slower search. > From: [EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: Re: Is it po

Re: fieldNorm and fieldValueUniqueness

2008-06-11 Thread Cam Bazz
Yes, figured it out. Thanks. How about checking for uniqueness? Best. On Wed, Jun 11, 2008 at 5:39 PM, Karl Wettin <[EMAIL PROTECTED]> wrote: > On 11 Jun 2008 at 16:04, Cam Bazz wrote: >> When you look at the fields of a document with Luke, there is a norm column. >> I have not been able

Keyword expansion

2008-06-11 Thread Sengly Heng
Dear all, To improve the search, I will have to do keyword expansion. I am looking for a library that would help me to get the list of synonyms of a term with some similarity score. Is there any library package that can handle this? It would be great if it is in Python. I have searched the web and foun

Concurrent query benchmarks, with 1,2,4,8 readers

2008-06-11 Thread Glen Newton
I have extended my evaluation (previous evaluation: http://zzzoot.blogspot.com/2008/06/simultaneous-threaded-query-lucene.html) to include, in addition to an increasing number of threads performing concurrent queries, 1, 2, 4 and 8 IndexReaders. The results can be found here: http://zzzoot.blogspot.com/2008/0

Re: Concurrent query benchmarks, with 1,2,4,8 readers

2008-06-11 Thread Otis Gospodnetic
Hi Glen, Aha, good to see the benefit of multiple IndexReaders/Searchers so clearly. Makes me think we'll want to add a config setting for this in Solr... :) As for why 4 is the best choice, I think it's because of those 4 cores that you've got. My guess is that you'll see slightly better per
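If the best reader count does track the number of cores, as guessed in this reply, one way to avoid hard-coding it is to size a pool from what the JVM reports. The sketch below is a generic round-robin pool, not something from Lucene or Solr: RoundRobinPool and defaultSize are hypothetical names, and whether one reader per core is actually optimal is the assumption under test in the benchmark, not an established fact.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: hands out pool members (e.g. searchers) in rotation.
final class RoundRobinPool<T> {
    private final List<T> members;
    private final AtomicInteger cursor = new AtomicInteger();

    RoundRobinPool(List<T> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("pool must not be empty");
        }
        this.members = new ArrayList<>(members);   // defensive copy
    }

    // Suggested pool size: one member per processor the JVM reports.
    static int defaultSize() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Thread-safe rotation; floorMod keeps the index valid after int overflow.
    T next() {
        int index = Math.floorMod(cursor.getAndIncrement(), members.size());
        return members.get(index);
    }
}
```

Each query thread would call next() to pick the reader it searches against, so concurrent queries are spread evenly across the pool.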

Re: Concurrent query benchmarks, with 1,2,4,8 readers

2008-06-11 Thread Glen Newton
Hi Otis, Thanks for the feedback. 2008/6/11 Otis Gospodnetic <[EMAIL PROTECTED]>: > Hi Glen, > > Aha, good to see the benefit of multiple IndexReaders/Searchers so clearly. > Makes me think we'll want to add a config setting for this in Solr... :) Until then, you might want to use: Runtime.ava