RE: Keyword fields, Porter stemming, and QueryParser

2006-01-25 Thread Dmitry Goldenberg
Dave, Thanks for the pointer. The Wrapper worked marvellously! This was exactly the situation - wanting to treat the standard fields and keyword fields differently as far as stemming is concerned (no stemming for the latter). - Dmitry From: Dave Kor [mailt

Re: how to select top categories.

2006-01-25 Thread Chris Hostetter
: for this site, but would you cache all manufacturers and intersect all with : the initial query in one page load? Seems like that would be a lot. Yep, it is a lot, but if you've got the RAM, it's not that time intensive. At CNET, depending on what page you are looking at, I'm doing anywhere from 1
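To make the caching idea concrete, here is a minimal plain-Java sketch (not CNET's code). It assumes a `BitSet` per sub-category has already been cached, with bit *i* set when document *i* is in that category, and that the initial query's matches are also available as a `BitSet`. Counting a category is then one intersection plus `cardinality()`. The class name and sample data are hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: per-category counts for one query, assuming every category's
// BitSet (bit i set = document i is in that category) is already cached.
class CategoryCounts {

    /** Returns, for each category, the size of (results AND categoryBits). */
    static Map<String, Integer> countPerCategory(BitSet results, Map<String, BitSet> categoryBits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : categoryBits.entrySet()) {
            BitSet tmp = (BitSet) results.clone(); // leave the shared result set untouched
            tmp.and(e.getValue());                 // in-place intersection
            counts.put(e.getKey(), tmp.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        BitSet results = new BitSet();             // docs matching the initial query
        results.set(0); results.set(2); results.set(3); results.set(7);

        BitSet cameras = new BitSet();             // hypothetical cached category
        cameras.set(2); cameras.set(3); cameras.set(5);
        BitSet phones = new BitSet();
        phones.set(0); phones.set(9);

        Map<String, BitSet> cached = new LinkedHashMap<>();
        cached.put("cameras", cameras);
        cached.put("phones", phones);

        System.out.println(countPerCategory(results, cached)); // {cameras=2, phones=1}
    }
}
```

Each count costs one pass over the bits, so a few thousand intersections is mostly a question of how much memory the cached sets take.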

Re: how to select top categories.

2006-01-25 Thread Mike Austin
Chris, thanks for your quick response. : doing a few thousand BitSet intersections doesn't take as much time as you think. Even if each BitSet is around 4-5 million bits? And I would have to quickly go through about a thousand of these? I guess I would have to decide what sub-cats to cache the bitsets fo
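A rough memory estimate for these numbers, assuming one bit per document in an uncompressed `java.util.BitSet` and ignoring per-object overhead:

$$
\frac{5\times10^{6}\ \text{bits}}{8\ \text{bits/byte}} = 6.25\times10^{5}\ \text{bytes} \approx 610\ \text{KiB per BitSet},
\qquad
1000 \times 6.25\times10^{5}\ \text{bytes} = 6.25\times10^{8}\ \text{bytes} \approx 625\ \text{MB}.
$$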

Re: Throughput doesn't increase when using more concurrent threads

2006-01-25 Thread Peter Keegan
Yes, it's hyperthreaded (16 CPUs show up in Task Manager; the box is running Windows Server 2003). I plan to turn off hyperthreading to see if it has any effect. Peter On 1/25/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > On 1/25/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > > It's a 3GHz Intel box with Xeo

Re: how to select top categories.

2006-01-25 Thread Chris Hostetter
You will likely find this thread interesting... http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-t266441.html : 1) Do queries for each sub-category using the results of the first initial : query and use the hits count to select the sub-categories to displa

Re: Throughput doesn't increase when using more concurrent threads

2006-01-25 Thread Yonik Seeley
On 1/25/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > It's a 3GHz Intel box with Xeon processors, 64GB RAM :) Nice! Xeon processors are normally hyperthreaded. On a Linux box, if you cat /proc/cpuinfo, you will see 8 processors for a system with 4 physical CPUs. Are you positive you have 8 physical X
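From inside the JVM, the closest equivalent is `Runtime.getRuntime().availableProcessors()`. It counts logical processors, so hyperthreads are included, and it cannot tell physical CPUs from hyperthreaded siblings. It shows how many processors the JVM will spread query threads across, but it won't settle the "8 physical CPUs?" question. The class name below is hypothetical.

```java
// Sketch: report how many processors the JVM can schedule threads on.
class CpuCount {

    static int visibleProcessors() {
        // Counts logical processors (hyperthreads included); it cannot
        // distinguish physical CPUs from hyperthreaded siblings.
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Processors visible to the JVM: " + visibleProcessors());
    }
}
```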

Re: Throughput doesn't increase when using more concurrent threads

2006-01-25 Thread Peter Keegan
The index is in non-compound format and optimized. Yes, I did try MMapDirectory, but the index is too big: 3.5 GB (1.3 GB is term vectors). Peter On 1/25/06, Doug Cutting <[EMAIL PROTECTED]> wrote: > > Peter Keegan wrote: > > This is just fyi - in my stress tests on an 8-CPU box (that's 8 real > CPUs

Re: Help needed with BooleanQuery formation

2006-01-25 Thread Chris Hostetter
: I want a query of the form: : : x AND ( a OR b OR c OR d) What your code is currently doing is adding 5 term queries to a single boolean query. The structure you want is not a single boolean query; it's a boolean query containing two mandatory clauses: the first being a term query, and the sec
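To show why the flat version doesn't behave like `x AND (a OR b OR c OR d)`, here is a plain-Java model (not the Lucene API) that computes which documents match. Each clause's matching documents are a `BitSet`. The model follows Lucene's rule that once a `BooleanQuery` has a required clause, its optional clauses only affect scoring, not matching (absent a minimum-should-match setting). Prohibited clauses are left out, and the class, helper names, and sample postings are hypothetical.

```java
import java.util.BitSet;

// Sketch (plain Java, not the Lucene API): which documents a boolean query
// matches, with each clause's matching documents modelled as a BitSet.
// Models the rule that when a query has at least one required clause, its
// optional clauses only influence scoring, not which documents match.
// Prohibited clauses are left out for brevity.
class BooleanStructure {

    static BitSet docs(int... ids) {
        BitSet b = new BitSet();
        for (int id : ids) b.set(id);
        return b;
    }

    static BitSet match(BitSet[] required, BitSet[] optional, int maxDoc) {
        BitSet result = new BitSet();
        if (required.length > 0) {
            result.set(0, maxDoc);                   // start from every document
            for (BitSet r : required) result.and(r); // each required clause must match
        } else {
            for (BitSet o : optional) result.or(o);  // at least one optional clause must match
        }
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 6;
        BitSet x = docs(0, 1, 2), a = docs(1), b = docs(4), c = docs(), d = docs(2, 5);

        // One flat query: x required, a..d optional, so it behaves like "x" alone.
        BitSet flat = match(new BitSet[] {x}, new BitSet[] {a, b, c, d}, maxDoc);

        // Nested: the inner query (a OR b OR c OR d) becomes a required clause next to x.
        BitSet inner = match(new BitSet[] {}, new BitSet[] {a, b, c, d}, maxDoc);
        BitSet nested = match(new BitSet[] {x, inner}, new BitSet[] {}, maxDoc);

        System.out.println("flat:   " + flat);   // flat:   {0, 1, 2}
        System.out.println("nested: " + nested); // nested: {1, 2}
    }
}
```

Document 0 contains `x` but none of `a`..`d`. The flat query still matches it, and the nested one does not.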

Re: Throughput doesn't increase when using more concurrent threads

2006-01-25 Thread Peter Keegan
It's a 3GHz Intel box with Xeon processors, 64GB RAM :) Peter On 1/25/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > Thanks Peter, that's useful info. > > Just out of curiosity, what kind of box is this? What CPUs? > > -Yonik > > On 1/25/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > > This is

Re: performance implications for an index with large number of documents.

2006-01-25 Thread Otis Gospodnetic
Hi,
Quick reactions:
- Do use the -server option; it makes a difference, and I don't think there is much to test there (I've never run a daemon-like service without the -server option, and I have seen the performance improvement due to HotSpot with my own eyes).
- Optimizing every hour sounds like a

Re: Throughput doesn't increase when using more concurrent threads

2006-01-25 Thread Doug Cutting
Peter Keegan wrote: This is just fyi - in my stress tests on an 8-CPU box (that's 8 real CPUs), the maximum throughput occurred with just 4 query threads. The query throughput decreased with fewer than 4 or more than 4 query threads. The entire index was most likely in the file system cache, t

Re: Throughput doesn't increase when using more concurrent threads

2006-01-25 Thread Yonik Seeley
Thanks Peter, that's useful info. Just out of curiosity, what kind of box is this? What CPUs? -Yonik On 1/25/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > This is just fyi - in my stress tests on an 8-CPU box (that's 8 real CPUs), > the maximum throughput occurred with just 4 query threads. The

how to select top categories.

2006-01-25 Thread Mike Austin
I have ~5 million documents that are in categories and subcategories. Let's say my query is for search terms in one top-level category, it returns a large number of documents, and I want to list the top 5 sub-categories by highest total count. I know I can't go one by one counting through t
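Once there is a hit count for each sub-category, however it is computed, picking the top 5 doesn't require sorting every category. A min-heap capped at *k* entries keeps only the current leaders. The following is a minimal Java sketch; the category names and counts are made up, and ties are broken arbitrarily.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Sketch: pick the k categories with the highest counts without sorting all of
// them, using a size-k min-heap. Ties are broken arbitrarily.
class TopCategories {

    static List<String> topK(Map<String, Integer> counts, int k) {
        // Head of the heap = smallest count among the current top k.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<Map.Entry<String, Integer>>(
                (e1, e2) -> Integer.compare(e1.getValue(), e2.getValue()));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) heap.poll();        // evict the current smallest
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) result.add(heap.poll().getKey());
        Collections.reverse(result);                 // heap yields smallest first
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>(); // hypothetical sub-category hit counts
        counts.put("batteries", 20);
        counts.put("lenses", 12);
        counts.put("flashes", 9);
        counts.put("bags", 7);
        counts.put("tripods", 3);
        counts.put("straps", 1);
        System.out.println(topK(counts, 5)); // [batteries, lenses, flashes, bags, tripods]
    }
}
```

With *n* categories this costs O(*n* log *k*), which matters little for a handful of categories but keeps memory flat when there are thousands.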

Re: Throughput doesn't increase when using more concurrent threads

2006-01-25 Thread Peter Keegan
This is just fyi - in my stress tests on an 8-CPU box (that's 8 real CPUs), the maximum throughput occurred with just 4 query threads. The query throughput decreased with fewer than 4 or more than 4 query threads. The entire index was most likely in the file system cache, too. Periodic snapshots
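A stress test of this kind can be set up as below: run the same fixed batch of queries with different numbers of worker threads and compare queries per second. This is a hedged Java sketch, not Peter's harness. `simulatedQuery` is a hypothetical CPU-bound stand-in for a search against a shared searcher, so it will not reproduce Lucene-specific I/O or locking effects.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: same batch of "queries", varying thread counts, report throughput.
// simulatedQuery() is a hypothetical CPU-bound placeholder for a real search.
class ThroughputProbe {

    static final AtomicLong sink = new AtomicLong(); // keeps the JIT from discarding the work

    static void simulatedQuery(int seed) {
        long acc = seed;
        for (int i = 0; i < 20_000; i++) acc = acc * 6364136223846793005L + 1442695040888963407L;
        sink.addAndGet(acc);
    }

    /** Runs totalQueries tasks on a pool of nThreads; returns the number completed. */
    static int run(int nThreads, int totalQueries) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        AtomicInteger completed = new AtomicInteger();
        for (int q = 0; q < totalQueries; q++) {
            final int seed = q;
            pool.execute(() -> { simulatedQuery(seed); completed.incrementAndGet(); });
        }
        pool.shutdown();                             // accept no new tasks; queued ones still run
        pool.awaitTermination(10, TimeUnit.MINUTES);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int total = 2_000;
        for (int threads : new int[] {1, 2, 4, 8, 16}) {
            long start = System.nanoTime();
            int n = run(threads, total);
            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("%2d threads: %.0f queries/sec%n", threads, n / secs);
        }
    }
}
```

The first round also pays for JIT warm-up, so in practice repeat the sweep or discard the first measurement before comparing thread counts.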

Re: Highlighter

2006-01-25 Thread Erik Hatcher
On Jan 25, 2006, at 6:39 AM, Gwyn Carwardine wrote: Yes, I think you're right. On reading the "Lucene in Action" chapter on highlighting I found it squirreled away in the middle of the text. I get the feeling that whilst I have so far found query parser to be the primary method of building queri

MoreDisLikeThis query

2006-01-25 Thread Hemant Joshi
Hi, I have looked at the MoreLikeThis functionality. I would like to add MoreDisLikeThis functionality as well. It is important for me to learn from similarity as well as dissimilarity with other documents. I have done the basic groundwork of forming two queries (one with MoreLikeThis c

Re: Help needed with BooleanQuery formation

2006-01-25 Thread Michael D. Curtin
Michael Pickard wrote: Can anyone help me with the formation of a BooleanQuery? I want a query of the form: x AND ( a OR b OR c OR d) You're going to need 2 BooleanQuery objects, one for the OR'd expression in parentheses, and another for the AND expression. Something like this:

Help needed with BooleanQuery formation

2006-01-25 Thread Michael Pickard
Can anyone help me with the formation of a BooleanQuery? I want a query of the form: x AND ( a OR b OR c OR d) The nearest I've managed to get is query.add(new TermQuery(new Term(2, "x")),true,false); Term term = null; for (int i=1; i

RE: Range queries

2006-01-25 Thread Mike Streeton
Sorry, I forgot to mention: for floats, take everything to the left of the decimal point, encode it as 16-digit hex (via a long), then append the decimal point and everything following it. The only problem we tend to find is that searching across large ranges either produces an exception about too ma
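The float scheme can be sketched in Java as follows, under stated assumptions: values are non-negative and arrive as decimal strings, and trailing zeros are stripped (an addition, so that `12.50` and `12.5` produce the same term). The integer part becomes a zero-padded 16-digit hex long, and the original fractional digits follow the `.`. The class name is hypothetical; negative numbers would need extra handling not described in the post.

```java
import java.math.BigDecimal;

// Sketch for non-negative decimals: integer part as a 16-digit zero-padded hex
// long, then "." and the original fractional digits (trailing zeros removed).
class HexDecimal {

    static String encode(String decimal) {
        BigDecimal v = new BigDecimal(decimal);
        if (v.signum() < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        // Drop trailing zeros so 12.50 and 12.5 get the same term; no exponent notation.
        String plain = v.stripTrailingZeros().toPlainString();
        int dot = plain.indexOf('.');
        String intPart = dot < 0 ? plain : plain.substring(0, dot);
        String fracPart = dot < 0 ? "" : plain.substring(dot + 1);
        String hex = String.format("%016x", Long.parseLong(intPart));
        return fracPart.isEmpty() ? hex : hex + "." + fracPart;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12", "12.25", "12.5", "13"}) {
            System.out.println(s + " -> " + encode(s));
        }
    }
}
```

String order then matches numeric order. The fixed-width hex prefix decides between different integer parts, and comparing the fractional digits character by character orders `.25` before `.5`, just as the numbers are ordered.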

RE: Highlighter

2006-01-25 Thread Gwyn Carwardine
>> Yes, I think you're right. On reading the "Lucene in Action" chapter on >> highlighting I found it squirreled away in the middle of the text. I get >> the >> feeling that whilst I have so far found query parser to be the primary >> method of building queries that this is not the primary method used

Re: Highlighter

2006-01-25 Thread Erik Hatcher
On Jan 24, 2006, at 5:43 PM, Gwyn Carwardine wrote: Yes, I think you're right. On reading the "Lucene in Action" chapter on highlighting I found it squirreled away in the middle of the text. I get the feeling that whilst I have so far found query parser to be the primary method of building queries

RE: Range queries

2006-01-25 Thread Mike Streeton
I can recommend this method; it's how we do it, except that what we store in the index is the long converted to a 16-digit hex number. Our extended parser converts entered queries on long fields to hex. We obviously also convert back before we display the value. Floating point numbers
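Range queries of this era compare terms as strings, so the point of a fixed-width hex form is that lexicographic order matches numeric order. Here is a minimal Java sketch of the encoding and the reverse conversion for display. One assumption goes beyond the post: the sign bit is flipped before formatting so negative values sort below positive ones, since the post doesn't say how (or whether) negatives are handled. The class name is hypothetical.

```java
// Sketch: fixed-width hex terms for longs so term (string) order matches numeric order.
// Assumption: the sign bit is flipped before formatting so that negative values sort
// below positive ones; the post does not say how negatives are handled.
class HexLong {

    static String encode(long value) {
        // XOR with Long.MIN_VALUE maps [MIN_VALUE, MAX_VALUE] onto
        // [0x0000000000000000, 0xffffffffffffffff] in unsigned order;
        // %x prints a negative long as its unsigned two's-complement value.
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }

    static long decode(String hex) {
        return Long.parseUnsignedLong(hex, 16) ^ Long.MIN_VALUE; // undo the flip for display
    }

    public static void main(String[] args) {
        for (long v : new long[] {-5L, 0L, 42L}) {
            String term = encode(v);
            System.out.println(v + " -> " + term + " -> " + decode(term));
        }
    }
}
```

Query bounds typed by the user would go through the same `encode` before the range query is built, which is the job the post assigns to its extended parser (not shown here).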

Re: Highlighter

2006-01-25 Thread Erik Hatcher
On Jan 25, 2006, at 12:50 AM, Ravi wrote: I am also having a problem with the highlighter: when I want to highlight a specific field in Lucene, it is not working. Improvements were made to the Highlighter in December to add field-specific highlighting capability. Here's the svn log: -