improving the scalability in searching part 2

2007-08-08 Thread Ard Schrijvers
Problem 2: The XPath jcr:like implementation, for example: //*[jcr:like(@mytext,'%foo bar qu%')]. The jcr:like implementation (the same holds for SQL) is translated to a Jackrabbit WildcardQuery, which in turn uses a WildcardTermEnum, which has a "protected boolean termCompare(Term term)" met
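As a rough illustration of why a pattern with a leading wildcard is costly, here is a minimal sketch that does not use the Lucene or Jackrabbit API. A sorted set stands in for the term dictionary, and the class and method names (`WildcardScanSketch`, `matchPrefix`, `matchSuffix`) are hypothetical. With a fixed prefix, enumeration can seek to the prefix and stop at the first term that no longer shares it; without a prefix, every term has to be compared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class WildcardScanSketch {
    /** Number of term comparisons made by the most recent call (illustration only). */
    static int comparisons;

    /** Pattern "prefix*": seek to the first term >= prefix, stop at the first mismatch. */
    static List<String> matchPrefix(TreeSet<String> terms, String prefix) {
        comparisons = 0;
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            comparisons++;
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            hits.add(term);
        }
        return hits;
    }

    /** Pattern "*suffix": there is nothing to seek to, so all terms are compared. */
    static List<String> matchSuffix(TreeSet<String> terms, String suffix) {
        comparisons = 0;
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            comparisons++;
            if (term.endsWith(suffix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
                Arrays.asList("bar", "baz", "foo", "food", "quux", "seafoo"));
        System.out.println(matchPrefix(terms, "foo") + " comparisons=" + comparisons);
        System.out.println(matchSuffix(terms, "foo") + " comparisons=" + comparisons);
    }
}
```

In this toy dictionary the prefix lookup touches three terms while the suffix lookup touches all six; the gap grows with the number of distinct terms in the index.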

Re: improving the scalability in searching part 2

2007-08-13 Thread Bertrand Delacretaz
On 8/8/07, Ard Schrijvers <[EMAIL PROTECTED]> wrote:
> ...2) The XPath jcr:like implementation, for example :
> //*[jcr:like(@mytext,'%foo bar qu%')]
> ...the current jcr:like results in queries taking up to 10 seconds to complete for only 1000 nodes with one property, "mytext" which is on av

RE: improving the scalability in searching part 2

2007-08-14 Thread Ard Schrijvers
> On 8/8/07, Ard Schrijvers <[EMAIL PROTECTED]> wrote:
> > ...2) The XPath jcr:like implementation, for example :
> > //*[jcr:like(@mytext,'%foo bar qu%')]
> > ...the current jcr:like results in queries taking up to 10 seconds to complete for only 1000 nodes with one property, "mytext" which

Re: improving the scalability in searching part 2

2007-08-14 Thread Bertrand Delacretaz
On 8/14/07, Ard Schrijvers <[EMAIL PROTECTED]> wrote:
> ...For a leading wildcard, I think some sort of 2 step filter might work, where the first term is expanded to all possible terms that end with that term, then seek for documents in the full text that match, and then do the current f
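One possible reading of the two-step idea quoted above, as a minimal sketch over a toy in-memory inverted index rather than the Lucene or Jackrabbit API. All names here (`TwoStepLikeSketch`, `like`) are hypothetical, and the sketch assumes whitespace tokenization and case-insensitive matching. Step 1 expands the first word of the pattern to every indexed term that ends with it and collects the documents containing those terms; step 2 runs the exact substring check only on those candidates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class TwoStepLikeSketch {
    /** Returns ids of documents containing the needle, i.e. jcr:like '%needle%'. */
    static List<Integer> like(List<String> docs, String needle) {
        String lowerNeedle = needle.toLowerCase();

        // Toy inverted index: term -> ids of the documents containing it.
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : docs.get(id).toLowerCase().split("\\s+")) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }

        // Step 1: expand the first word to all terms ending with it. A single-word
        // needle may sit in the middle of a term, so fall back to contains().
        String firstWord = lowerNeedle.split("\\s+")[0];
        boolean multiWord = !firstWord.equals(lowerNeedle);
        Set<Integer> candidates = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> entry : index.entrySet()) {
            String term = entry.getKey();
            if (multiWord ? term.endsWith(firstWord) : term.contains(firstWord)) {
                candidates.addAll(entry.getValue());
            }
        }

        // Step 2: run the exact (expensive) check only on the candidate documents.
        List<Integer> hits = new ArrayList<>();
        for (int id : candidates) {
            if (docs.get(id).toLowerCase().contains(lowerNeedle)) {
                hits.add(id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "the seafoo bar quux", "foo baz", "xfoo bar qu", "nothing here");
        System.out.println(like(docs, "foo bar qu"));
    }
}
```

The term expansion in step 1 still walks the whole term dictionary, but that is usually far smaller than the stored property values that step 2 would otherwise have to scan for every node.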

RE: improving the scalability in searching part 2

2007-08-14 Thread Ard Schrijvers
> On 8/14/07, Ard Schrijvers <[EMAIL PROTECTED]> wrote:
> > ...For a leading wildcard, I think some sort of 2 step filter might work, where the first term is expanded to all possible terms that end with that term, then seek for documents in the full text that match, and then do

Re: improving the scalability in searching part 2

2007-08-15 Thread Bertrand Delacretaz
(resend, apparently GMail lost track of the jackrabbit.apache.org DNS name for a while)

On 8/14/07, Ard Schrijvers <[EMAIL PROTECTED]> wrote:
> ...Yes I did [1]. I do not think the suggested MultiPhraseQuery is the most performant. I think the lucene ChainedFilter would apply better, though

Re: improving the scalability in searching part 2

2007-08-15 Thread Marcel Reutegger
Bertrand Delacretaz wrote:
> One thing that comes to mind would be to write messages (warnings to a specific "performance log" channel?) to the log when such inefficient queries are used, to help people find out why their queries are slow.

Sounds like a good idea. I created a jira issue: http://i
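The logging suggestion could look roughly like the sketch below. This is not the Jackrabbit implementation: it uses `java.util.logging` from the JDK, and the logger name `performance`, the class `SlowQueryWarningSketch`, and both method names are assumptions made for illustration. It flags jcr:like patterns that begin with one of the two LIKE wildcards, `%` or `_`.

```java
import java.util.logging.Logger;

class SlowQueryWarningSketch {
    // Hypothetical dedicated channel, so slow-query warnings can be routed separately.
    private static final Logger PERF_LOG = Logger.getLogger("performance");

    /** True if the LIKE pattern begins with a wildcard ('%' or '_'). */
    static boolean startsWithWildcard(String likePattern) {
        return likePattern.startsWith("%") || likePattern.startsWith("_");
    }

    /** Logs a warning for leading-wildcard patterns and reports whether it did. */
    static boolean warnIfInefficient(String likePattern) {
        boolean inefficient = startsWithWildcard(likePattern);
        if (inefficient) {
            PERF_LOG.warning("jcr:like pattern '" + likePattern
                    + "' starts with a wildcard and may require scanning all terms");
        }
        return inefficient;
    }

    public static void main(String[] args) {
        warnIfInefficient("%foo bar qu%"); // logs a warning
        warnIfInefficient("foo bar qu%");  // stays silent
    }
}
```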

RE: improving the scalability in searching part 2

2007-08-15 Thread Ard Schrijvers
> Bertrand Delacretaz wrote:
> > One thing that comes to mind would be to write messages (warnings to a specific "performance log" channel?) to the log when such inefficient queries are used, to help people find out why their queries are slow.
>
> Marcel Reutegger wrote:
> Sounds