Hi Christian, can you enable debug logs on org.apache.jackrabbit.core.query.lucene.join.QueryEngine? I'm curious to see what the constraints look like in the big query vs the 2 small ones.
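In case it helps, here is a minimal sketch of how that logger could be switched to debug. It assumes the repository logs through log4j 1.x with a properties-style configuration file; the thread does not say how logging is set up, so the file name and location depend on your deployment.

```properties
# Assumption: log4j 1.x properties configuration (e.g. a log4j.properties on the classpath).
# With a different logging backend the same logger name applies, but the syntax differs.
log4j.logger.org.apache.jackrabbit.core.query.lucene.join.QueryEngine=DEBUG
```

After changing the level, run the combined query and the two separate queries once each, so the logged constraints can be compared side by side.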
This also goes for the join you've mentioned later in the thread, but I just wanted to start with the first query ;)

alex

On Tue, Mar 27, 2012 at 9:55 AM, Christian Stocker <[email protected]> wrote:

> Hi
>
> On 27.03.12 09:49, David Buchmann wrote:
> > sorry, my bad. did not read correctly.
> > you do have the parentheses so you did what you wanted to do.
> >
> > looks like lucene/jackrabbit combine the 2 datasets first and filter
> > later...
> >
> > what if you try
> >
> > SELECT * FROM [own:unstructured] AS data
> > WHERE
> > data.guid = 'J7B1X' AND ISDESCENDANTNODE(data, '/article')
> > OR
> > data.guid = 'J7B1X' AND ISDESCENDANTNODE(data, '/import/article')
> > ORDER BY firstImportDate DESC
>
> I tried that and I tried it again now. Same response time as the
> original query.
>
> Any hints from someone who knows the internal workings of
> jackrabbit/lucene?
>
> chregu
>
> > if this is fast, then the jackrabbit query engine is not very clever...
> >
> > cheers,david
> >
> > On 27.03.2012 09:10, David Buchmann wrote:
> >> i think the 2 queries are not equivalent. the first one is equivalent to
> >>
> >> ...
> >> WHERE data.guid = 'J7B1X'
> >> AND (ISDESCENDANTNODE(data, '/article')
> >>
> >> plus
> >>
> >> WHERE
> >> ISDESCENDANTNODE(data, '/import/article')
> >>
> >> (if you want the data.guid = ... to apply to both, you need parentheses)
> >>
> >> but if /import/article is almost empty, i still don't see why the
> >> combined query should take so long unless jackrabbit/lucene are doing
> >> something stupid.
> >>
> >> cheers,david
> >>
> >> On 26.03.2012 22:28, Christian Stocker wrote:
> >>> Hi
> >>>
> >>> We have the following search query
> >>>
> >>> SELECT * FROM [own:unstructured] AS data WHERE data.guid = 'J7B1X'
> >>> AND (ISDESCENDANTNODE(data, '/article')
> >>> OR ISDESCENDANTNODE(data, '/import/article')
> >>> )
> >>> ORDER BY firstImportDate DESC
> >>>
> >>> This query can take quite some time (up to 3 seconds, but it gets more
> >>> and more the more data we have). In /article there's potentially a lot
> >>> of nodes, in /import/article usually almost nil.
> >>>
> >>> If we now separate the query into 2:
> >>>
> >>> SELECT * FROM [own:unstructured] AS data WHERE data.guid = 'J7B1X'
> >>> AND ISDESCENDANTNODE(data, '/article')
> >>> ORDER BY firstImportDate DESC
> >>>
> >>> and
> >>>
> >>> SELECT * FROM [own:unstructured] AS data WHERE data.guid = 'J7B1X'
> >>> AND ISDESCENDANTNODE(data, '/import/article')
> >>> ORDER BY firstImportDate DESC
> >>>
> >>> Both queries take approx. 10ms (and return 0 or 1 resultset, more is not
> >>> possible). So quite fast.
> >>>
> >>> Can anyone explain to me, why that is and how we could rewrite the query
> >>> to make it fast with a single one as well?
> >>>
> >>> Thanks
> >>>
> >>> chregu
