Re: Querying vs iterating

Jason E Bailey Mon, 20 Jun 2016 07:48:34 -0700

I have seen significant gains, both in the time it takes to obtain a
list of results and in the speed of my services, by iterating instead of
querying. In one case, a lookup for an indexed node type went from 10
minutes as a query to a minute and a half as an iteration.


I should point out that this makes no sense.

When it was first suggested to me that I iterate rather than use a
query, I looked at the person in question as if they had never studied
computers. It has been beaten into my head for close to two decades
that if you want performance from a data store, you use a query and you
create indexes. Iterating struck me as something that only someone who
didn't know what they were doing would suggest, or someone who had done
something wrong in their setup, e.g. failed to set up an index.

I was wrong.

Maybe I should file a bug report: "Hey, why is it that I can iterate and
get results faster than doing an indexed query?" I'm also sure that
indexing and running a query is the right way to address some needs.
However, right now, every time I've compared executing a query against
just going to the source and checking myself, the iteration style has
been significantly faster.
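
For illustration only, here is a minimal sketch of what "going to the
source and checking myself" can look like: a depth-limited walk over a
subtree that collects every descendant carrying a given property. The
Node class, the samplePage() tree and the property name "myProp" are
hypothetical in-memory stand-ins so the example runs on its own; they
are not the Sling Resource or JCR API, and the sketch says nothing about
real repository timings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IterateSketch {

    /** Hypothetical in-memory stand-in for a repository node/resource. */
    static final class Node {
        final String name;
        final Map<String, Object> properties = new HashMap<>();
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        Node child(Node c) {
            children.add(c);
            return this;
        }

        Node prop(String key, Object value) {
            properties.put(key, value);
            return this;
        }
    }

    /** Returns every descendant of root, at most maxDepth levels down, that has the property. */
    static List<Node> findWithProperty(Node root, String propertyName, int maxDepth) {
        List<Node> hits = new ArrayList<>();
        collect(root, propertyName, maxDepth, hits);
        return hits;
    }

    private static void collect(Node parent, String propertyName, int remaining, List<Node> hits) {
        if (remaining == 0) {
            return; // depth budget used up, do not descend further
        }
        for (Node child : parent.children) {
            if (child.properties.containsKey(propertyName)) {
                hits.add(child);
            }
            collect(child, propertyName, remaining - 1, hits);
        }
    }

    /** Small made-up page: page / jcr:content / {title, par / text, image}. */
    static Node samplePage() {
        Node text = new Node("text").prop("myProp", "b");
        Node par = new Node("par").child(text);
        Node title = new Node("title").prop("myProp", "a");
        Node image = new Node("image");
        Node content = new Node("jcr:content").child(title).child(par).child(image);
        return new Node("page").child(content);
    }
}
```

Calling findWithProperty(samplePage(), "myProp", 3) walks three levels
below the page and returns the "title" and "text" nodes. Against a real
repository the same shape would apply to whatever child-listing call the
API offers.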

--
Jason

On Mon, Jun 20, 2016, at 10:01 AM, Julian Sedding wrote:
> Hi Roy
> 
> Yes, I would expect that you cannot measure any meaningful difference.
> Using a query may be marginally faster, because it can traverse using
> internal Oak APIs. On the other hand it may be slightly slower,
> because of possible QueryEngine overhead.
> 
> Personally I would test whether it works sufficiently well with a
> query, because it is less code.
> 
> Note also that Sling Query
> (https://sling.apache.org/documentation/bundles/sling-query.html)
> allows you to express a query and choose traversal vs query as a
> strategy. This may or may not help.
> 
> Regards
> Julian
> 
> 
> On Mon, Jun 20, 2016 at 3:52 PM, Roy Teeuwen <r...@teeuwen.be> wrote:
> > Hey Julian,
> >
> > Ok cool. For me the context is querying on a page in AEM, so I am creating
> > a query for one cq:Page node, which will usually be at most 10-20 nodes.
> > So what you are saying is that, performance-wise, it shouldn’t really
> > matter whether I traverse manually or run a query to check whether a
> > specific property name exists on the page, because behind the scenes it
> > will most likely traverse anyway, right?
> >
> > Thanks!
> > Roy
> >> On 20 Jun 2016, at 15:43, Julian Sedding <jsedd...@gmail.com> wrote:
> >>
> >> Hi Roy
> >>
> >> From your question ("hard to put an index to it") I assume that you are
> >> running on an Oak repository. If that is incorrect, my answer does not
> >> apply.
> >>
> >> Oak will always consider traversal as an alternative to existing
> >> indexes. For most queries the cost of traversal is so high that an
> >> index is chosen. However, if no suitable index exists (and
> >> theoretically also if the traversal is cheaper than a lookup in a
> >> matching index), it will do a traversal behind the scenes. Note that
> >> traversal logs a warning every 10000 traversed nodes. So if you plan
> >> to traverse more than that you should really consider creating an
> >> index.
> >>
> >> In short: with Oak using a query on a small subtree should give you
> >> what you want, even without an index.
> >>
> >> Regards
> >> Julian
> >>
> >>
> >> On Thu, Jun 16, 2016 at 4:44 PM, Steven Walters <kemu...@gmail.com> wrote:
> >>> Hopefully other people chime in here. I've only had bad experiences
> >>> with queries, to the point that I personally never use them, so I
> >>> always end up iterating/navigating myself.
> >>>
> >>> Theoretically if you have a REALLY GOOD index then you may get
> >>> similar performance, but if your index(es) are inefficient, then it's
> >>> just wasted CPU cycles (you'd wish those CPU cycles were going to a
> >>> good cause, but they're not).
> >>>
> >>> The transition of Sling (and AEM) from Jackrabbit 2.x to Oak made this
> >>> experience worse, with the awkward indexing policies/process in Oak
> >>> and the fact that Oak never seemed to use multiple indexes.
> >>> Oak always seemed to calculate the cost of the entire query against
> >>> all the available indexes and choose only the ONE best index.
> >>> This sounds like a good idea in theory, but most DBMSs I've used
> >>> in the past utilize ALL the indexes they can, not just one.
> >>>
> >>> So basically I guess this comes down to: "If you have a good index (in
> >>> that it can apply to ALL the conditions/attributes/properties of your
> >>> query) then using a query should be fine, otherwise iterate yourself."
> >>> Having any condition missing from the index can be fatal for
> >>> performance. One example is lacking evaluatePathRestrictions = true,
> >>> without which the system is basically dead if you have a lot of
> >>> content.
> >>>
> >>> But really, I hope some other people with more positive experiences
> >>> can provide some better advice.
> >>>
> >>> On Thu, Jun 16, 2016 at 11:08 PM, Roy Teeuwen <r...@teeuwen.be> wrote:
> >>>> Ok, it would be handy to have an estimate of the approximate number /
> >>>> levels of resources at which to go for iterating vs querying :).
> >>>>
> >>>> Greets
> >>>> Roy
> >>>>> On 16 Jun 2016, at 16:06, Steven Walters <kemu...@gmail.com> wrote:
> >>>>>
> >>>>> If you know there are that few resources, then I'd say iterating would
> >>>>> perform better than XPath / JCR-SQL2 queries.
> >>>>> This is primarily past experience speaking: queries have generally
> >>>>> turned out (often MUCH) slower than directly iterating if you know
> >>>>> what you're actually looking for.
> >>>>>
> >>>>>
> >>>>> On Thu, Jun 16, 2016 at 10:28 PM, Roy Teeuwen <r...@teeuwen.be> wrote:
> >>>>>
> >>>>>> Hello all,
> >>>>>>
> >>>>>> Let's say I have a resource with around 10-20 child/grand-child
> >>>>>> resources, not going deeper than 3 levels max. What is the most
> >>>>>> performant way to search for the child resources containing a specific
> >>>>>> property (the property is configurable with OSGi, so hard to put an
> >>>>>> index on it): iterating the child / grand-child resources until you
> >>>>>> find it, or making an xpath/jcr-sql2 query? When would one option start
> >>>>>> to be more performant than the other?
> >>>>>>
> >>>>>> Thanks!
> >>>>>> Roy
> >>>>
> >
