Re: [DISCUSS] - QueryIndex selection

2014-06-28 Thread Michael Marth
Hi, I looked a bit into how MongoDB selects indexes (query plans) and think we could take some inspiration. So, the way MongoDB does it afaiu: * query gets parsed into Abstract Syntax Tree (so that parameters can get stripped out) * the first time this query is performed then the query is

Re: [DISCUSS] - QueryIndex selection

2014-06-26 Thread Angela Schreiber
hi jukka this is not quite true. as i will explain below. first i would strongly recommend not to rely on the current implementation. if we have the requirement to evaluate permissions based on the path we may extend the permissionprovider which IMO is the key API for these cases; not the

Re: [DISCUSS] - QueryIndex selection

2014-06-26 Thread Thomas Mueller
Hi, Can't we do the ACL check lazily? That's what we do right now. Regards, Thomas

Re: [DISCUSS] - QueryIndex selection

2014-06-26 Thread Jukka Zitting
Hi, On Thu, Jun 26, 2014 at 4:10 AM, Angela Schreiber anch...@adobe.com wrote: however, please be aware that one key feature of oak (compared to jackrabbit which only allowed permission evaluation by path) is that it always needs to be clear if the target for the permission evaluation is a

Re: [DISCUSS] - QueryIndex selection

2014-06-26 Thread Jukka Zitting
Hi, On Thu, Jun 26, 2014 at 2:55 AM, Davide Giannella dav...@apache.org wrote: Can't we do the ACL check lazily? Instead of the query engine looping through the nodes and checking, if there's no need of doing so already (IE sorting), why not return the set and then filter out the ACLs while
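
The lazy ACL check proposed here can be pictured with a minimal sketch. It assumes the index hands back a sequence of paths and that some predicate decides read access; `lazily_filtered` and `can_read` are hypothetical names for illustration, not part of the Oak API.

```python
from typing import Callable, Iterable, Iterator


def lazily_filtered(paths: Iterable[str],
                    can_read: Callable[[str], bool]) -> Iterator[str]:
    """Yield only readable paths, checking each one on demand."""
    for path in paths:
        # The access check runs only when the caller asks for the next result.
        if can_read(path):
            yield path


# Record which paths were actually checked, to make the laziness visible.
checked = []


def can_read(path: str) -> bool:
    # Hypothetical rule: everything under /secret is hidden.
    checked.append(path)
    return not path.startswith("/secret")


results = lazily_filtered(["/a", "/secret/b", "/c", "/d"], can_read)
first = next(results)  # only "/a" has been checked at this point
```

A client that reads just the first few results never pays for access checks on the rest of the result set, which is the point of deferring the check.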

Re: [DISCUSS] - QueryIndex selection

2014-06-25 Thread Jukka Zitting
Hi, On Mon, Jun 23, 2014 at 4:23 PM, Thomas Mueller muel...@adobe.com wrote: Sorry, sure, the condition is verified again. But this might be an in-memory operation. The index may return the property value for each entry as part of running the query (QueryIndex - Cursor - IndexRow). I think

Re: [DISCUSS] - QueryIndex selection

2014-06-25 Thread Thomas Mueller
Hi, But getting to that point may be a bit tricky, especially because of access control. Yes, we would need to use a different access control API. The ability to check whether a session has access to a path/node/property, without actually loading the node from the storage backend. Maybe that API

Re: [DISCUSS] - QueryIndex selection

2014-06-25 Thread Jukka Zitting
Hi, On Wed, Jun 25, 2014 at 10:16 AM, Thomas Mueller muel...@adobe.com wrote: Yes, we would need to use a different access control API. The ability to check whether a session has access to a path/node/property, without actually loading the node from the storage backend. Maybe that API is

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Thomas Mueller
Hi, should we just return the number of estimated entries for the cost? For Lucene, the property index, the ordered index, and the node type index: yes. For Solr, the cost per index lookup (not per entry) is probably a bit higher, because there is a network round trip. Especially if Solr is
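
One way to read this suggestion is as a cost made of a fixed per-lookup overhead plus the estimated number of entries. The sketch below assumes exactly that; `IndexEstimate`, `estimated_cost`, the index names and the numbers are hypothetical and do not reflect how Oak actually computes costs.

```python
from dataclasses import dataclass


@dataclass
class IndexEstimate:
    """Hypothetical description of one candidate index (not an Oak class)."""
    name: str
    estimated_entries: int  # entries the index expects to return
    lookup_overhead: float  # fixed cost per lookup, e.g. a network round trip


def estimated_cost(index: IndexEstimate) -> float:
    # Fixed per-lookup overhead plus one cost unit per returned entry.
    return index.lookup_overhead + index.estimated_entries


def cheapest(candidates):
    # Pick the candidate with the lowest estimated cost.
    return min(candidates, key=estimated_cost)


candidates = [
    # A local index: no extra overhead per lookup (assumed value).
    IndexEstimate("property", estimated_entries=120, lookup_overhead=0.0),
    # A remote index: fewer entries, but each lookup pays a round trip
    # (the overhead of 50 is an arbitrary illustrative number).
    IndexEstimate("solr", estimated_entries=100, lookup_overhead=50.0),
]
best = cheapest(candidates)
```

With these made-up numbers the local index wins even though it returns more entries, because the remote lookup overhead outweighs the difference.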

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Jukka Zitting
Hi, On Mon, Jun 23, 2014 at 3:30 AM, Thomas Mueller muel...@adobe.com wrote: Right. I don't believe the cost of the index lookup is significant (at least in the asymptotic sense) compared to the overall cost of executing a query. Sorry, I don't understand. The cost of the index lookup *is*

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Thomas Mueller
Hi, The problem with that assumption is that typically a single disk read to the index would return n paths, whereas loading those n nodes might well take n more disk reads. Ideally, the cost returned by the index would reflect that. For single-property indexes (all property indexes are single

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Jukka Zitting
Hi, On Mon, Jun 23, 2014 at 11:18 AM, Thomas Mueller muel...@adobe.com wrote: Sure, but we don't use a covered index. Yes, we are not there yet. The node is currently loaded to check access rights, but that's an implementation detail of access control part. And it's not needed for the admin.

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Thomas Mueller
Hi, It's more than access control. The query engine needs to double-check the constraints of the query for each matching path before passing that node to the client (see the constraint.evaluate() call in [1]). I don't see any easy way to avoid that step without major refactoring. If there is no

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Jukka Zitting
Hi, On Mon, Jun 23, 2014 at 1:58 PM, Thomas Mueller muel...@adobe.com wrote: It's more than access control. The query engine needs to double-check the constraints of the query for each matching path before passing that node to the client (see the constraint.evaluate() call in [1]). I don't see

Re: [DISCUSS] - QueryIndex selection

2014-06-23 Thread Thomas Mueller
Hi, It's more than access control. The query engine needs to double-check the constraints of the query for each matching path before passing that node to the client (see the constraint.evaluate() call in [1]). I don't see any easy way to avoid that step without major refactoring. If there is no

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Tommaso Teofili
2014-06-04 9:36 GMT+02:00 Thomas Mueller muel...@adobe.com: Hi, QueryIndex.getCost: this is actually quite well documented (see the Javadocs). But the implementations might not fully follow the contract :-) this is probably just my opinion but the contract is not very clear; to me finding

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Davide Giannella
On 18/06/2014 10:26, Tommaso Teofili wrote: it would be ok for me to either deprecate it or improve the semantics of the cost calculation (e.g. explicitly introduce other metrics to be taken into account in the cost calculation: local / remote index, With the IndexPlan.isDelayed() we instruct

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Thomas Mueller
Hi, QueryIndex.getCost my doubt is what this heuristic function to estimate the traversed entries should look like in general Relational databases typically know the number of entries in the index (total indexed entries), plus the selectivity of a column. See also
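
The relational-database heuristic mentioned here can be sketched as follows. The sketch assumes values are uniformly distributed, so the selectivity of an equality condition is one over the number of distinct values; the function name and parameters are hypothetical, and the uniformity assumption is a simplification that the message itself does not state.

```python
def estimate_matching_entries(total_entries: int, distinct_values: int) -> float:
    """Estimate how many index entries match an equality condition.

    Assumes a uniform distribution: selectivity = 1 / distinct_values.
    """
    if distinct_values <= 0:
        # No statistics available: fall back to the worst case.
        return float(total_entries)
    # total_entries * selectivity, written as a single division.
    return total_entries / distinct_values


# An index with 10,000 entries spread over 100 distinct values.
estimate = estimate_matching_entries(10_000, 100)
```

An estimate of this kind could then serve as the "number of estimated entries" that the thread discusses returning as the cost.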

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Jukka Zitting
Hi, On Wed, Jun 18, 2014 at 4:26 AM, Tommaso Teofili tommaso.teof...@gmail.com wrote: should we just return the number of estimated entries for the cost? Yes, that's what I think the contract should be. My other concern on this point is that it's not guaranteed, in my opinion, that the index

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Jukka Zitting
Hi, On Wed, Jun 18, 2014 at 7:44 AM, Thomas Mueller muel...@adobe.com wrote: My other concern on this point is that it's not guaranteed, in my opinion, that the index returning fewer entries would be the faster one. Yes, it's not that much about fewer entries or more entries, it's about lower or higher

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Tommaso Teofili
ok, thanks Davide for the pointers. Regards, Tommaso 2014-06-18 13:36 GMT+02:00 Davide Giannella giannella.dav...@gmail.com: On 18/06/2014 10:26, Tommaso Teofili wrote: it would be ok for me to either deprecate it or improve the semantics of the cost calculation (e.g. explicitly introduce

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Tommaso Teofili
Hi, 2014-06-18 13:44 GMT+02:00 Thomas Mueller muel...@adobe.com: Hi, QueryIndex.getCost my doubt is what this heuristic function to estimate the traversed entries should look like in general Relational databases typically know the number of entries in the index (total indexed entries),

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Tommaso Teofili
Hi, 2014-06-18 16:02 GMT+02:00 Jukka Zitting jukka.zitt...@gmail.com: Hi, On Wed, Jun 18, 2014 at 4:26 AM, Tommaso Teofili tommaso.teof...@gmail.com wrote: should we just return the number of estimated entries for the cost? Yes, that's what I think the contract should be. ok, that's

Re: [DISCUSS] - QueryIndex selection

2014-06-18 Thread Jukka Zitting
Hi, On Wed, Jun 18, 2014 at 11:31 AM, Tommaso Teofili tommaso.teof...@gmail.com wrote: 2014-06-18 16:02 GMT+02:00 Jukka Zitting jukka.zitt...@gmail.com: On Wed, Jun 18, 2014 at 4:26 AM, Tommaso Teofili tommaso.teof...@gmail.com wrote: should we just return the number of estimated entries for

Re: [DISCUSS] - QueryIndex selection

2014-06-04 Thread Thomas Mueller
We could let the user decide if using an asynchronous index is OK or not. Another option is if there is no synchronous index available but an asynchronous index is available then QueryEngine should use that instead of resorting to traversal. Well, this is the current behavior. The query engine doesn't
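
The fallback order described here (synchronous index first, then asynchronous index, then traversal) can be sketched as below. `choose_strategy` and the index names are hypothetical; the real query engine selects by cost and is not shown here.

```python
from typing import Sequence


def choose_strategy(sync_indexes: Sequence[str],
                    async_indexes: Sequence[str]) -> str:
    """Return which access path to use, preferring synchronous indexes."""
    if sync_indexes:
        # A synchronous index is up to date, so use it when available.
        return f"index:{sync_indexes[0]}"
    if async_indexes:
        # An asynchronous index may lag behind, but still beats traversal.
        return f"index:{async_indexes[0]}"
    # Nothing usable: walk the tree.
    return "traversal"


with_both = choose_strategy(["property"], ["lucene"])
async_only = choose_strategy([], ["lucene"])
no_index = choose_strategy([], [])
```

Letting the user decide, as the first sentence suggests, would amount to an extra flag that skips the second branch; that flag is left out of this sketch.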