[Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
The proposal to merge lp:~zorba-coders/zorba/dataguide into lp:zorba has been updated. Status: Approved => Rejected For more details, see: https://code.launchpad.net/~zorba-coders/zorba/dataguide/+merge/173026 -- https://code.launchpad.net/~zorba-coders/zorba/dataguide/+merge/173026 Your team Zorba Coders is subscribed to branch lp:zorba. -- Mailing list: https://launchpad.net/~zorba-coders Post to : zorba-coders@lists.launchpad.net Unsubscribe : https://launchpad.net/~zorba-coders More help : https://help.launchpad.net/ListHelp
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
Superseded by use-dataguide merge proposal.
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
I've done some additional testing, and these are the results:

For the xray query, the largest that we have in the testsuite, compilation time with --compile-only is pretty much the same with and without the dataguide computation, at around 0.08 sec.

With a specially constructed query that looks like this (see the dataguide-29.jq test):

    let $col := dml:collection()
    let $col2 := ($col.cat1, $col.cat2, ... , $col.cat10)
    return $col2.category.category.category ... category (repeated ~2000 times)

the compilation time goes from ~0.7s without the dataguide to ~10s with the dataguide enabled, so it is significant. But this is a worst-case scenario: the resulting dataguide is an object 2000 levels deep.

The compilation can be improved significantly by:
- keeping track of the leaf nodes in the dataguide tree
- rewriting the dataguide structure a bit to store the trees incrementally instead of cloning them
- adding a depth cutoff

What do you think?
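To make the proposed improvements concrete, here is a minimal Python sketch (not Zorba code). It assumes a dataguide can be modelled as a nested dict of field names in which an empty dict marks a leaf, meaning "keep the whole value under this field". The function names and the max_depth parameter are hypothetical. The first function re-clones the tree at every navigation step; the second extends one tree in place through a tracked leaf and can stop at a depth cutoff.

```python
import copy


def navigate_cloning(path):
    """Rebuild the guide by deep-copying it at every navigation step."""
    guide = {}
    for depth, field in enumerate(path):
        guide = copy.deepcopy(guide)      # clone the whole tree built so far
        node = guide
        for step in path[:depth]:         # walk down again to reach the leaf
            node = node[step]
        node[field] = {}
    return guide


def navigate_incremental(path, max_depth=None):
    """Extend one tree in place, remembering the current leaf."""
    guide = {}
    leaf = guide                          # tracked leaf: no clone, no re-walk
    for depth, field in enumerate(path):
        if max_depth is not None and depth >= max_depth:
            break                         # cutoff: an empty leaf keeps everything below
        leaf[field] = {}
        leaf = leaf[field]
    return guide


if __name__ == "__main__":
    path = ["category"] * 5               # kept short on purpose
    print(navigate_cloning(path) == navigate_incremental(path))
    print(navigate_incremental(path, max_depth=2))
```

The cloning variant does work proportional to the current depth at every step, so its total cost grows quadratically with the path length, while the incremental variant does constant work per step. With the cutoff, the guide is coarser (more data is kept than strictly needed) but never incorrect under the "empty leaf keeps everything" assumption.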
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
> DataGuides serve as dynamic schemas, generated from the database. What we generate is a schema from the query.

Still, it is a data schema, not a query schema. The one in the paper would be a Database DataGuide and ours would be a Query DataGuide. I would agree to change it to QueryDataguide, but I don't think there would be any confusion if it were simply called Dataguide.

> I think we will run into a problem. 28msec has only one buffer that is accessed by all db:collection() calls in a query. Hence, the information needs to be the union.

If there is no way of removing that limitation, then we can overcome it by taking the union of all db:collection() dataguides, which will ensure correctness. But it would be a pity to lose the individually computed dataguides for each separate call. Still, if the field names of different collections are mostly disjoint sets, then we won't lose much of the improvement.

Again, I suggest leaving this until I start implementing the push-down of projection info into the db:collection() calls. It has no impact on jn:parse() -- these dataguides can still be computed and kept individually for each call even if we take the union for the db:collection() calls.
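The union fallback described above can be sketched in Python as follows. This is an illustration under assumptions, not the Zorba implementation: a dataguide is modelled as a nested dict of field names, an empty dict means "keep the whole value", and union_guides is a hypothetical helper name.

```python
import copy


def union_guides(left, right):
    """Return a guide that keeps every field kept by left or by right."""
    # An empty guide means "keep the whole value", which absorbs anything.
    if not left or not right:
        return {}
    merged = {}
    for field in sorted(left.keys() | right.keys()):
        if field in left and field in right:
            merged[field] = union_guides(left[field], right[field])
        else:
            source = left if field in left else right
            merged[field] = copy.deepcopy(source[field])
    return merged


if __name__ == "__main__":
    call_1 = {"name": {}, "address": {"city": {}}}
    call_2 = {"address": {"zip": {}}, "price": {}}
    print(union_guides(call_1, call_2))
```

If the two calls touch mostly disjoint fields, the merged guide is only slightly larger than each individual one, which matches the expectation that little of the improvement is lost. When one call needs a whole value and the other only part of it, the union must keep the whole value, which is why the empty guide wins.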
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
> - I find the name dataguide misleading because it's a guide on the query and not on the data. Maybe QueryPruneGuide would be more meaningful

The query itself is not pruned, the data is. I think dataguide is the established term -- see for example this paper: http://ilpubs.stanford.edu:8090/264/1/1997-50.pdf .

> - Can the user also use the zann_explores_json annotation?

Yes, users can use it as well. But does it make sense for them to use it? If they have an external function, it is automatically handled as if it had the annotation. For a UDF it doesn't really make any sense to add it.

> - Why is the dataguide parameter on the Store's getCollection() function? Shouldn't it be on the function that returns the iterator? The problem is that a Collection object within the simplestore exists only once per collection. What are the semantics if multiple queries access the collection (possibly in parallel)?

It very much depends on how the collections are handled. Currently for Zorba collections it doesn't make sense to have any dataguides at all, because they're in-memory collections. I have not taken a look at the Sausalito code and have not seen how e.g. the MongoDB collections are managed. getCollection() seemed the most logical place to pass it, but the dataguide parameter could easily be propagated to any Store class, including the function that returns the iterator.

Currently each and every db:collection() call has its own dataguide, even if they might refer to the same collection. If the collection manager currently caches or reuses the collection iterators, then it might make sense to forbid that so that the dataguide for each individual db:collection() call could be used. Alternatively, a union of the dataguides that refer to the same collection could be performed. But I think it is not always possible to determine whether that is the case.

I think this could be investigated and decided upon when implementing the Dataguide push-down into MongoDB, or when I take a better look at Sausalito's collection manager code.

> - Did you measure the performance impact of the optimizer on some larger queries?

The expression tree is traversed in its entirety once and only once, visiting each node, so the performance should not be very different from any other dataflow computation, such as the ones for ignores sorts/order/etc. If there are no sources, i.e. db:collection() or jn:parse() calls, then the dataguide computation just propagates NULLs, doing no calculations and almost no memory allocations (at most one dataguide_cb allocation per fo_exprs and several others). If there are sources in the tree, some union operations will be performed for some of the nodes.

I will check if any of our larger queries have longer compilation times, but because none of them have db:collection() or jn:parse() calls, I do not expect any differences. It would make sense to have a specially constructed query that stress-tests the dataguide code -- e.g. a db:collection().navigation.navigation. ... .navigation several thousand times or something similar. I will try that out and see if it manages to slow down the compilation.
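As an illustration of the single-pass behaviour described above, here is a toy Python sketch. It is not the Zorba optimizer: the Expr class, the three node kinds and compute_paths are all hypothetical, and the result is simplified to a set of field paths rather than a real dataguide. It shows each node being visited once, None being propagated when a subtree has no source, and unions being taken only where sources exist.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Path = Tuple[str, ...]


@dataclass
class Expr:
    kind: str                               # "source", "lookup" or "other"
    name: str = ""                          # field name for "lookup" nodes
    children: List["Expr"] = field(default_factory=list)


def compute_paths(expr: Expr, visits: List[Expr]) -> Optional[Set[Path]]:
    """Bottom-up pass: field paths navigated from any source below expr,
    or None when the subtree contains no source at all."""
    visits.append(expr)                     # each node is recorded exactly once
    child_results = [compute_paths(c, visits) for c in expr.children]
    if expr.kind == "source":
        return {()}                         # empty path: the source itself
    known = [r for r in child_results if r is not None]
    if not known:
        return None                         # no sources below: propagate None
    merged = set().union(*known)            # union over children that saw a source
    if expr.kind == "lookup":
        return {path + (expr.name,) for path in merged}
    return merged


if __name__ == "__main__":
    src = Expr("source")
    query = Expr("other", children=[
        Expr("lookup", "a", [Expr("lookup", "b", [src])]),
        Expr("other"),
    ])
    seen: List[Expr] = []
    print(compute_paths(query, seen), len(seen))
```

In this model a query without any source node never allocates a set at all, which mirrors the claim that queries without db:collection() or jn:parse() calls should see no measurable change in compilation time.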
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
>> - I find the name dataguide misleading because it's a guide on the query and not on the data. Maybe QueryPruneGuide would be more meaningful
>
> The query itself is not pruned, the data is. I think dataguide is the established term -- see for example this paper: http://ilpubs.stanford.edu:8090/264/1/1997-50.pdf .

DataGuides serve as dynamic schemas, generated from the database. What we generate is a schema from the query.

>> - Why is the dataguide parameter on the Store's getCollection() function? Shouldn't it be on the function that returns the iterator? The problem is that a Collection object within the simplestore exists only once per collection. What are the semantics if multiple queries access the collection (possibly in parallel)?
>
> It very much depends on how the collections are handled. Currently for Zorba collections it doesn't make sense to have any dataguides at all, because they're in-memory collections. I have not taken a look at the Sausalito code and have not seen how e.g. the MongoDB collections are managed. getCollection() seemed the most logical place to pass it, but the dataguide parameter could easily be propagated to any Store class, including the function that returns the iterator.
>
> Currently each and every db:collection() call has its own dataguide, even if they might refer to the same collection. If the collection manager currently caches or reuses the collection iterators, then it might make sense to forbid that so that the dataguide for each individual db:collection() call could be used. Alternatively, a union of the dataguides that refer to the same collection could be performed. But I think it is not always possible to determine whether that is the case.
>
> I think this could be investigated and decided upon when implementing the Dataguide push-down into MongoDB, or when I take a better look at Sausalito's collection manager code.

I think we will run into a problem. 28msec has only one buffer that is accessed by all db:collection() calls in a query. Hence, the information needs to be the union.
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
Review: Approve
Re: [Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
Review: Needs Fixing

- I find the name dataguide misleading because it's a guide on the query and not on the data. Maybe QueryPruneGuide would be more meaningful
- Can the user also use the zann_explores_json annotation?
- Why is the dataguide parameter on the Store's getCollection() function? Shouldn't it be on the function that returns the iterator? The problem is that a Collection object within the simplestore exists only once per collection. What are the semantics if multiple queries access the collection (possibly in parallel)?
- Did you measure the performance impact of the optimizer on some larger queries?
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
Validation queue starting for the following merge proposals: https://code.launchpad.net/~zorba-coders/zorba/dataguide/+merge/173026 Progress dashboard at http://jenkins.lambda.nu/view/ValidationQueue
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
Voting criteria failed for the following merge proposals: https://code.launchpad.net/~zorba-coders/zorba/dataguide/+merge/173026 : Votes: {'Pending': 1}
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
Validation queue result for https://code.launchpad.net/~zorba-coders/zorba/dataguide/+merge/173026 Stage CommitZorba failed. Check console output at http://jenkins.lambda.nu/job/CommitZorba/9/console to view the results.
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
Validation queue starting for the following merge proposals: https://code.launchpad.net/~zorba-coders/zorba/dataguide/+merge/173026 Progress dashboard at http://jenkins.lambda.nu/view/ValidationQueue
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
The proposal to merge lp:~zorba-coders/zorba/dataguide into lp:zorba has been updated. Status: Needs review => Approved For more details, see: https://code.launchpad.net/~zorba-coders/zorba/dataguide/+merge/173026
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
Validation queue starting for the following merge proposals: https://code.launchpad.net/~zorba-coders/zorba/dataguide/+merge/173026 Progress dashboard at http://jenkins.lambda.nu/view/ValidationQueue
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
Validation queue result for https://code.launchpad.net/~zorba-coders/zorba/dataguide/+merge/173026 Stage TestZorbaUbuntu failed. 794 tests failed (8369 total tests run). Check test results at http://jenkins.lambda.nu/job/TestZorbaUbuntu/49/testReport/ to view the results.
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
Validation queue starting for the following merge proposals: https://code.launchpad.net/~zorba-coders/zorba/dataguide/+merge/173026 Progress dashboard at http://jenkins.lambda.nu/view/ValidationQueue
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
Validation queue result for https://code.launchpad.net/~zorba-coders/zorba/dataguide/+merge/173026 Stage TestZorbaUbuntu failed. 794 tests failed (8369 total tests run). Check test results at http://jenkins.lambda.nu/job/TestZorbaUbuntu/50/testReport/ to view the results.
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
Validation queue starting for the following merge proposals: https://code.launchpad.net/~zorba-coders/zorba/dataguide/+merge/173026 Progress dashboard at http://jenkins.lambda.nu/view/ValidationQueue
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
Validation queue starting for the following merge proposals: https://code.launchpad.net/~zorba-coders/zorba/dataguide/+merge/173026 Progress dashboard at http://jenkins.lambda.nu/view/ValidationQueue
[Zorba-coders] [Merge] lp:~zorba-coders/zorba/dataguide into lp:zorba
Validation queue result for https://code.launchpad.net/~zorba-coders/zorba/dataguide/+merge/173026 Stage TestZorbaUbuntu failed. 26 tests failed (8373 total tests run). Check test results at http://jenkins.lambda.nu/job/TestZorbaUbuntu/52/testReport/ to view the results.