Just to confirm: you're talking about the XQTS tests (~19,000 queries), correct?
Sent from my iPhone

> On Dec 16, 2013, at 9:34 PM, Vinayak Borkar <[email protected]> wrote:
>
> Preston,
>
> Can you try running the W3C XQuery tests against your current codebase with
> all the rules and optimizations and compare the outcome with running the
> tests on our last release?
>
> Please report the outcome on this list. Let's ensure that we are not
> regressing while adding these optimizations.
>
> Thanks,
> Vinayak
>
>> On 12/13/13, 1:12 PM, Eldon Carman wrote:
>> I added the rule to take the previously mentioned subplan and make it
>> into a single assign for child. The change dropped 4 minutes off each
>> child path step that was found in the pattern mentioned. I have attached
>> the new query plan and the results of several modified queries to show
>> the change in times based on new additions to the query.
>>
>> Saxon Execution time: 0m36.009s
>> VXQuery Execution time: 1m33.632s
>>
>>
>> On Thu, Dec 12, 2013 at 11:51 AM, Eldon Carman <[email protected]> wrote:
>>
>>     After finishing the rewrite rule to merge the child path steps, I
>>     ran a few tests. The results of the queries and plans are attached.
>>
>>     First, I noted that when the following group of operators was added
>>     to the plan, the time increased by about 4 minutes (from 35s to
>>     4m27s).
>>
>>     subplan {
>>       aggregate [$$19] <- [function-call:
>>         vxquery:{urn:org.apache.vxquery.operators-ext}sequence,
>>         Args:[function-call:
>>         vxquery:{urn:org.apache.vxquery.operators-ext}child,
>>         Args:[function-call:
>>         vxquery:{urn:org.apache.vxquery.operators-ext}treat, Args:[%0->$$17,
>>         {http://www.w3.org/2001/XMLSchema}int QUANT_ONE(bytes[5] =
>>         [1d000000ee])], {http://www.w3.org/2001/XMLSchema}int
>>         QUANT_ONE(bytes[5] = [1d0000010b])]]]
>>       -- AGGREGATE |LOCAL|
>>         unnest $$17 <- function-call:
>>           vxquery:{urn:org.apache.vxquery.operators-ext}iterate, Args:[%0->$$15]
>>         -- UNNEST |LOCAL|
>>           nested tuple source
>>           -- NESTED_TUPLE_SOURCE |LOCAL|
>>     }
>>     -- SUBPLAN |PARTITIONED|
>>
>>     The above query plan section appears twice in the original query. If
>>     each takes 4 minutes, that would account for most of the time. My
>>     test with the original query has a time of 9m16.336s.
>>
>>     I suggest a rewrite rule that could change this plan section to a
>>     single assign.
>>
>>     Does anything in this plan section stand out as being slow? Is it
>>     just the number of operators? The child path step function is fairly
>>     fast.
>>
>>
>>     On Tue, Dec 3, 2013 at 3:48 PM, Eldon Carman <[email protected]> wrote:
>>
>>         The first query (q00.xq) was executed 10 times on the 10
>>         stations of data. The data contains 6,827 files
>>         (/dataCollection) with 206,686 sensor readings
>>         (/dataCollection/data) amounting to ~55 MB. The query was
>>         executed 10 times to remove the overhead of starting and stopping
>>         the cluster and node controllers in VXQuery.
>>
>>         (: XQuery Filter Query :)
>>         (: See historical data for Riverside, CA (ASN00008113) station by selecting :)
>>         (: the weather readings for December 25 over the last 10 years. :)
>>         let $collection := "/tmp/1.0_partition_ghcnd_all_xml/sensors"
>>         for $r in collection($collection)/dataCollection/data
>>         let $date := xs:date(fn:substring(xs:string(fn:data($r/date)), 0, 11))
>>         where $r/station eq "GHCND:ASN00008113"
>>             and fn:year-from-date($date) >= (2003)
>>             and fn:month-from-date($date) eq 12
>>             and fn:day-from-date($date) eq 25
>>         return $r
>>
>>         Saxon processed this query 10 times in 35.936s with an average
>>         of 3.5936s per query.
>>         VXQuery processed this query 10 times in 504.715s with an
>>         average of 50.4715s per query.
>>
>>         I ran the query again without the date filter options. The
>>         query returns all data from station GHCND:ASN00008113.
>>         Saxon processed this query 10 times in 35.953s with an average
>>         of 3.5953s per query.
>>         VXQuery processed this query 10 times in 376.325s with an
>>         average of 37.6325s per query.
>>
>>         The modified query below takes an average of 4.0028s. The query
>>         basically touches each sensor reading but does nothing. The
>>         select is much simpler and the plan does not have two subplans
>>         for path steps used in the select.
>>
>>         let $collection := "/tmp/1.0_partition_ghcnd_all_xml/sensors/ASN"
>>         for $r in collection($collection)/dataCollection/data
>>         where empty($r)
>>         return $r
>>
>>         The process seems to take a lot of time to prepare data and then
>>         execute the select for the where clause.
>>
>>         Notes on VXQuery performance:
>>         ========================
>>         The frame size was set to 1 MB.
>>         CPU usage is at 100% to 260% on an 8-core machine (100% means
>>         one core is fully used).
>>         The disk has sporadic activity.
>>         The system has one cluster controller and one node controller
>>         set up from inside the CLI script.
>>
>>         Suggested Options:
>>         1. Remove the subplans for path steps going into the select.
>>            * The subplan iterates over a field created by an unnest
>>              operator. The unnest operator is guaranteed to produce
>>              single-value items. The subplan is not required when the
>>              input is a single item that gets iterated over and then
>>              has its result aggregated back together. The process could
>>              be a simple assign for the value inside the aggregate
>>              (including the rest of the nested plan operators minus the
>>              unnest).
>>         2. Project unused variables out of the tuple during local
>>            execution.
>>            * Depends on how the tuples are being passed between
>>              operators. Right now a lot of information is stored in the
>>              tuple (XML file, all path steps, etc.). Reducing the size
>>              could help with copying less information during each new
>>              path step.
>>
>>         Questions?
>>         * Can you track to see which operators are taking the longest?
>>         * Can you explain the tuple stream and how it interacts with
>>           each operator? Is there one stream? Does it only grow or change
>>           size at each operator?
>>
>>
>>         On Mon, Dec 2, 2013 at 8:14 PM, Vinayak Borkar <[email protected]> wrote:
>>
>>             Preston,
>>
>>             Let me suggest a way to track down our performance issues in
>>             VXQuery. Let's approach our queries one at a time. First,
>>             let's start with the single collection, scan-based queries
>>             and reason about their performance in comparison to Saxon.
>>             As an even smaller goal, can you take your first query and
>>             report running times on the 250MB of data along with Saxon's
>>             running times?
>>
>>             Thanks,
>>             Vinayak
>>
>>
>>             On 11/29/13, 12:48 PM, Eldon Carman wrote:
>>
>>                 The query plans are so big, I attached a document with
>>                 the queries and plans.
>>
>>
>>                 On Wed, Nov 27, 2013 at 8:53 PM, Vinayak Borkar <[email protected]> wrote:
>>
>>                     Preston,
>>
>>                     For each query, please send the following:
>>
>>                     1. The query
>>                     2. The translated logical plan
>>                     3. The optimized physical plan
>>
>>                     Thanks,
>>                     Vinayak
>>
>>
>>                     On 11/27/13, 8:16 PM, Eldon Carman wrote:
>>
>>                         It appears that our query process is taking
>>                         longer than expected. I have created a small set
>>                         of sensors to test our benchmark queries. The
>>                         data set is about 250 MB and the queries execute
>>                         in 10 to 20 seconds with the SAXON XSLT
>>                         processor. When I tried a few of the queries on
>>                         VXQuery, the process ran for one hour and still
>>                         did not complete. I am now looking into where
>>                         the time is being spent for our query to see why
>>                         it's taking so long.
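
As a rough aid for comparing the timings quoted in this thread, the following minimal Python sketch converts `time`-style strings to seconds and reports how many times slower VXQuery was than Saxon for each pair. The helper names (`parse_duration`, `slowdown`) and the run labels are hypothetical and not part of VXQuery or Saxon; the numbers are copied from the messages above.

```python
import re


def parse_duration(text: str) -> float:
    """Convert a string such as '1m33.632s' or '35.936s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def slowdown(vxquery: str, saxon: str) -> float:
    """Return how many times longer the VXQuery run took than the Saxon run."""
    return parse_duration(vxquery) / parse_duration(saxon)


# (VXQuery time, Saxon time) pairs as reported in the thread.
# The labels are illustrative descriptions, not names used by the authors.
RUNS = {
    "date filter, 10 runs (Dec 3)": ("504.715s", "35.936s"),
    "station filter only, 10 runs (Dec 3)": ("376.325s", "35.953s"),
    "after single-assign rule (Dec 13)": ("1m33.632s", "0m36.009s"),
}

if __name__ == "__main__":
    for label, (vx, sx) in RUNS.items():
        print(f"{label}: VXQuery took {slowdown(vx, sx):.1f}x as long as Saxon")
```

Working with ratios rather than absolute times makes it easier to compare runs that used different query variants, since the Saxon time stays close to 36 seconds throughout.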
