Just to confirm: you're talking about the XQTS tests (~19000 queries), correct?

> On Dec 16, 2013, at 9:34 PM, Vinayak Borkar <[email protected]> wrote:
> 
> Preston,
> 
> 
> Can you try running the W3C XQuery tests against your current codebase with 
> all the rules and optimizations and compare the outcome with running the 
> tests on our last release?
> 
> Please report the outcome on this list. Let's ensure that we are not 
> regressing while adding these optimizations.
> 
> Thanks,
> Vinayak
> 
> 
>> On 12/13/13, 1:12 PM, Eldon Carman wrote:
>> I added the rule to take the previously mentioned subplan and make it
>> into a single assign for child. The change dropped 4 minutes off each
>> child path step that was found in the pattern mentioned. I have attached
>> the new query plan and the results of several modified queries to show
>> the change in times based on new additions to the query.
>> 
>> Saxon Execution time: 0m36.009s
>> VXQuery Execution time: 1m33.632s
>> 
>> 
>> On Thu, Dec 12, 2013 at 11:51 AM, Eldon Carman <[email protected]> wrote:
>> 
>>    After finishing the rewrite rule to merge the child path steps, I
>>    ran a few tests. The results of the queries and the plans are attached.
>> 
>>    First, I noted that when the following group of operators was added to
>>    the plan, the time increased by about 4 minutes (from 35s to 4m27s).
>> 
>>               subplan {
>>                         aggregate [$$19] <- [function-call:
>>    vxquery:{urn:org.apache.vxquery.operators-ext}sequence,
>>    Args:[function-call:
>>    vxquery:{urn:org.apache.vxquery.operators-ext}child,
>>    Args:[function-call:
>>    vxquery:{urn:org.apache.vxquery.operators-ext}treat, Args:[%0->$$17,
>>    {http://www.w3.org/2001/XMLSchema}int QUANT_ONE(bytes[5] =
>>    [1d000000ee])], {http://www.w3.org/2001/XMLSchema}int
>>    QUANT_ONE(bytes[5] = [1d0000010b])]]]
>>                         -- AGGREGATE  |LOCAL|
>>                           unnest $$17 <- function-call:
>>    vxquery:{urn:org.apache.vxquery.operators-ext}iterate, Args:[%0->$$15]
>>                           -- UNNEST  |LOCAL|
>>                             nested tuple source
>>                             -- NESTED_TUPLE_SOURCE  |LOCAL|
>>                      }
>>               -- SUBPLAN  |PARTITIONED|
>> 
>>    The above query plan section appears twice in the original query. If
>>    each takes 4 minutes that would account for most of the time. My
>>    test with the original query has a time of 9m16.336s.
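As a rough check of that estimate, the figures quoted above can be added up. This is only a back-of-the-envelope sketch: it assumes each subplan section costs the full difference between the 35s and 4m27s runs and that the two sections add linearly. The helper name `to_seconds` is made up for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair (e.g. 4m27s) to seconds."""
    return minutes * 60 + seconds

base = to_seconds(0, 35)               # plan without the subplan section
with_one_subplan = to_seconds(4, 27)   # plan after adding one section
measured_total = to_seconds(9, 16.336) # original query, two sections

# Assumption: each section costs the same, and the costs simply add up.
per_subplan = with_one_subplan - base
estimated_total = base + 2 * per_subplan
share = estimated_total / measured_total

print(f"per subplan: {per_subplan}s, estimated total: {estimated_total}s, "
      f"share of measured time: {share:.0%}")
```

Under those assumptions the two sections would explain roughly 90% of the measured time, which is consistent with "most of the time".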
>> 
>>    I suggest a rewrite rule that could change this plan section to a
>>    single assign.
>> 
>>    Does anything in this plan section stand out as being slow? Is it
>>    just the number of operators? The child path step function is fairly
>>    fast.
>> 
>> 
>>    On Tue, Dec 3, 2013 at 3:48 PM, Eldon Carman <[email protected]> wrote:
>> 
>>        The first query (q00.xq) was executed 10 times on the 10
>>        stations of data. The data contains 6,827 files
>>        (/dataCollection) with 206,686 sensor readings
>>        (/dataCollection/data) amounting to ~55 MB. The query was
>>        executed 10 times to remove the overhead of starting and stopping
>>        the cluster and node controllers in VXQuery.
>> 
>>        (: XQuery Filter Query :)
>>        (: See historical data for Riverside, CA (ASN00008113) station
>>        by selecting   :)
>>        (: the weather readings for December 25 over the last 10 years.
>>                       :)
>>        let $collection := "/tmp/1.0_partition_ghcnd_all_xml/sensors"
>>        for $r in collection($collection)/dataCollection/data
>>        let $date := xs:date(fn:substring(xs:string(fn:data($r/date)),
>>        0, 11))
>>        where $r/station eq "GHCND:ASN00008113"
>>             and fn:year-from-date($date) >= (2003)
>>             and fn:month-from-date($date) eq 12
>>             and fn:day-from-date($date) eq 25
>>        return $r
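A note on the `fn:substring(..., 0, 11)` call in the query, since the start position of 0 can look like an off-by-one error. XPath string positions are 1-based, and `fn:substring` keeps the characters whose position p satisfies start <= p < start + length, so a start of 0 with length 11 keeps positions 1 through 10. The sketch below emulates that rule in Python for integer arguments only. The function name `xpath_substring` and the sample date value are assumptions for illustration, not taken from the data set.

```python
def xpath_substring(source, start, length):
    """Emulate fn:substring for integer start/length (1-based positions)."""
    end = start + length
    return "".join(ch for pos, ch in enumerate(source, start=1)
                   if start <= pos < end)

# Hypothetical value of $r/date; the real format is not shown in this thread.
sample = "2003-12-25T00:00:00.000"

date_part = xpath_substring(sample, 0, 11)
print(date_part)
```

With a value in that shape the call keeps the ten characters of the date part, which is what `xs:date` needs. Writing `fn:substring(..., 1, 10)` would select the same characters.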
>> 
>>        Saxon processed this query 10 times in 35.936s with an average
>>        of 3.5936s per query.
>>        VXQuery processed this query 10 times in 504.715s with an
>>        average of 50.4715s per query.
>> 
>>        I ran the query again without the date filter options. The
>>        query returns all data from station GHCND:ASN00008113.
>>        Saxon processed this query 10 times in 35.953s with an average
>>        of 3.5953s per query.
>>        VXQuery processed this query 10 times in 376.325s with an
>>        average of 37.6325s per query.
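To put the two runs side by side, the totals reported above can be turned into per-query averages and slowdown factors. This is a small sketch over the quoted numbers only; the dictionary layout and variable names are made up for illustration.

```python
REPEATS = 10  # each query was executed 10 times

# Total wall-clock seconds for 10 executions, as reported above.
runs = {
    "with date filter":    {"saxon": 35.936, "vxquery": 504.715},
    "station filter only": {"saxon": 35.953, "vxquery": 376.325},
}

slowdown = {}
for name, totals in runs.items():
    saxon_avg = totals["saxon"] / REPEATS
    vxquery_avg = totals["vxquery"] / REPEATS
    slowdown[name] = totals["vxquery"] / totals["saxon"]
    print(f"{name}: Saxon {saxon_avg:.4f}s, VXQuery {vxquery_avg:.4f}s, "
          f"slowdown {slowdown[name]:.1f}x")

# Extra VXQuery time per query attributable to the date filter.
date_filter_cost = (runs["with date filter"]["vxquery"]
                    - runs["station filter only"]["vxquery"]) / REPEATS
print(f"date filter adds about {date_filter_cost:.1f}s per query in VXQuery")
```

The Saxon totals barely move between the two runs, while the VXQuery total changes by well over a hundred seconds, so the date predicates are a significant share of the gap.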
>> 
>>        The modified query below takes an average of 4.0028s. The query
>>        touches each sensor reading but does nothing with it. The
>>        select is much simpler, and the plan does not have the two subplans
>>        for path steps used in the select.
>> 
>>        let $collection := "/tmp/1.0_partition_ghcnd_all_xml/sensors/ASN"
>>        for $r in collection($collection)/dataCollection/data
>>        where empty($r)
>>        return $r
>> 
>>        The process seems to take a lot of time to prepare data and then
>>        execute the select for the where clause.
>> 
>>        Notes on VXQuery performance:
>>        ========================
>>        The frame size was set to 1 MB.
>>        The CPU is at 100% to 260% on an 8-core machine. (100% means one
>>        core is fully used.)
>>        The disk has sporadic activity.
>>        The system has one cluster controller and one node controller
>>        set up from inside the CLI script.
>> 
>>        Suggested Options:
>>        1. Remove the subplans for path steps going into the select.
>>             * The subplan iterates over a field created by an unnest
>>        operator. The unnest operator is guaranteed to produce single
>>        value items. The subplan is not required when the input is a
>>        single item that is iterated over and whose result is aggregated back
>>        together. The process could be a simple assign for the value
>>        inside the aggregate (including the rest of the nested plan
>>        operators minus the unnest).
>>        2. Project unused variables out of the tuple during local
>>        execution.
>>             * Depends on how the tuples are being passed between
>>        operators. Right now a lot of information is stored in the tuple
>>        (XML file, all path steps, etc.). Reducing the size could help
>>        by copying less information during each new path step.
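The reasoning behind the first option can be illustrated with a toy model. This is not VXQuery or Hyracks code: `child`, `subplan_form`, and `assign_form` are made-up Python stand-ins, and nodes are modeled as plain dicts. It only shows why iterating over a single item, applying the child step, and aggregating the results into one sequence gives the same value as applying the child step directly.

```python
def child(node):
    """Toy child path step: return the children of one node as a list."""
    return list(node.get("children", []))

def subplan_form(item):
    """Mimic the subplan: unnest/iterate, child per tuple, then aggregate."""
    result = []
    for one in [item]:             # iterate over a value known to be one item
        result.extend(child(one))  # aggregate: concatenate into one sequence
    return result

def assign_form(item):
    """Mimic the proposed rewrite: a single assign calling child directly."""
    return child(item)

reading = {"name": "data",
           "children": [{"name": "station"}, {"name": "date"}]}

same = subplan_form(reading) == assign_form(reading)
print(same)
```

The equivalence depends on the input being a single item. If the unnest could produce a longer sequence per tuple, the loop would run more than once and the two forms would need a closer look.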
>> 
>>        Questions?
>>        * Can you track to see which operators are taking the longest?
>>        * Can you explain the tuple stream and how it interacts with
>>        each operator? Is there one stream? Does it only grow or change
>>        size at each operator?
>> 
>> 
>>        On Mon, Dec 2, 2013 at 8:14 PM, Vinayak Borkar
>>        <[email protected]> wrote:
>> 
>>            Preston,
>> 
>>            Let me suggest a way to track down our performance issues in
>>            VXQuery. Let's approach our queries one at a time. First,
>>            let's start with the single collection, scan-based queries
>>            and reason about their performance in comparison to Saxon.
>>            As an even smaller goal, can you take your first query and
>>            report running times on the 250MB of data along with Saxon's
>>            running times?
>> 
>>            Thanks,
>>            Vinayak
>> 
>> 
>> 
>> 
>>            On 11/29/13, 12:48 PM, Eldon Carman wrote:
>> 
>>                The query plans are so big, I attached a document with
>>                the queries and
>>                plans.
>> 
>> 
>>                On Wed, Nov 27, 2013 at 8:53 PM, Vinayak Borkar
>>                <[email protected]> wrote:
>> 
>>                     Preston,
>> 
>>                     For each query, please send the following:
>> 
>>                     1. The query
>>                     2. The translated logical plan
>>                     3. The optimized physical plan
>> 
>>                     Thanks,
>>                     Vinayak
>> 
>> 
>> 
>>                     On 11/27/13, 8:16 PM, Eldon Carman wrote:
>> 
>>                         It appears that our query process is taking longer
>>                         than expected. I have created a small set of
>>                         sensors to test our benchmark queries. The data set
>>                         is about 250 MB and the queries execute in 10 to 20
>>                         seconds with the SAXON XSLT processor. When I tried
>>                         a few of the queries on VXQuery, the process ran for
>>                         one hour and still did not complete. I am now
>>                         looking into where the time is being spent for our
>>                         query to see why it's taking so long.
> 
