I think we should stay with the uncompressed files (at least in the beginning) to be consistent.

Vassilis

On 1/16/2014 11:56 AM, Eldon Carman wrote:
Saxon and compression:
Saxon's website has no documentation on reading xml.gz files directly. When I attempt to parse a compressed document, Saxon raises a FODC0002 error, essentially saying it cannot retrieve the file. Do you know another way?
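One workaround, if we do want to keep the compressed copies around, is to decompress a file with the plain JDK before handing it to Saxon. A minimal sketch (JDK only, no Saxon API; the file names are placeholders):

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

public class Gunzip {
    // Decompress a .gz file to a plain file so Saxon can parse the XML.
    public static void gunzip(String gzPath, String xmlPath) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new FileInputStream(gzPath));
             FileOutputStream out = new FileOutputStream(xmlPath)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // e.g. java Gunzip data.xml.gz data.xml
        gunzip(args[0], args[1]);
    }
}
```

Of course this just moves the decompression cost outside the measured query time, so it would not make the engines comparable on compressed input.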

Compressed vs Uncompressed:
Otherwise, to stay consistent, it looks like we should use only uncompressed files. In addition, the overhead of decompressing the XML documents adds to the query time. Do you agree that we should use only uncompressed data?

Updated Format of Results:
Here is a new format; I was wondering how the mailing list would handle the table. I retested VXQuery XML for q00 on 500mb. The updated value shows we are consistently slower with xml.gz. Note how Saxon's throughput improves with the larger data set, while VXQuery's stays roughly the same, with only a mild improvement.

Query q00 (50mb)
---------------
0m36.094s Saxon
1m38.698s VXQuery xml
1m43.571s VXQuery xml.gz

Query q00 (500mb)
---------------
2m11.937s Saxon
14m36.144s VXQuery xml
15m14.088s VXQuery xml.gz

Query q01 (50mb)
---------------
0m36.094s Saxon
0m56.064s VXQuery xml
1m04.428s VXQuery xml.gz

Query q01 (500mb)
---------------
2m07.096s Saxon
8m03.735s VXQuery xml
8m52.476s VXQuery xml.gz

Throughput:
Saxon (both queries): ~1.38 MB/s to ~3.8 MB/s
VXQuery (q00): ~0.51 MB/s to ~0.57 MB/s
VXQuery (q01): ~0.89 MB/s to ~1.03 MB/s
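For reference, the throughput figures above are just data size divided by wall-clock time (assuming the data sets are exactly 50 MB and 500 MB, which is an approximation); a small snippet to check the q00 arithmetic:

```java
public class Throughput {
    // Convert "XmY.YYYs"-style timings to seconds.
    static double seconds(int min, double sec) {
        return 60 * min + sec;
    }

    public static void main(String[] args) {
        // Saxon, q00: 50 MB in 0m36.094s; 500 MB in 2m11.937s
        double saxonSmall = 50.0 / seconds(0, 36.094);   // ~1.39 MB/s
        double saxonLarge = 500.0 / seconds(2, 11.937);  // ~3.79 MB/s
        // VXQuery xml, q00: 50 MB in 1m38.698s; 500 MB in 14m36.144s
        double vxqSmall = 50.0 / seconds(1, 38.698);     // ~0.51 MB/s
        double vxqLarge = 500.0 / seconds(14, 36.144);   // ~0.57 MB/s
        System.out.printf("Saxon q00:   %.2f -> %.2f MB/s%n", saxonSmall, saxonLarge);
        System.out.printf("VXQuery q00: %.2f -> %.2f MB/s%n", vxqSmall, vxqLarge);
    }
}
```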


On Thu, Jan 16, 2014 at 10:44 AM, Till Westmann <[email protected]> wrote:

    Hi Preston,

    Thanks for the update.
    The table is a little hard to digest. Could you try to format it
    such that it works well in plain text?

    I think that it's fine to work on compressed data and I think that
    Saxon will decompress it before processing.
    But if we have both options (compressed and uncompressed) we
    should either use one consistently for all measurements or (even
    better) provide numbers for both versions to see what impact
    compression has on both engines.

    Does this make sense?

    Cheers,
    Till

    On Jan 16, 2014, at 9:51 AM, Eldon Carman <[email protected]> wrote:

    > The benchmark queries have all been run through Saxon on my
    local machine to
    > get base numbers. I used two data sizes: ~50mb and ~500mb. For
    VXQuery I
    > ran the same queries on the same data set with two versions:
    uncompressed
    > and gz compressed XML documents. The compressed version has the
    same number
    > of files, but each file is compressed.
    >
    > Query  Data Size  Saxon      VXQuery xml  VXQuery xml.gz
    > q00    50mb       0m36.094s  1m38.698s    1m43.571s
    > q01    50mb       0m36.094s  0m56.064s    1m4.428s
    > q00    500mb      2m11.937s  15m35.637s   15m14.088s
    > q01    500mb      2m7.096s   8m3.735s     8m52.476s
    > Question:
    > We talked about testing Saxon on the larger data set that is 3
    to 5 times
    > the size of memory. Currently the larger data set is stored in
    the compressed format.
    > Shall I work on creating an uncompressed version for testing
    Saxon?
    >
    > Other Information:
    > q00 and q01 are both filter-based queries. The rest of the
    queries are
    > showing errors. I will dig into each of those today and send out
    what I
    > find. In addition I will start the larger scale test for q00 and
    q01 for
    > reference.
    >
    > Preston


