I think we should stay with the uncompressed files (at least in the beginning) to be consistent.

Vassilis

On 1/16/2014 11:56 AM, Eldon Carman wrote:
Saxon and compression:
Saxon's website has no documentation on reading xml.gz files directly. When I attempt to parse a compressed document, Saxon raises a FODC0002 error, essentially saying it cannot retrieve the file. Do you know another way?
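One workaround, if we do want to keep the compressed copies around, is to decompress a file with the plain JDK before handing it to Saxon. A minimal sketch (JDK only, no Saxon API; the file names are placeholders):

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

public class Gunzip {
    // Decompress a .gz file to a plain file so Saxon can parse the XML.
    public static void gunzip(String gzPath, String xmlPath) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new FileInputStream(gzPath));
             FileOutputStream out = new FileOutputStream(xmlPath)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // e.g. java Gunzip data.xml.gz data.xml
        gunzip(args[0], args[1]);
    }
}
```

Of course this just moves the decompression cost outside the measured query time, so it would not make the engines comparable on compressed input.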

Compressed vs Uncompressed:
Otherwise, to stay consistent, it looks like we should use only uncompressed files. In addition, the overhead of decompressing the XML documents adds to the query time. Do you agree that we should use only uncompressed data?

Updated Format of Results:
Here is a new format; I was wondering how the mailing list would handle the table. I retested VXQuery XML for q00 on 500mb. The updated value shows we are consistently slower with xml.gz. Note how Saxon's throughput improves with the larger data set, while VXQuery's stays roughly the same, with only a mild improvement.

Query q00 (50mb)
---------------
0m36.094s Saxon
1m38.698s VXQuery xml
1m43.571s VXQuery xml.gz

Query q00 (500mb)
---------------
2m11.937s Saxon
14m36.144s VXQuery xml
15m14.088s VXQuery xml.gz

Query q01 (50mb)
---------------
0m36.094s Saxon
0m56.064s VXQuery xml
1m04.428s VXQuery xml.gz

Query q01 (500mb)
---------------
2m07.096s Saxon
8m03.735s VXQuery xml
8m52.476s VXQuery xml.gz

Throughput:
Saxon (both queries): ~1.38 MB/s to ~3.8 MB/s
VXQuery (q00): ~0.51 MB/s to ~0.57 MB/s
VXQuery (q01): ~0.89 MB/s to ~1.03 MB/s
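For reference, the throughput figures above are just data size divided by wall-clock time (assuming the data sets are exactly 50 MB and 500 MB, which is an approximation); a small snippet to check the q00 arithmetic:

```java
public class Throughput {
    // Convert "XmY.YYYs"-style timings to seconds.
    static double seconds(int min, double sec) {
        return 60 * min + sec;
    }

    public static void main(String[] args) {
        // Saxon, q00: 50 MB in 0m36.094s; 500 MB in 2m11.937s
        double saxonSmall = 50.0 / seconds(0, 36.094);   // ~1.39 MB/s
        double saxonLarge = 500.0 / seconds(2, 11.937);  // ~3.79 MB/s
        // VXQuery xml, q00: 50 MB in 1m38.698s; 500 MB in 14m36.144s
        double vxqSmall = 50.0 / seconds(1, 38.698);     // ~0.51 MB/s
        double vxqLarge = 500.0 / seconds(14, 36.144);   // ~0.57 MB/s
        System.out.printf("Saxon q00:   %.2f -> %.2f MB/s%n", saxonSmall, saxonLarge);
        System.out.printf("VXQuery q00: %.2f -> %.2f MB/s%n", vxqSmall, vxqLarge);
    }
}
```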


On Thu, Jan 16, 2014 at 10:44 AM, Till Westmann <[email protected]> wrote:

    Hi Preston,

    Thanks for the update.
    The table is a little hard to digest. Could you try to format it
    such that it works well in plain text?

    I think that it's fine to work on compressed data and I think that
    Saxon will decompress it before processing.
    But if we have both options (compressed and uncompressed) we
    should either use one consistently for all measurements or (even
    better) provide numbers for both versions to see what impact
    compression has on both engines.

    Does this make sense?

    Cheers,
    Till

    On Jan 16, 2014, at 9:51 AM, Eldon Carman <[email protected]> wrote:

    > The benchmark queries have all been run through Saxon on my
    local machine to
    > get base numbers. I used two data sizes: ~50mb and ~500mb. For
    VXQuery I
    > ran the same queries on the same data set with two versions:
    uncompressed
    > and gz compressed XML documents. The compressed version has the
    same number
    > of files, but each file is compressed.
    >
    > Query  Data Size  Saxon      VXQuery xml  VXQuery xml.gz
    > q00    50mb       0m36.094s  1m38.698s    1m43.571s
    > q01    50mb       0m36.094s  0m56.064s    1m4.428s
    > q00    500mb      2m11.937s  15m35.637s   15m14.088s
    > q01    500mb      2m7.096s   8m3.735s     8m52.476s
    > Question:
    > We talked about testing Saxon on the larger data set that is 3
    to 5 times
    > the size of memory. Currently the larger data set is stored in
    the compressed format.
    > Shall I work on creating an uncompressed version for testing
    Saxon?
    >
    > Other Information:
    > q00 and q01 are both filter-based queries. The rest of the
    queries are
    > showing errors. I will dig into each of those today and send out
    what I
    > find. In addition I will start the larger scale test for q00 and
    q01 for
    > reference.
    >
    > Preston


