Saxon and compression: Saxon's website does not have any documentation for using xml.gz files. When I attempt to parse a gzip-compressed XML document, Saxon reports a FODC0002 error, which basically says it cannot retrieve the file. Do you know another way?
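One workaround, if it helps: decompress in the driver and hand the parser a plain XML stream instead of pointing it at the .gz file. Here is a minimal JDK-only sketch (class and method names are mine, just for illustration); the same decompressed stream should also be usable with Saxon by wrapping it in a StreamSource and passing it to the s9api DocumentBuilder's build(Source) method.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class GzipXmlQuery {
    // Decompress the .gz file ourselves, then give the parser a plain
    // XML stream. Saxon is not shown here, but the same GZIPInputStream
    // could be wrapped in a javax.xml.transform.stream.StreamSource and
    // handed to Saxon's s9api DocumentBuilder.
    public static String firstValue(File gzFile, String xpathExpr) throws Exception {
        try (InputStream in = new GZIPInputStream(new FileInputStream(gzFile))) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            XPath xp = XPathFactory.newInstance().newXPath();
            return (String) xp.evaluate(xpathExpr, doc, XPathConstants.STRING);
        }
    }
}
```

This avoids the FODC0002 error because the engine never sees the compressed bytes, only well-formed XML.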
Compressed vs Uncompressed: To be consistent, it looks like we should use only uncompressed data. In addition, the overhead of decompressing the XML documents adds to the query time. Do you agree we should use only uncompressed?

Updated Format of Results: Here is a new format; I was wondering how the mailing list would handle the table. I retested VXQuery XML for q00 at 500mb. The updated value shows we are consistently slower with xml.gz. Note how Saxon's throughput improves with the larger data set, while VXQuery basically stays the same with a mild improvement.

Query q00 (50mb)
----------------
0m36.094s  Saxon
1m38.698s  VXQuery xml
1m43.571s  VXQuery xml.gz

Query q00 (500mb)
-----------------
 2m11.937s  Saxon
14m36.144s  VXQuery xml
15m14.088s  VXQuery xml.gz

Query q01 (50mb)
----------------
0m36.094s  Saxon
0m56.064s  VXQuery xml
1m04.428s  VXQuery xml.gz

Query q01 (500mb)
-----------------
2m07.096s  Saxon
8m03.735s  VXQuery xml
8m52.476s  VXQuery xml.gz

Throughput:
Saxon (both queries)  ~1.38 mb/s to ~3.8 mb/s
VXQuery (q00)         ~0.51 mb/s to ~0.57 mb/s
VXQuery (q01)         ~0.89 mb/s to ~1.03 mb/s

On Thu, Jan 16, 2014 at 10:44 AM, Till Westmann <[email protected]> wrote:

> Hi Preston,
>
> Thanks for the update.
> The table is a little hard to digest. Could you try to format it such that
> it works well in plain text?
>
> I think that it's fine to work on compressed data, and I think that Saxon
> will decompress it before processing.
> But if we have both options (compressed and uncompressed), we should either
> use one consistently for all measurements or (even better) provide numbers
> for both versions to see what impact compression has on both engines.
>
> Does this make sense?
>
> Cheers,
> Till
>
> On Jan 16, 2014, at 9:51 AM, Eldon Carman <[email protected]> wrote:
>
> > The benchmark queries have all been run through Saxon on my local machine
> > to get base numbers. I used two data sizes: ~50mb and ~500mb.
> > For VXQuery I ran the same queries on the same data set with two
> > versions: uncompressed and gz-compressed XML documents. The compressed
> > version has the same number of files, but each file is compressed.
> >
> > Query  Data Size  Saxon      VXQuery xml  VXQuery xml.gz
> > q00    50mb       0m36.094s   1m38.698s    1m43.571s
> > q01    50mb       0m36.094s   0m56.064s    1m04.428s
> > q00    500mb      2m11.937s  15m35.637s   15m14.088s
> > q01    500mb      2m07.096s   8m03.735s    8m52.476s
> >
> > Question:
> > We talked about testing Saxon on the larger data set that was 3 to 5
> > times the memory size. Currently the larger data set is stored in the
> > compressed format. Shall I work on creating an uncompressed version for
> > testing Saxon?
> >
> > Other Information:
> > q00 and q01 are both filter-based queries. The rest of the queries are
> > showing errors. I will dig into each of those today and send out what I
> > find. In addition, I will start the larger-scale test for q00 and q01
> > for reference.
> >
> > Preston
> >
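P.S. The throughput figures earlier in this message are just data size divided by wall-clock seconds; a quick sketch (class name is mine) for reproducing a few of them from the timings:

```java
// Sanity-check the throughput figures: MB divided by wall-clock seconds.
public class Throughput {
    static double mbPerSec(double mb, int min, double sec) {
        return mb / (min * 60 + sec);
    }

    public static void main(String[] args) {
        // Timings taken from the result tables in this message.
        System.out.printf("Saxon   q00 500mb: %.2f mb/s%n", mbPerSec(500, 2, 11.937));
        System.out.printf("VXQuery q00 500mb: %.2f mb/s%n", mbPerSec(500, 14, 36.144));
        System.out.printf("Saxon   q00  50mb: %.2f mb/s%n", mbPerSec(50, 0, 36.094));
    }
}
```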
