The profiler tweaks and code changes have allowed me to identify some of the slower areas of VXQuery. The changes have been committed to my branch in Git. The new branch performs 40% better than the previous round of tests on a ~500MB data set running filter and aggregate queries.
In addition to running faster, VXQuery now supports local partitioning. Query time decreases by about 45% when going from one partition to two, and by about 62% when going from one partition to four. In other words, VXQuery's performance almost doubles with two partitions and almost triples with four. The gain from additional partitions starts to diminish. The speed improvement is related to an increase in CPU utilization during query execution.

Here are the results for the new VXQuery version compared to Saxon:

Query q00 (500MB)
---------------
2m11.937s Saxon
9m07.037s VXQuery - 1 partition
4m56.224s VXQuery - 2 partitions
3m28.340s VXQuery - 4 partitions

Query q01 (500MB)
---------------
2m07.096s Saxon
5m30.705s VXQuery - 1 partition
2m53.382s VXQuery - 2 partitions
1m58.667s VXQuery - 4 partitions

Query q02 (500MB)
---------------
2m11.029s Saxon
8m17.377s VXQuery - 1 partition
4m34.760s VXQuery - 2 partitions
3m09.778s VXQuery - 4 partitions

Query q03 (500MB)
---------------
1m58.784s Saxon
5m55.061s VXQuery - 1 partition
3m05.709s VXQuery - 2 partitions
2m08.478s VXQuery - 4 partitions
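
The percentage decreases and speedups quoted above can be rechecked from the raw timings. Here is a minimal sketch in Python, assuming the times are wall-clock values in the `XmY.YYYs` format shown in the results. The helper names (`parse_time`, `percent_decrease`, `speedup`, `summarize`) are my own for illustration and are not part of VXQuery or the test scripts.

```python
import re


def parse_time(text):
    """Convert a string such as '9m07.037s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# VXQuery timings copied from the results above.
# Outer key: query name. Inner key: number of partitions.
TIMINGS = {
    "q00": {1: "9m07.037s", 2: "4m56.224s", 4: "3m28.340s"},
    "q01": {1: "5m30.705s", 2: "2m53.382s", 4: "1m58.667s"},
    "q02": {1: "8m17.377s", 2: "4m34.760s", 4: "3m09.778s"},
    "q03": {1: "5m55.061s", 2: "3m05.709s", 4: "2m08.478s"},
}


def percent_decrease(baseline, value):
    """Reduction in query time relative to the baseline, in percent."""
    return (baseline - value) / baseline * 100.0


def speedup(baseline, value):
    """How many times faster the run is than the baseline."""
    return baseline / value


def summarize(timings):
    """Map each query to {partitions: (percent decrease, speedup)} vs. 1 partition."""
    rows = {}
    for query, by_partitions in timings.items():
        base = parse_time(by_partitions[1])
        rows[query] = {
            n: (percent_decrease(base, parse_time(t)), speedup(base, parse_time(t)))
            for n, t in by_partitions.items()
            if n != 1
        }
    return rows


if __name__ == "__main__":
    for query, row in summarize(TIMINGS).items():
        for n, (decrease, factor) in sorted(row.items()):
            print(f"{query}: {n} partitions -> {decrease:.1f}% less time, {factor:.2f}x speedup")
```

Running the script prints one line per query and partition count, which makes it easy to see where the gain from extra partitions flattens out.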
