I have been working on getting numbers for our performance study on VXQuery. The study includes eight queries we have identified in the areas of filtering, aggregation, and joins over the NOAA GHCN-Daily weather data. The query times for local execution appear to be fairly consistent. When running the queries on a five-node cluster, however, the self-join query times were not as expected. After drilling down into the data, I found that it was not properly distributed across the cluster. I am in the process of fixing the data on each node and will then run the queries again. I am hopeful that this will resolve all the known issues with the numbers for the 10 GB study.
