We have seen all kinds of published results that often contradict each other. 
My take is that authors usually know more tuning tricks for their own or 
familiar products than for the competing ones, so the product in focus is tuned 
for ideal performance while the competitors are not. The authors are not 
necessarily biased, but as a consequence the results are.

Ideally, the user community would be informed of all the in-depth tuning 
tricks for every product. Realistically, though, there is a big gap in the 
documentation. Hope the Spark folks will make a difference. :-)

Du


From: Soumya Simanta <soumya.sima...@gmail.com>
Date: Friday, October 31, 2014 at 4:04 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: SparkSQL performance

I was really surprised to see the results here, esp. SparkSQL "not completing"
http://www.citusdata.com/blog/86-making-postgresql-scale-hadoop-style

I was under the impression that SparkSQL performs really well because it can 
optimize the underlying RDD operations and load only the columns that a query 
actually requires. This essentially means that in most cases SparkSQL should 
be about as fast as core Spark.
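For context, the column-pruning behavior looks roughly like this with the 
Spark 1.x-era API (a minimal sketch; the Parquet path and column names are 
hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val conf = new SparkConf().setAppName("ColumnPruningSketch")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Parquet is a columnar format, so the Catalyst optimizer can push
    // the projection down and read only the columns the query references.
    val events = sqlContext.parquetFile("hdfs:///data/events.parquet")
    events.registerTempTable("events")

    val counts = sqlContext.sql(
      "SELECT user_id, COUNT(*) AS n FROM events GROUP BY user_id")
    counts.collect().foreach(println)

Here only user_id would be read from disk, regardless of how wide the table 
is, which is where the expected speedup over a naive scan comes from.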

I would be very interested to hear what others in the group have to say about 
this.

Thanks
-Soumya

