Re: [SparkSQL] SparkSQL performance on small TPCDS tables is very low when compared to Drill or Presto

2018-03-31 Thread Tin Vu
omplementary and different use > cases. Have you tried using the JDBC connector to Drill from within SPARKSQL? > > Regards, > Gourav Sengupta > > > On Thu, Mar 29, 2018 at 1:03 AM, Tin Vu <tvu...@ucr.edu> wrote: > >> Hi, >> >> I am executing a benchmark to com

Re: [SparkSQL] SparkSQL performance on small TPCDS tables is very low when compared to Drill or Presto

2018-03-29 Thread Tin Vu
even though there are fewer records. Creation and > distribution of tasks has a noticeable overhead on smaller datasets. > > > > You might want to look at the driver logs, or the Spark Application Detail > UI. > > > > *From: *Tin Vu <tvu...@ucr.edu> > *Date:

Re: [SparkSQL] SparkSQL performance on small TPCDS tables is very low when compared to Drill or Presto

2018-03-28 Thread Tin Vu
see that you don’t do anything in the > query and immediately return (similarly count might immediately return by > using some statistics). > > On 29. Mar 2018, at 02:03, Tin Vu <tvu...@ucr.edu> wrote: > > Hi, > > I am executing a benchmark to compare performance of SparkS

[SparkSQL] SparkSQL performance on small TPCDS tables is very low when compared to Drill or Presto

2018-03-28 Thread Tin Vu
Hi, I am executing a benchmark to compare the performance of SparkSQL, Apache Drill and Presto. My experimental setup: - TPCDS dataset with scale factor 100 (size 100GB). - Spark, Drill and Presto have the same number of workers: 12. - Each worker has the same amount of allocated memory: 4GB. -