I got it, thanks for that.

2015-07-26 17:21 GMT+08:00 Paolo Platter <paolo.plat...@agilelab.it>:

> If you want a performance boost, you need to load the full table into
> memory using caching and then execute your query directly on the cached
> DataFrame. Otherwise you are using Spark only as a bridge, and you don't
> leverage Spark's distributed in-memory engine.
>
> Paolo
>
> Sent from my Windows Phone
> ------------------------------
> From: Louis Hust <louis.h...@gmail.com>
> Sent: 26/07/2015 10:28
> To: Shixiong Zhu <zsxw...@gmail.com>
> Cc: Jerrick Hoang <jerrickho...@gmail.com>; user@spark.apache.org
> Subject: Re: Spark is much slower than direct access MySQL
>
> Thanks for your explanation.
>
> 2015-07-26 16:22 GMT+08:00 Shixiong Zhu <zsxw...@gmail.com>:
>
>> Oh, I see. That's the total time of executing a query in Spark. Then the
>> difference is reasonable, considering Spark has much more work to do, e.g.,
>> launching tasks in executors.
>>
>> Best Regards,
>> Shixiong Zhu
>>
>> 2015-07-26 16:16 GMT+08:00 Louis Hust <louis.h...@gmail.com>:
>>
>>> Look at the given URL. The code can be found at:
>>>
>>> https://github.com/louishust/sparkDemo/blob/master/src/main/java/DirectQueryTest.java
>>>
>>> 2015-07-26 16:14 GMT+08:00 Shixiong Zhu <zsxw...@gmail.com>:
>>>
>>>> Could you clarify how you measure the Spark time cost? Is it the total
>>>> time of running the query? If so, the gap is plausible, because Spark's
>>>> fixed overhead dominates for small queries.
>>>>
>>>> Best Regards,
>>>> Shixiong Zhu
>>>>
>>>> 2015-07-26 15:56 GMT+08:00 Jerrick Hoang <jerrickho...@gmail.com>:
>>>>
>>>>> How big is the dataset? How complicated is the query?
>>>>>
>>>>> On Sun, Jul 26, 2015 at 12:47 AM Louis Hust <louis.h...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am using a Spark DataFrame to fetch a small table from MySQL,
>>>>>> and I found that it takes much longer than accessing MySQL directly
>>>>>> through JDBC.
>>>>>>
>>>>>> The Spark query takes about 2033 ms, while direct access takes
>>>>>> about 16 ms.
>>>>>>
>>>>>> Code can be found at:
>>>>>>
>>>>>> https://github.com/louishust/sparkDemo/blob/master/src/main/java/DirectQueryTest.java
>>>>>>
>>>>>> Is my Spark configuration wrong? How can I optimise Spark to
>>>>>> achieve performance similar to direct access?
>>>>>>
>>>>>> Any ideas will be appreciated!