Hi, I am trying to sample a SQL query in order to make it run faster. My query looks like this:

    SELECT `Category` as `Category`, sum(`bookings`) as `bookings`, sum(`dealviews`) as `dealviews`
    FROM groupon_dropbox
    WHERE `event_date` >= '2015-11-14' AND `event_date` <= '2016-02-19'
    GROUP BY `Category`
    LIMIT 100

The table is partitioned by event_date, and the code I am using is:

    df = self.df_from_sql(sql, srcs)
    results = df.sample(False, 0.5).collect()

The results are slightly different, but the execution time is almost the same. Am I missing something? Thanks.
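
To make the question concrete, here is a minimal plain-Python sketch (not Spark itself) of the two orders in which sampling and aggregation can happen. It assumes that `df` holds the already-aggregated result of the query, so that `df.sample(False, 0.5)` picks from the grouped rows rather than from the rows of the table. The data and the helper names `aggregate` and `sample` are made up for illustration.

```python
import random
from collections import defaultdict

def aggregate(rows):
    """Group by category and sum bookings, like the GROUP BY in the query.

    Also returns the number of input rows scanned, as a stand-in for cost.
    """
    totals = defaultdict(int)
    for category, bookings in rows:
        totals[category] += bookings
    return dict(totals), len(rows)

def sample(items, fraction, rng):
    """Sampling without replacement: keep each item with probability `fraction`."""
    return [item for item in items if rng.random() < fraction]

rng = random.Random(0)

# Hypothetical table: 10,000 rows spread evenly over 5 categories.
rows = [("cat%d" % (i % 5), 1) for i in range(10000)]

# Order used in the question: aggregate every row, then sample the result rows.
totals_a, scanned_a = aggregate(rows)
sampled_result = sample(list(totals_a.items()), 0.5, rng)

# Alternative order: sample the input rows first, then aggregate what is left.
totals_b, scanned_b = aggregate(sample(rows, 0.5, rng))

print("aggregate then sample: scanned", scanned_a, "rows, kept", len(sampled_result), "groups")
print("sample then aggregate: scanned", scanned_b, "rows, kept", len(totals_b), "groups")
```

In the first order every row is still scanned and only some of the finished groups are dropped, which would match what I see: slightly different results at about the same execution time. In the second order fewer rows are scanned, but the sums become estimates that would need to be scaled by the sampling fraction.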