I do not know your data, but it looks like you have too many partitions for such a small data set.
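A common rule of thumb (not a Spark-mandated value) is to aim for roughly 128 MB per cached partition. As a rough sketch, here is how you might estimate a saner partition count for the ~9 GB cache described below; the helper name is mine, not a Spark API:

```python
import math

def suggested_partitions(data_size_bytes, target_partition_bytes=128 * 1024**2):
    """Heuristic only: target roughly 128 MB of data per partition."""
    return max(1, math.ceil(data_size_bytes / target_partition_bytes))

# For a ~9 GB in-memory cache this suggests ~72 partitions -- far fewer
# than the thousands implied by block IDs like rdd_13136_3866 in your logs.
n = suggested_partitions(9 * 1024**3)
print(n)

# You could then reduce the partition count before caching, e.g.:
#   data = data.coalesce(n)      # avoids a full shuffle
#   data.cache()
# or data.repartition(n) if the data is skewed and needs rebalancing.
```

With thousands of tiny partitions, per-task scheduling overhead can easily dominate a simple aggregation like your MIN query.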
> On 26 Apr 2016, at 00:47, Imran Akbar <skunkw...@gmail.com> wrote:
>
> Hi,
>
> I'm running a simple query like this through Spark SQL:
>
> sqlContext.sql("SELECT MIN(age) FROM data WHERE country = 'GBR' AND
> dt_year=2015 AND dt_month BETWEEN 1 AND 11 AND product IN ('cereal')").show()
>
> which takes 3 minutes to run against an in-memory cache of 9 GB of data.
>
> The data was 100% cached in memory before I ran the query (see screenshot 1).
> The data was cached like this:
>
> data = sqlContext.sql("SELECT * FROM raw WHERE (dt_year=2015 OR dt_year=2016)")
> data.cache()
> data.registerTempTable("data")
>
> and then I ran an action query to load the data into the cache.
>
> I see lots of rows of logs like this:
>
> 16/04/25 22:39:11 INFO MemoryStore: Block rdd_13136_2856 stored as values in memory (estimated size 2.5 MB, free 9.7 GB)
> 16/04/25 22:39:11 INFO BlockManager: Found block rdd_13136_2856 locally
> 16/04/25 22:39:11 INFO MemoryStore: 6 blocks selected for dropping
> 16/04/25 22:39:11 INFO BlockManager: Dropping block rdd_13136_3866 from memory
>
> Screenshot 2 shows the job page of the longest job.
>
> The data was partitioned in Parquet by month, country, and product before I cached it.
>
> Any ideas what the issue could be? This is running on localhost.
>
> regards,
> imran
>
> <Screen Shot 2016-04-25 at 3.43.03 PM.png>
> <Screen Shot 2016-04-25 at 3.42.15 PM.png>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org