You could try setting "-Xcomp" for executors to force JIT compilation upfront. I don't know whether it's a good idea overall, but it might show whether upfront compilation really helps. I doubt it.
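If you want to try it, something like this should pass the flag to the
executor JVMs. (Untested sketch: spark.executor.extraJavaOptions is the
standard way to hand extra JVM flags to executors; the rest of the
spark-submit line is just a placeholder.)

  # in conf/spark-defaults.conf -- force JIT compilation of every
  # method on first use instead of interpreting it first
  spark.executor.extraJavaOptions  -Xcomp

  # or as a one-off on the command line:
  spark-submit --conf "spark.executor.extraJavaOptions=-Xcomp" ...

Bear in mind that -Xcomp makes executor startup noticeably slower,
since nothing runs interpreted.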
However, this is almost surely due to caching somewhere, in Spark SQL or HDFS. I really doubt HotSpot makes a difference compared to these much larger factors.

On Fri, Oct 10, 2014 at 8:49 AM, Alexey Romanchuk <alexey.romanc...@gmail.com> wrote:
> Hello spark users and developers!
>
> I am using hdfs + spark sql + hive schema + parquet as the storage
> format. I have a lot of parquet files - one file per day, each fitting
> one hdfs block. The strange thing is that the first spark sql query is
> very slow.
>
> To reproduce the situation I use only one core, and I get 97 sec for
> the first query and only 13 sec for all subsequent ones. I query for
> different data each time, but it has the same structure and size. The
> situation can be reproduced by restarting the thrift server.
>
> Here is information about parquet file reads from a worker node:
>
> Slow one:
> Oct 10, 2014 2:26:53 PM INFO: parquet.hadoop.InternalParquetRecordReader:
> Assembled and processed 1560251 records from 30 columns in 11686 ms:
> 133.51454 rec/ms, 4005.4363 cell/ms
>
> Fast one:
> Oct 10, 2014 2:31:30 PM INFO: parquet.hadoop.InternalParquetRecordReader:
> Assembled and processed 1568899 records from 1 columns in 1373 ms:
> 1142.6796 rec/ms, 1142.6796 cell/ms
>
> As you can see, the second read is 10x faster than the first. Most of
> the query time is spent reading the parquet file.
>
> This problem is really annoying, because most of my spark jobs contain
> just one sql query plus data processing, and to speed up my jobs I put
> a special warmup query in front of every job.
>
> My assumption is that HotSpot optimizations kick in during the first
> read. Do you have any idea how to confirm/solve this performance
> problem?
>
> Thanks for any advice!
>
> p.s. I see billions of HotSpot compilations with -XX:+PrintCompilation
> but cannot figure out which are important and which are not.
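On the p.s.: rather than eyeballing raw -XX:+PrintCompilation output,
one option is to make HotSpot write a structured compilation log and
load it into a tool like JITWatch. Roughly (untested; the log path is
just a placeholder, and with several executors per node each one would
need its own file):

  # diagnostic flags must be unlocked; TraceClassLoading is what
  # JITWatch needs alongside the LogCompilation output
  spark.executor.extraJavaOptions  -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=/tmp/executor_hotspot.log

But as above, I'd rule out caching in Spark SQL and the HDFS/OS page
cache before blaming the JIT.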