You could try setting "-Xcomp" for executors to force JIT compilation upfront. I don't know whether it's a good idea overall, but it might show whether upfront compilation really helps. I doubt it.
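If you want to try it, something like this should pass the flag to the
executor JVMs. (Untested sketch: spark.executor.extraJavaOptions is the
standard way to hand extra JVM flags to executors; the rest of the
spark-submit line is just a placeholder.)

  # in conf/spark-defaults.conf -- force JIT compilation of every
  # method on first use instead of interpreting it first
  spark.executor.extraJavaOptions  -Xcomp

  # or as a one-off on the command line:
  spark-submit --conf "spark.executor.extraJavaOptions=-Xcomp" ...

Bear in mind that -Xcomp makes executor startup noticeably slower,
since nothing runs interpreted.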
However, this is almost surely due to caching somewhere, in Spark SQL or HDFS. I really doubt HotSpot makes a difference compared to these much larger factors.

On Fri, Oct 10, 2014 at 8:49 AM, Alexey Romanchuk <alexey.romanc...@gmail.com> wrote:
> Hello spark users and developers!
>
> I am using hdfs + spark sql + hive schema + parquet as the storage
> format. I have a lot of parquet files - one file per day, each fitting
> one hdfs block. The strange thing is that the first spark sql query is
> very slow.
>
> To reproduce the situation I use only one core, and I get 97 sec for
> the first query and only 13 sec for all subsequent ones. I query for
> different data each time, but it has the same structure and size. The
> situation can be reproduced by restarting the thrift server.
>
> Here is information about parquet file reads from a worker node:
>
> Slow one:
> Oct 10, 2014 2:26:53 PM INFO: parquet.hadoop.InternalParquetRecordReader:
> Assembled and processed 1560251 records from 30 columns in 11686 ms:
> 133.51454 rec/ms, 4005.4363 cell/ms
>
> Fast one:
> Oct 10, 2014 2:31:30 PM INFO: parquet.hadoop.InternalParquetRecordReader:
> Assembled and processed 1568899 records from 1 columns in 1373 ms:
> 1142.6796 rec/ms, 1142.6796 cell/ms
>
> As you can see, the second read is 10x faster than the first. Most of
> the query time is spent reading the parquet file.
>
> This problem is really annoying, because most of my spark jobs contain
> just one sql query plus data processing, and to speed up my jobs I put
> a special warmup query in front of every job.
>
> My assumption is that HotSpot optimizations kick in during the first
> read. Do you have any idea how to confirm/solve this performance
> problem?
>
> Thanks for any advice!
>
> p.s. I see billions of HotSpot compilations with -XX:+PrintCompilation
> but cannot figure out which are important and which are not.
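On the p.s.: rather than eyeballing raw -XX:+PrintCompilation output,
one option is to make HotSpot write a structured compilation log and
load it into a tool like JITWatch. Roughly (untested; the log path is
just a placeholder, and with several executors per node each one would
need its own file):

  # diagnostic flags must be unlocked; TraceClassLoading is what
  # JITWatch needs alongside the LogCompilation output
  spark.executor.extraJavaOptions  -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=/tmp/executor_hotspot.log

But as above, I'd rule out caching in Spark SQL and the HDFS/OS page
cache before blaming the JIT.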