Re: Faster Spark on ORC with Apache ORC

2017-07-13 Thread Jeff Zhang
Apache Spark 2.3? Bests, Dongjoon. From: Dong Joon Hyun Date: Tuesday, May 9, 2017 at 6:15 PM To: "dev@spark.apache.org" Subject: Faster Spark on ORC with Apache ORC Hi, All.

Re: Faster Spark on ORC with Apache ORC

2017-07-11 Thread Dong Joon Hyun
order to improve Apache Spark 2.3? Bests, Dongjoon. From: Dong Joon Hyun Date: Tuesday, May 9, 2017 at 6:15 PM To: "dev@spark.apache.org" Subject: Faster Spark on ORC with Apache ORC Hi, All. Apache Spark always has been a fast and general engine, and since SPARK-2883, Spark suppo

Re: Faster Spark on ORC with Apache ORC

2017-05-14 Thread Dong Joon Hyun
AM To: dev@spark.apache.org Subject: Re: Faster Spark on ORC with Apache ORC Hi, I have been wondering how much further Apache Spark 2.2.0 will be improved. This is the prior record from the source code. Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz SQL Single Int Column Scan:

Re: Faster Spark on ORC with Apache ORC

2017-05-12 Thread Dong Joon Hyun
. Bests, Dongjoon. From: Dongjoon Hyun <dh...@hortonworks.com> Date: Tuesday, May 9, 2017 at 6:15 PM To: "dev@spark.apache.org" <dev@spark.apache.org> Subject: Faster Spark on ORC with Apache ORC Hi, All. Apache Spar

Faster Spark on ORC with Apache ORC

2017-05-09 Thread Dong Joon Hyun
Hi, All. Apache Spark always has been a fast and general engine, and since SPARK-2883, Spark supports Apache ORC inside `sql/hive` module with Hive dependency. With Apache ORC 1.4.0 (released yesterday), we can make Spark on ORC faster and get some benefits. - Speed: Use both Spark `Column