Re: Vectorized ORC Reader in Apache Spark 2.3 with Apache ORC 1.4.1.

Nicolas Paris Sun, 28 Jan 2018 04:16:12 -0800

Hi

Thanks for this work.


Will this affect both:
1) spark.read.format("orc").load("...")
2) spark.sql("select ... from my_orc_table_in_hive")

?


On 10 Jan 2018 at 20:14, Dongjoon Hyun wrote:
> Hi, All.
> 
> Vectorized ORC Reader is now supported in Apache Spark 2.3.
> 
>     https://issues.apache.org/jira/browse/SPARK-16060
> 
> It has been a long journey. From now on, Spark can read ORC files faster
> without any feature penalty.
> 
> Thank you for all your support, especially Wenchen Fan.
> 
> It was done in two commits.
> 
>     [SPARK-16060][SQL] Support Vectorized ORC Reader
>     https://github.com/apache/spark/commit/f44ba910f58083458e1133502e193a9d6f2bf766
> 
>     [SPARK-16060][SQL][FOLLOW-UP] add a wrapper solution for vectorized orc reader
>     https://github.com/apache/spark/commit/eaac60a1e20e29084b7151ffca964cfaa5ba99d1
> 
> Please check OrcReadBenchmark for the final speed-up from `Hive built-in ORC`
> to `Native ORC Vectorized`.
> 
>     https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala
> 
> Thank you.
> 
> Bests,
> Dongjoon.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
