[FYI] Spark 2.2 on spark2:2.6-maint

2017-08-23 Thread Dong Joon Hyun
Hi, All. It seems that Bikas is too busy to announce this. 1. Spark2:2.6-maint became Spark 2.2 this morning. Thank you all. According to Weiqing, RE will update the version as soon as possible. So far, it’s 2.2.0 instead of 2.2.0-2.6.3.0-XX. 2. The existing Jenkins works

Re: Increase Timeout or optimize Spark UT?

2017-08-20 Thread Dong Joon Hyun
+1 for any efforts to recover Jenkins! Thank you for the direction. Bests, Dongjoon. From: Reynold Xin <r...@databricks.com> Date: Sunday, August 20, 2017 at 5:53 PM To: Dong Joon Hyun <dh...@hortonworks.com> Cc: "dev@spark.apache.org" <dev@spark.apache.org> S

Increase Timeout or optimize Spark UT?

2017-08-20 Thread Dong Joon Hyun
Hi, All. Recently, Apache Spark master branch test (SBT with hadoop-2.7 / 2.6) has been hitting the build timeout. Please see the build time trend. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/buildTimeTrend All recent 22

Re: spark pypy support?

2017-08-14 Thread Dong Joon Hyun
Hi, Tom. What version of PyPy do you use? In the Jenkins environment, `pypy` always passes like Python 2.7 and Python 3.4. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/3340/consoleFull

Re: Use Apache ORC in Apache Spark 2.3

2017-08-10 Thread Dong Joon Hyun
com> Date: Thursday, August 10, 2017 at 3:23 PM To: Andrew Ash <and...@andrewash.com> Cc: Dong Joon Hyun <dh...@hortonworks.com>, "dev@spark.apache.org" <dev@spark.apache.org>, Apache Spark PMC <priv...@spark.apache.org> Subject: Re: Use Apache ORC in Apac

Re: Use Apache ORC in Apache Spark 2.3

2017-08-10 Thread Dong Joon Hyun
in Apache Spark. Bests, Dongjoon. From: Dong Joon Hyun <dh...@hortonworks.com> Date: Friday, August 4, 2017 at 8:05 AM To: "dev@spark.apache.org" <dev@spark.apache.org> Cc: Apache Spark PMC <priv...@spark.apache.org> Subject: Use Apache ORC in Apache Spark 2.3 Hi, All.

Re: Use Apache ORC in Apache Spark 2.3

2017-08-04 Thread Dong Joon Hyun
Thank you so much, Owen! Bests, Dongjoon. From: Owen O'Malley <owen.omal...@gmail.com> Date: Friday, August 4, 2017 at 9:59 AM To: Dong Joon Hyun <dh...@hortonworks.com> Cc: "dev@spark.apache.org" <dev@spark.apache.org>, Apache Spark PMC <priv...@spark.apache.o

Use Apache ORC in Apache Spark 2.3

2017-08-04 Thread Dong Joon Hyun
Hi, All. Apache Spark always has been a fast and general engine, and supports Apache ORC inside `sql/hive` module with Hive dependency since Spark 1.4.X (SPARK-2883). However, there are many open issues about `Feature parity for ORC with Parquet (SPARK-20901)` as of today. With new Apache ORC

Re: [VOTE] [SPIP] SPARK-18085: Better History Server scalability

2017-08-01 Thread Dong Joon Hyun
+1 (non-binding) Dongjoon. From: Ryan Blue Reply-To: "rb...@netflix.com" Date: Tuesday, August 1, 2017 at 9:06 AM To: Tom Graves Cc: Marcelo Vanzin , "dev@spark.apache.org"

Re: Tests failing with run-tests.py SyntaxError

2017-07-28 Thread Dong Joon Hyun
I saw that error in the latest branch-2.1 build failure, too. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.1-test-sbt-hadoop-2.7/579/console But, the code was written in Jan 2016. Didn’t we run it on Python 2.6 without any problem? ee74498de37

Re: Faster Spark on ORC with Apache ORC

2017-07-11 Thread Dong Joon Hyun
this in order to improve Apache Spark 2.3? Bests, Dongjoon. From: Dong Joon Hyun <dh...@hortonworks.com> Date: Tuesday, May 9, 2017 at 6:15 PM To: "dev@spark.apache.org" <dev@spark.apache.org> Subject: Faster Spark on ORC with Apache ORC Hi, All. Apache Spark always has been a f

Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-07-05 Thread Dong Joon Hyun
+1 (non binding) Bests, Dongjoon. From: on behalf of Holden Karau Date: Wednesday, July 5, 2017 at 10:14 PM To: Felix Cheung Cc: Denny Lee , Liang-Chi Hsieh ,

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-09 Thread Dong Joon Hyun
Hi, Nick. Could you give us more information on your environment like R/JDK/OS? Bests, Dongjoon. From: Nick Pentreath Date: Friday, June 9, 2017 at 1:12 AM To: dev Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4) All Scala, Python tests pass. ML QA

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-06 Thread Dong Joon Hyun
+1 (non-binding) I built and tested on CentOS 7.3.1611 / OpenJDK 1.8.131 / R 3.3.3 with “-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Psparkr”. Java/Scala/R tests passed as expected. There are two minor things. 1. For the deprecation documentation issue

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-05 Thread Dong Joon Hyun
Hi, Michael. Can we be more clear on deprecation messages in 2.2.0-RC4 documentation? > Spark runs on Java 8+, Python 2.6+/3.4+ and R 3.1+. -> Python 2.7+ ? https://issues.apache.org/jira/browse/SPARK-12661 (Status: `Open`, Target Version: `2.2.0`, Label: `ReleaseNotes`) > Note that

Re: Spark Issues on ORC

2017-06-02 Thread Dong Joon Hyun
Thank you for confirming, Steve. I removed the dependency of SPARK-20799 on SPARK-20901. Bests, Dongjoon. From: Steve Loughran <ste...@hortonworks.com> Date: Friday, June 2, 2017 at 4:42 AM To: Dong Joon Hyun <dh...@hortonworks.com> Cc: Apache Spark Dev <dev@spark.apache.org>

Spark Issues on ORC

2017-05-26 Thread Dong Joon Hyun
Hi, All. Today, while looking over JIRA issues for Spark 2.2.0 in Apache Spark, I noticed that there are many unresolved community requests and related efforts over `Feature parity for ORC with Parquet`. Some examples I found are the following. I created SPARK-20901 to organize these

Re: [Spark SQL] ceil and floor functions on doubles

2017-05-19 Thread Dong Joon Hyun
Hi, Anton. It’s the same result with Hive, isn’t it?
hive> select 9.223372036854786E20, ceil(9.223372036854786E20);
OK
_c0 _c1
9.223372036854786E20 9223372036854775807
Time taken: 2.041 seconds, Fetched: 1 row(s)
Bests, Dongjoon. From: Anton Okolnychyi
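
The Hive output above shows ceil() of a double far beyond the 64-bit integer range coming back as 9223372036854775807, which is the largest signed 64-bit value. One plausible reading is that the result is narrowed to a long with JVM-style saturation. The sketch below only emulates that narrowing in Python; it is not Hive's or Spark's actual code, and the helper name `ceil_to_long` is hypothetical.

```python
import math

# Bounds of a signed 64-bit integer (a JVM long).
LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1


def ceil_to_long(x: float) -> int:
    """Ceil a double, then narrow it to a signed 64-bit integer.

    Emulates JVM double-to-long narrowing: NaN maps to 0 and
    out-of-range values saturate at the nearest bound.
    This is an illustrative assumption, not Hive or Spark source.
    """
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return LONG_MAX if x > 0 else LONG_MIN
    # math.ceil returns an exact Python int, so clamp it into range.
    return max(LONG_MIN, min(LONG_MAX, math.ceil(x)))


print(ceil_to_long(9.223372036854786E20))
```

Under that assumption the input from the query saturates at the upper bound, matching the value in the second column of the Hive result.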

Re: Faster Spark on ORC with Apache ORC

2017-05-14 Thread Dong Joon Hyun
the previous PR is on-going, the new PR inevitably has some of the previous PR. I'll remove the duplication later in any case. Any opinions for Spark ORC improvement are welcome! Thanks, Dongjoon. From: Dong Joon Hyun <dh...@hortonworks.com> Sent: Friday,

Re: Faster Spark on ORC with Apache ORC

2017-05-12 Thread Dong Joon Hyun
Hi, I have been wondering how much more Apache Spark 2.2.0 will be improved. This is the prior record from the source code. Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz SQL Single Int Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative

Faster Spark on ORC with Apache ORC

2017-05-09 Thread Dong Joon Hyun
Hi, All. Apache Spark always has been a fast and general engine, and since SPARK-2883, Spark supports Apache ORC inside `sql/hive` module with Hive dependency. With Apache ORC 1.4.0 (released yesterday), we can make Spark on ORC faster and get some benefits. - Speed: Use both Spark

Re: [VOTE] Apache Spark 2.1.1 (RC4)

2017-04-27 Thread Dong Joon Hyun
+1 I’ve got the same result (Scala/R test) on JDK 1.8.0_131 at this time. Bests, Dongjoon. From: Reynold Xin Date: Thursday, April 27, 2017 at 1:06 PM To: Michael Armbrust,

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-19 Thread Dong Joon Hyun
+1 I tested RC3 on CentOS 7.3.1611/OpenJDK 1.8.0_121/R 3.3.3 with `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Psparkr` At the end of R test, I saw `Had CRAN check errors; see logs.`, but tests passed and log file looks good. Bests, Dongjoon. From: Reynold Xin