Re: [DISCUSSION] Esoteric Spark function `TRIM/LTRIM/RTRIM`

2020-02-14 Thread Dongjoon Hyun
Please note that the context is TRIM/LTRIM/RTRIM with two parameters and the TRIM(trimStr FROM str) syntax. This thread is irrelevant to one-parameter TRIM/LTRIM/RTRIM. On Fri, Feb 14, 2020 at 11:35 AM Dongjoon Hyun wrote: > Hi, All. > > I'm sending this email because the Apache Spark c
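
For readers unfamiliar with the syntax named in this message, the following is a rough sketch of what `TRIM(trimStr FROM str)` and its leading/trailing variants compute, assuming the documented Spark SQL behavior in which trimStr is treated as a set of characters to strip. It is written in Python rather than Spark SQL, and the helper names `trim_both`, `trim_leading`, and `trim_trailing` are hypothetical, not Spark APIs.

```python
# Illustrative only: these helpers are NOT Spark functions. They assume that
# trim_str is treated as a set of characters, which is how Python's
# str.strip(chars) family behaves.

def trim_both(trim_str: str, s: str) -> str:
    """Approximate TRIM(trimStr FROM str): strip trim_str characters from both ends."""
    return s.strip(trim_str)


def trim_leading(trim_str: str, s: str) -> str:
    """Approximate the leading-only variant: strip from the left end only."""
    return s.lstrip(trim_str)


def trim_trailing(trim_str: str, s: str) -> str:
    """Approximate the trailing-only variant: strip from the right end only."""
    return s.rstrip(trim_str)


if __name__ == "__main__":
    print(trim_both("xy", "xyhelloyx"))
    print(trim_leading("xy", "xyhelloyx"))
    print(trim_trailing("xy", "xyhelloyx"))
```

The argument order in the sketch mirrors the `trimStr FROM str` form, with the characters to remove first and the input string second.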

[DISCUSSION] Esoteric Spark function `TRIM/LTRIM/RTRIM`

2020-02-14 Thread Dongjoon Hyun
Hi, All. I'm sending this email because the Apache Spark committers should have a consistent point of view for the upcoming PRs. And, the community policy is the way to lead the community members transparently and clearly for the long-term good. First of all, I want to emphasize that, like

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Dongjoon Hyun
Thank you for raising the issue, Hyukjin. According to the current status of discussion, it seems that we are able to agree on updating the non-structured configurations and keeping the structured configuration AS-IS. I'm +1 for revisiting the configurations if that is our direction. If

Re: [DISCUSSION] Esoteric Spark function `TRIM/LTRIM/RTRIM`

2020-02-19 Thread Dongjoon Hyun
e best way to deprecate an SQL function. Runtime > warning can be annoying if it keeps coming out. Maybe we should only log > the warning once per Spark application. > > On Tue, Feb 18, 2020 at 3:45 PM Dongjoon Hyun > wrote: > >> Thank you for feedback, Wenchen, Maxim, and T
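
The "log the warning once per Spark application" idea quoted above can be sketched as follows. This is a minimal Python illustration of a warn-once guard under stated assumptions, not Spark's implementation; the names `warn_once` and `_warned_keys` are hypothetical, and "once per application" is approximated here as "once per process".

```python
import logging
import threading

# Hypothetical process-wide registry of deprecation keys already reported.
_warned_keys = set()
_lock = threading.Lock()


def warn_once(logger: logging.Logger, key: str, message: str) -> bool:
    """Log `message` at WARNING level only the first time `key` is seen.

    Returns True if the warning was emitted, False if it was suppressed.
    """
    with _lock:
        if key in _warned_keys:
            return False
        _warned_keys.add(key)
    logger.warning(message)
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    log = logging.getLogger("deprecation-demo")
    for _ in range(3):
        # Only the first iteration produces output.
        warn_once(log, "demo-key", "This function form is deprecated.")
```

Keying the guard by a string lets each deprecated function report independently while still staying quiet after its first use.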

Re: Breaking API changes in Spark 3.0

2020-02-19 Thread Dongjoon Hyun
Hi, Karen. Are you saying that Spark 3 has to have all deprecated 2.x APIs? Could you tell us what your criteria are for `unnecessarily` or `necessarily`? > the migration process from Spark 2 to Spark 3 unnecessarily painful. Bests, Dongjoon. On Tue, Feb 18, 2020 at 4:55 PM Karen Feng wrote:

Re: Breaking API changes in Spark 3.0

2020-02-19 Thread Dongjoon Hyun
n 2.4 this >> way libraries and programs can dual target during the migration process. >> >> Now that isn’t always going to be doable, but certainly worth looking at >> the situations where we aren’t providing a smooth migration path and making >> sure it’s the best t

Re: [DISCUSS] naming policy of Spark configs

2020-02-12 Thread Dongjoon Hyun
Thank you, Wenchen. The new policy looks clear to me. +1 for the explicit policy. So, are we going to revise the existing conf names before 3.0.0 release? Or, is it applied to new up-coming configurations from now? Bests, Dongjoon. On Wed, Feb 12, 2020 at 7:43 AM Wenchen Fan wrote: > Hi

Re: Apache Spark Docker image repository

2020-02-10 Thread Dongjoon Hyun
too much, I think it's fine to give a shot. > > > On Sat, Feb 8, 2020 at 6:51 AM, Dongjoon Hyun wrote: > >> Thank you, Sean, Jiaxin, Shane, and Tom, for the feedback. >> >> 1. For legal questions, please see the following three Apache-approved >> approaches. We

Re: Apache Spark Docker image repository

2020-02-11 Thread Dongjoon Hyun
at official > releases > > 2) There was some ambiguity about whether or not a container image that > included GPL'ed packages (spark images do) might trip over the GPL "viral > propagation" due to integrating ASL and GPL in a "binary release". The > "air gap"

Re: [DISCUSS] Support year-month and day-time Intervals

2020-01-10 Thread Dongjoon Hyun
es, > 2. `interval` -> CalenderIntervalType support in the parser > > Thanks > > *Kent Yao* > Data Science Center, Hangzhou Research Institute, Netease Corp. > PHONE: (86) 186-5715-3499 > EMAIL: hzyao...@corp.netease.com > > On 01/11/2020 01:57,Dongjoon Hyun >

Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-15 Thread Dongjoon Hyun
>>>>>> PGP Key ID: 42E5B25A8F7A82C1 >>>>> >>>>> On Tue, Jan 14, 2020 at 11:08 AM Sean Owen wrote: >>>>> > >>>>> > Yeah it's something about the env I spun up, but I don't know what. >>>>> It >>

Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-16 Thread Dongjoon Hyun
out it fixed a > regression, long lasting one (broken at 2.3.0). The link refers the PR for > 2.4 branch. > > Thanks, > Jungtaek Lim (HeartSaVioR) > > On Thu, Jan 16, 2020 at 12:56 PM Dongjoon Hyun > wrote: > >> Sure. Wenchen and Hyukjin. >> >> I observ

Re: Spark master build hangs using parallel build option in maven

2020-01-17 Thread Dongjoon Hyun
Hi, Saurabh. It seems that you are hitting https://issues.apache.org/jira/browse/SPARK-26095 . And, we disabled the parallel build via https://github.com/apache/spark/pull/23061 at 3.0.0. According to the stack trace in JIRA and PR description, `maven-shade-plugin` seems to be the root cause.

Correctness and data loss issues

2020-01-19 Thread Dongjoon Hyun
Hi, All. According to our policy, "Correctness and data loss issues should be considered Blockers". - http://spark.apache.org/contributing.html Since we are close to branch-3.0 cut, I want to ask your opinions on the following correctness and data loss issues. SPARK-30218 Columns used

[VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-13 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark version 2.4.5. The vote is open until January 16th 5AM PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.4.5 [ ] -1 Do not release this package because

Re: Correctness and data loss issues

2020-01-21 Thread Dongjoon Hyun
://issues.apache.org/jira/browse/SPARK-28344 > > On Mon, Jan 20, 2020 at 2:07 PM Dongjoon Hyun > wrote: > >> Hi, All. >> >> According to our policy, "Correctness and data loss issues should be >> considered Blockers". >> >> - http://

Re: Correctness and data loss issues

2020-01-22 Thread Dongjoon Hyun
ut that it feels like > it can wait for 3.0 but would be good to get others input and I'm not an > expert on SQL standard and what do the other sql engines do in this case. > > Tom > > On Monday, January 20, 2020, 12:07:54 AM CST, Dongjoon Hyun < > dongjoon.h...@gmail.c

Re: Correctness and data loss issues

2020-01-22 Thread Dongjoon Hyun
. The remaining things are the followings: 1. Revisit `3.0.0`-only correctness patches? 2. Set the target version to `2.4.5`? (Specifically, is this feasible in terms of timeline?) Bests, Dongjoon. On Wed, Jan 22, 2020 at 9:43 AM Dongjoon Hyun wrote: > Hi, Tom. > > Th

[FYI] SBT Build Failure

2020-01-16 Thread Dongjoon Hyun
Hi, All. As of now, Apache Spark sbt build is broken by the Maven Central repository policy. - https://stackoverflow.com/questions/59764749/requests-to-http-repo1-maven-org-maven2-return-a-501-https-required-status-an > Effective January 15, 2020, The Central Maven Repository no longer supports

Re: PR lint-scala jobs failing with http error

2020-01-16 Thread Dongjoon Hyun
Hi, Tom and Shane. It looks like an old `sbt` bug. Maven seems to start to ban the `http` access recently. If you use Maven, it's okay because it goes to `https`. $ build/sbt clean [error] org.apache.maven.model.building.ModelBuildingException: 1 problem was encountered while building the

Spark 2.4.5 RC2 Preparation Status

2020-01-20 Thread Dongjoon Hyun
Hi, All. RC2 was scheduled for today and all RC1 feedback seems to be addressed. However, I'm waiting for another on-going correctness PR. https://github.com/apache/spark/pull/27233 [SPARK-29701][SQL] Correct behaviours of group analytical queries when empty input given Unlike the other

Re: Adding Maven Central mirror from Google to the build?

2020-01-21 Thread Dongjoon Hyun
+1, I'm supporting the following proposal. > this mirror as the primary repo in the build, falling back to Central if needed. Thanks, Dongjoon. On Tue, Jan 21, 2020 at 14:37 Sean Owen wrote: > See https://github.com/apache/spark/pull/27307 for some context. We've > had to add, in at least

Re: [DISCUSS] Support year-month and day-time Intervals

2020-01-10 Thread Dongjoon Hyun
Hi, Kent. Thank you for the proposal. Does your proposal need to revert something from the master branch? I'm just asking because it's not clear in the proposal document. Bests, Dongjoon. On Fri, Jan 10, 2020 at 5:31 AM Dr. Kent Yao wrote: > Hi, Devs > > I’d like to propose to add two new

Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-13 Thread Dongjoon Hyun
Server Version: v1.14.9-eks-c0eccc Bests, Dongjoon. On Mon, Jan 13, 2020 at 4:27 AM Dongjoon Hyun wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.4.5. > > The vote is open until January 16th 5AM PST and passes if a majority +1 > PMC

Re: Spark 3.0 branch cut and code freeze on Jan 31?

2019-12-24 Thread Dongjoon Hyun
+1 for January 31st. Bests, Dongjoon. On Tue, Dec 24, 2019 at 7:11 AM Xiao Li wrote: > Jan 31 is pretty reasonable. Happy Holidays! > > Xiao > > On Tue, Dec 24, 2019 at 5:52 AM Sean Owen wrote: > >> Yep, always happens. Is earlier realistic, like Jan 15? it's all >> arbitrary but indeed this

Re: [ANNOUNCE] Announcing Apache Spark 3.0.0-preview2

2019-12-24 Thread Dongjoon Hyun
Indeed! Thank you again, Yuming and all. Bests, Dongjoon. On Tue, Dec 24, 2019 at 13:38 Takeshi Yamamuro wrote: > Great work, Yuming! > > Bests, > Takeshi > > On Wed, Dec 25, 2019 at 6:00 AM Xiao Li wrote: > >> Thank you all. Happy Holidays! >> >> Xiao >> >> On Tue, Dec 24, 2019 at 12:53 PM

Release Apache Spark 2.4.5

2020-01-05 Thread Dongjoon Hyun
Hi, All. Happy New Year (2020)! Although we slightly missed the timeline for 3.0 branch cut last month, it seems that we keep 2.4.x timeline on track. https://spark.apache.org/versioning-policy.html As of today, `branch-2.4` has 154 patches since v2.4.4. $ git log --oneline

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-10 Thread Dongjoon Hyun
> >>>>>>> +1 (non-binding) >>>>>>> >>>>>>> Bests, >>>>>>> Takeshi >>>>>>> >>>>>>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang < >>>>>>> gengliang.w...@dat

Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-03-06 Thread Dongjoon Hyun
[SPARK-24640][SQL] Return `NULL` from `size(NULL)` by default Bests, Dongjoon. On Thu, Mar 5, 2020 at 9:08 PM Dongjoon Hyun wrote: > Hi, All. > > There is an on-going PR from Xiao referencing this email. > > https://github.com/apache/spark/pull/27821 > > Bests, > Dongjoon. >

Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-03-08 Thread Dongjoon Hyun
repo and have tons of mails. Compared to the popularity on Github PRs, >>> dev@ mailing list is not that crowded so less chance of missing the >>> critical changes, and not quickly decided by only a couple of committers. >>> >>> These suggestions would slow

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-07 Thread Dongjoon Hyun
This new policy has a good intention, but can we narrow down on the migration from Apache Spark 2.4.5 to Apache Spark 3.0+? I saw that there already exists a reverting PR to bring back Spark 1.4 and 1.5 APIs based on this AS-IS suggestion. The AS-IS policy is clearly mentioning that

Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-03-07 Thread Dongjoon Hyun
ments turn > around 'commonly used' but can we know that more concretely? > > Otherwise I think we'll back into implementing personal interpretations of > general principles, which is arguably the issue in the first place, even > when everyone believes in good faith in the same princip

FYI: The evolution on `CHAR` type behavior

2020-03-14 Thread Dongjoon Hyun
Hi, All. Apache Spark has suffered from a known consistency issue on `CHAR` type behavior among its usages and configurations. However, the evolution direction has been gradually moving forward to be consistent inside Apache Spark because we don't have `CHAR` officially. The following is the

Re: Auto-linking from PRs to Jira tickets

2020-03-11 Thread Dongjoon Hyun
Thank you, Alex, Nicholas, and Holden. I filed an INFRA issue for Apache Spark like Zeppelin. https://issues.apache.org/jira/browse/INFRA-19957 Bests, Dongjoon. On Tue, Mar 10, 2020 at 12:03 PM Alex Ott wrote: > yes - it's https://issues.apache.org/jira/browse/INFRA-19934 > > Nicholas

Re: Auto-linking from PRs to Jira tickets

2020-03-11 Thread Dongjoon Hyun
Hi, All. Autolinking from PR to JIRA started. *Inside PR* https://github.com/apache/spark/pull/27881 *Inside commit log* https://github.com/apache/spark/commits/master You don't need to add hyperlink to `SPARK-XXX` manually from now. Bests, Dongjoon. >

Re: FYI: The evolution on `CHAR` type behavior

2020-03-15 Thread Dongjoon Hyun
code that was working for char(3) would now stop > working. > > For new users, depending on whether the underlying metastore char(3) is > either supported but different from ansi Sql (which is not that big of a > deal if we explain it) or not supported. > > On Sat, Mar 14, 2020 at 3

Is `branch-3.0` frozen for RC1 or not?

2020-03-31 Thread Dongjoon Hyun
Hi, All. RC1 tag was created yesterday and traditionally we hold on all backporting activities to give some time to a release manager. I'm also holding two commits at master branch. https://github.com/apache/spark/tree/v3.0.0-rc1 However, I'm still seeing some commits land on `branch-3.0`.

Release Manager's official `branch-3.0` Assessment?

2020-03-24 Thread Dongjoon Hyun
Hi, All. First of all, always "Community Over Code"! I wish you the best health and happiness. As we know, we are still in the QA period and have not reached the RC stage. It seems that we need to make the website up-to-date once more. https://spark.apache.org/versioning-policy.html If possible,

Re: [build system] jenkins rebooting now

2020-05-14 Thread Dongjoon Hyun
Thank you so much, Shane! On Thu, May 14, 2020 at 9:51 AM Xiao Li wrote: > Thank you, Shane! > > On Thu, May 14, 2020 at 9:50 AM shane knapp ☠ wrote: > >> we're back. doesn't seem to have fixed the issue of the workers >> connecting to repository.apache.org but i'm still investigating. >> >>

Re: [VOTE] Release Spark 2.4.6 (RC1)

2020-05-08 Thread Dongjoon Hyun
I confirmed and updated the JIRA. SPARK-31663 is a correctness issue since Apache Spark 2.4.0. Bests, Dongjoon. On Fri, May 8, 2020 at 10:26 AM Holden Karau wrote: > Can you provide a bit more context (is it a regression?) > > On Fri, May 8, 2020 at 9:33 AM Yuanjian Li wrote: > >> Hi Holden,

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
nsistently everywhere. > > > Cheers, > > Steve C > > On 17 Mar 2020, at 10:01 am, Dongjoon Hyun > wrote: > > Hi, Reynold. > (And +Michael Armbrust) > > If you think so, do you think it's okay that we change the return value > silently? Then, I'm wondering why we r

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
PM, Reynold Xin wrote: > >> I looked up our usage logs (sorry I can't share this publicly) and trim >> has at least four orders of magnitude higher usage than char. >> >> >> On Mon, Mar 16, 2020 at 5:27 PM, Dongjoon Hyun >> wrote: >> >>> T

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
(honestly negligible). I was comparing select vs > select. > > > > On Mon, Mar 16, 2020 at 5:40 PM, Dongjoon Hyun > wrote: > >> Ur, are you comparing the number of SELECT statement with TRIM and CREATE >> statements with `CHAR`? >> >> > I looked up our usage

Re: FYI: The evolution on `CHAR` type behavior

2020-03-19 Thread Dongjoon Hyun
+1 for Wenchen's suggestion. I believe that the difference and effects have been widely communicated and discussed twice in many ways. First, this was shared last December. "FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax", 2019/12/06

Re: FYI: The evolution on `CHAR` type behavior

2020-03-19 Thread Dongjoon Hyun
hread.html/493f88c10169680191791f9f6962fd16cd0ffa3b06726e92ed04cbe1%40%3Cdev.spark.apache.org%3E > > (Yes it talked about changing the default data source provider, but that's > just one of the ways we are exposing this char/varchar issue). > > > > On Thu, Mar 19, 2020 at 8:41 PM, Dongjoon Hyun

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
0the%20default.=Snowflake%20currently%20deviates%20from%20common,space%2Dpadded%20at%20the%20end.> >> : >> "Snowflake currently deviates from common CHAR semantics in that strings >> shorter than the maximum length are not space-padded at the end." >> >> MyS

Re: Release Manager's official `branch-3.0` Assessment?

2020-03-24 Thread Dongjoon Hyun
+1 Thanks, Dongjoon. On Tue, Mar 24, 2020 at 14:49 Reynold Xin wrote: > I actually think we should start cutting RCs. We can cut RCs even with > blockers. > > > On Tue, Mar 24, 2020 at 12:51 PM, Dongjoon Hyun > wrote: > >> Hi, All. >> >> First of all,

Re: [VOTE] Release Spark 2.4.6 (RC1)

2020-05-07 Thread Dongjoon Hyun
Hi, Holden. The following link looks outdated. It was a link used at Spark 2.4.5 RC2. - https://repository.apache.org/content/repositories/orgapachespark-1340/ Instead, in the Apache repo, there are three candidates. Is 1343 the one we are voting on? -

Re: [VOTE] Release Spark 3.0.1 (RC3)

2020-09-01 Thread Dongjoon Hyun
+1. Thank you all. I tested the following additionally with OpenJDK 11.0.8. - PySpark UT on Python 3.7.7 with Pandas 0.23.2 / PyArrow 0.15.1. - JDBC integration suite - K8s integration suite (except SparkR test) (Minikube: K8s Client v1.18.8, K8s Server v1.17.11) For SparkR,

Re: [VOTE] Release Spark 3.0.1 (RC3)

2020-09-02 Thread Dongjoon Hyun
> So, I'm wondering if Spark 3.0.1 supports R 4.0 without any issue. > > I believe we now test SparkR at branch-3.0 with R 4.0 after > https://github.com/apache/spark/commit/56ec5ddcac8233011c17fc7d120a284707f0f712 > > > On Wed, Sep 2, 2020 at 12:47 PM, Dongjoon Hyun wrote: >

Re: [ANNOUNCE] Announcing Apache Spark 3.0.1

2020-09-11 Thread Dongjoon Hyun
It's great. Thank you, Ruifeng! Bests, Dongjoon. On Fri, Sep 11, 2020 at 1:54 AM 郑瑞峰 wrote: > Hi all, > > We are happy to announce the availability of Spark 3.0.1! > Spark 3.0.1 is a maintenance release containing stability fixes. This > release is based on the branch-3.0 maintenance branch of

Re: [VOTE][RESULT] Release Spark 2.4.7 (RC3)

2020-09-11 Thread Dongjoon Hyun
Thank you, Prashant! Bests, Dongjoon. On Fri, Sep 11, 2020 at 7:02 PM Prashant Sharma wrote: > The vote passes. Thanks to all who helped with the release! > > (* = binding) > +1: > - Sean Owen * > - Wenchan Fan * > - Dongjoon Hyun * > - Mridul * > > +0: None > > -1: None > > > >

Re: [VOTE][SPARK-30602] SPIP: Support push-based shuffle to improve shuffle efficiency

2020-09-15 Thread Dongjoon Hyun
+1 Bests, Dongjoon. On Mon, Sep 14, 2020 at 9:19 PM kalyan wrote: > +1 > > Will positively improve the performance and reliability of spark... > Looking fwd to this.. > > Regards > Kalyan. > > On Tue, Sep 15, 2020, 9:26 AM Joseph Torres > wrote: > >> +1 >> >> On Mon, Sep 14, 2020 at 6:39 PM

Re: Spark 3.1 first RC date

2020-10-02 Thread Dongjoon Hyun
Hi, Igor. The first RC is scheduled for early December. Please see the website for Apache Spark release cadence. - https://spark.apache.org/versioning-policy.html Date Event Early Nov 2020 Code freeze. Release branch cut. Mid Nov 2020 QA period. Focus on bug fixes, tests, stability and

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-04 Thread Dongjoon Hyun
4, 2020 at 10:53 AM Dongjoon Hyun wrote: > Thank you all. > > BTW, Xiao and Mridul, I'm wondering what date you have in your mind > specifically. > > Usually, `Christmas and New Year season` doesn't give us much additional > time. > > If you think so, could you make a

Re: [FYI] Kubernetes GA Preparation (SPARK-33005)

2020-10-04 Thread Dongjoon Hyun
browse/SPARK-31800 > > Note that the title shouldn't be "*Unable to disable Kerberos when > submitting jobs to Kubernetes" *(based on the comments) and something > more related with the spark.kubernetes.file.upload.path property > > Should we add it too? > > On Wed

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-04 Thread Dongjoon Hyun
le syntax: >>https://issues.apache.org/jira/browse/SPARK-31257 >>- Bloom filter join: https://issues.apache.org/jira/browse/SPARK-32268 >> >> Thanks, >> >> Xiao >> >> >> On Sat, Oct 3, 2020 at 5:41 PM, Hyukjin Kwon wrote: >> >>> Nice summa

[UPDATE] Apache Spark 3.1.0 Release Window

2020-10-12 Thread Dongjoon Hyun
Hi, All. Apache Spark 3.1.0 Release Window is adjusted like the following today. Please check the latest information on the official website. - https://github.com/apache/spark-website/commit/0cd0bdc80503882b4737db7e77cc8f9d17ec12ca - https://spark.apache.org/versioning-policy.html

Re: Scala 3 support approach

2020-10-19 Thread Dongjoon Hyun
Hi, Koert. We know, welcome, and believe it. However, it's only Scala community's roadmap so far. It doesn't mean Apache Spark supports Scala 3 officially. For example, Apache Spark 3.0.1 supports Scala 2.12.10 but not 2.12.12 due to Scala issue. In Apache Spark community, we had better focus

Re: Scala 3 support approach

2020-10-18 Thread Dongjoon Hyun
Hi, Denis. We are currently moving toward Scala 3 together by focusing on completing SPARK-25075 first as a stepping stone. https://issues.apache.org/jira/browse/SPARK-25075 Build and test Spark against Scala 2.13 We haven't finished it yet. We need to have Jenkins jobs with Scala 2.13.

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-07 Thread Dongjoon Hyun
me (using older spark version to extract > out of hive, then switch to newer spark version) so i am not too worried > about this. just making sure i understand. > > thanks > > On Sat, Oct 3, 2020 at 8:17 PM Dongjoon Hyun > wrote: > >> Hi, All. >> >> As of today,

Re: [FYI] Removing `spark-3.1.0-bin-hadoop2.7-hive1.2.tgz` from Apache Spark 3.1 distribution

2020-10-07 Thread Dongjoon Hyun
tle confused about this. i assumed spark would no longer make a > distribution with hive 1.x, but the hive-1.2 profile remains. > > yet i see the hive-1.2 profile has been removed from pom.xml? > > On Wed, Sep 23, 2020 at 6:58 PM Dongjoon Hyun > wrote: > >> Hi, All.

Re: 回复: [DISCUSS] Apache Spark 3.0.1 Release

2020-08-25 Thread Dongjoon Hyun
nresolved bugs raised > against 3.0.0, but conversely there were quite a few critical correctness > fixes waiting to be released. > > > > Cheers, > > Jason. > > > > *From: *Takeshi Yamamuro > *Date: *Wednesday, 15 July 2020 at 9:00 am > *To: *Shivaram Venkataraman >

Re: [FYI] Kubernetes GA Preparation (SPARK-33005)

2020-09-29 Thread Dongjoon Hyun
Thank you! Bests, Dongjoon On Mon, Sep 28, 2020 at 8:07 PM Dr. Kent Yao wrote: > Thanks, Dongjoon, > >I pinned two long-standing issues to the umbrella. > > > >https://issues.apache.org/jira/browse/SPARK-28895 > >https://issues.apache.org/jira/browse/SPARK-28992 > > > >This helps

Re: A common naming policy for third-party packages/modules under org.apache.spark?

2020-09-21 Thread Dongjoon Hyun
Hi, Steve. Sure, you can suggest, but I'm wondering how the suggested namespaces are able to satisfy the existing visibility rules. Could you give us some examples specifically? > Can I suggest some common prefix for third-party-classes put into the spark package tree, just to make clear that

[FYI] Removing `spark-3.1.0-bin-hadoop2.7-hive1.2.tgz` from Apache Spark 3.1 distribution

2020-09-23 Thread Dongjoon Hyun
Hi, All. Since Apache Spark 3.0.0, Apache Hive 2.3.7 is the default Hive execution library. The forked Hive 1.2.1 library is not recommended because it's not maintained properly. In Apache Spark 3.1 in December 2020, we are going to remove it from our official distribution.

[FYI] Kubernetes GA Preparation (SPARK-33005)

2020-09-28 Thread Dongjoon Hyun
Hi, All. K8s GA preparation is under way as follows. https://issues.apache.org/jira/browse/SPARK-33005 Apache Spark 3.1/3.2 is scheduled for December 2020 and mid-2021 (TBD). If you hit K8s issues, please file a JIRA issue. To give more visibility to your issue, you can create

Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-03 Thread Dongjoon Hyun
Hi, All. As of today, master branch (Apache Spark 3.1.0) resolved 852+ JIRA issues and 606+ issues are 3.1.0-only patches. According to the 3.1.0 release window, branch-3.1 will be created on November 1st and enters QA period. Here are some notable updates I've been monitoring. *Language* 01.

Re: [VOTE] Release Spark 2.4.7 (RC1)

2020-08-08 Thread Dongjoon Hyun
Hi, All. Unfortunately, there is an on-going discussion about the new decimal correctness. Although we fixed one correctness issue at master and backported it partially to 3.0/2.4, it turns out that it needs more patches to be complete. Please see https://github.com/apache/spark/pull/29125 for

Re: [VOTE] Release Spark 2.4.7 (RC1)

2020-08-08 Thread Dongjoon Hyun
the priority of SPARK-31703 to `Blocker` for both Apache Spark 2.4.7 and 3.0.1. Bests, Dongjoon. On Sat, Aug 8, 2020 at 6:10 AM Holden Karau wrote: > I'm going to go ahead and vote -0 then based on that then. > > On Fri, Aug 7, 2020 at 11:36 PM Dongjoon Hyun > wrote:

Re: [VOTE] Decommissioning SPIP

2020-07-02 Thread Dongjoon Hyun
+1. Thank you, Holden. Bests, Dongjoon. On Thu, Jul 2, 2020 at 6:43 AM wuyi wrote: > +1 for having this feature in Spark > > > > -- > Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ > > - > To

Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-02 Thread Dongjoon Hyun
Thank you, Hyukjin. According to the Python community, Python 3.5 is also EOL at 2020-09-13 (only two months left). - https://www.python.org/downloads/ So, targeting live Python versions at Apache Spark 3.1.0 (December 2020) looks reasonable to me. For old Python versions, we still have Apache

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-07-05 Thread Dongjoon Hyun
GA > > -- > *From:* Holden Karau > *Sent:* Monday, June 29, 2020 9:33 AM > *To:* Maxim Gekk > *Cc:* Dongjoon Hyun; dev > *Subject:* Re: Apache Spark 3.1 Feature Expectation (Dec. 2020) > > Should we also consider the shuffle service refactoring

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread Dongjoon Hyun
Thank you always, Shane! Bests, Dongjoon. On Thu, Jul 9, 2020 at 9:30 AM shane knapp ☠ wrote: > this is happening now. > > On Wed, Jul 8, 2020 at 9:07 AM shane knapp ☠ wrote: > >> this will be happening tomorrow... today is Meeting Hell Day[tm]. >> >> On Tue, Jul 7, 2020 at 1:59 PM shane

Re: Welcoming some new Apache Spark committers

2020-07-14 Thread Dongjoon Hyun
Welcome everyone! :D Bests, Dongjoon. On Tue, Jul 14, 2020 at 11:21 AM Xiao Li wrote: > Welcome, Dilip, Huaxin and Jungtaek! > > Xiao > > On Tue, Jul 14, 2020 at 11:02 AM Holden Karau > wrote: > >> So excited to have our committer pool growing with these awesome folks, >> welcome y'all! >> >>

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-07-14 Thread Dongjoon Hyun
Hi, Yi. Could you explain why you think that is a blocker? For the given example from the JIRA description, spark.udf.register("key", udf((m: Map[String, String]) => m.keys.head.toInt)) Seq(Map("1" -> "one", "2" -> "two")).toDF("a").createOrReplaceTempView("t") checkAnswer(sql("SELECT key(a)

Re: [Spark Core] Merging PR #23340 for New Executor Memory Metrics

2020-06-30 Thread Dongjoon Hyun
Hi, Alex and Michel. I removed the `Stale` label and reopened it for now. You may want to ping the original author because the last update of that PR was one year ago and it has many conflicts as of today. Bests, Dongjoon. On Tue, Jun 30, 2020 at 10:56 AM Alex Scammon <

Re: Jenkins is down

2020-07-05 Thread Dongjoon Hyun
Hi, All. Now, AmpLab Jenkins farm came back online. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/ Also, many PRBuilder jobs were re-started 10 minutes ago. Bests, Dongjoon. On Fri, Jul 3, 2020 at 4:43 AM Hyukjin Kwon wrote: > Hi all and Shane, > > Is there

Re: [vote] Apache Spark 3.0 RC3

2020-06-15 Thread Dongjoon Hyun
alizing. > > PS: There are two critical problems I've seen with the release (Spark UI > is virtually unusable in some cases, and streaming issues). I will > highlight them in the release notes and link to the JIRA tickets. But I > think we should make 3.0.1 ASAP to follow up. > >

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2020-06-23 Thread Dongjoon Hyun
of Hive >>> thriftserver. >>> >>> To reduce the risk, I would like to keep the current default version >>> unchanged. When it becomes stable, we can change the default profile to >>> Hadoop-3.2. >>> >>> Cheers, >>> >>> X

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2020-06-24 Thread Dongjoon Hyun
Thanks, Xiao, Sean, Nicholas. To Xiao, > it sounds like Hadoop 3.x is not as popular as Hadoop 2.7. If you say so, - Apache Hadoop 2.6.0 is the most popular one with 156 dependencies. - Apache Spark 2.2.0 is the most popular one with 264 dependencies. As we know, it doesn't make sense. Are we

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2020-06-24 Thread Dongjoon Hyun
To Xiao. Why should Apache project releases be blocked by PyPi / CRAN? It's completely optional, isn't it? > let me repeat my opinion: the top priority is to provide two options for PyPi distribution IIRC, Apache Spark 3.0.0 fails to upload to CRAN and this is not the first incident. Apache

Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread Dongjoon Hyun
Hi, All. After a short celebration of Apache Spark 3.0, I'd like to ask for the community's opinion on Apache Spark 3.1 feature expectations. First of all, Apache Spark 3.1 is scheduled for December 2020. - https://spark.apache.org/versioning-policy.html I'm expecting the following items: 1.

Re: [vote] Apache Spark 3.0 RC3

2020-06-14 Thread Dongjoon Hyun
ia (binding) > Jungtaek Lim > Denny Lee > Russell Spitzer > Dongjoon Hyun (binding) > DB Tsai (binding) > Michael Armbrust (binding) > Tom Graves (binding) > Bryan Cutler > Huaxin Gao > Jiaxin Shan > Xingbo Jiang > Xiao Li (binding) > Hyukjin Kwon (binding) > Kent

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2020-06-23 Thread Dongjoon Hyun
as the default? > > How to explain this to the community? I would not change the default for > consistency. > > Xiao > > > > On Tue, Jun 23, 2020 at 7:18 PM Dongjoon Hyun > wrote: > >> Thanks. Uploading PySpark to PyPI is a simple manual step and our rele

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2020-06-23 Thread Dongjoon Hyun
> Please correct me if my concern is not valid. > > Xiao > > > On Tue, Jun 23, 2020 at 12:04 AM Dongjoon Hyun > wrote: > >> Hi, All. >> >> I bump up this thread again with the title "Use Hadoop-3.2 as a default >> Hadoop profile in 3.1.0?" >>

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-23 Thread Dongjoon Hyun
+1 Bests, Dongjoon. On Tue, Jun 23, 2020 at 1:19 PM Jungtaek Lim wrote: > +1 on a 3.0.1 soon. > > Probably it would be nice if some Scala experts can take a look at > https://issues.apache.org/jira/browse/SPARK-32051 and include the fix > into 3.0.1 if possible. > Looks like APIs designed to

Re: [vote] Apache Spark 3.0 RC3

2020-06-08 Thread Dongjoon Hyun
+1 Thanks, Dongjoon. On Mon, Jun 8, 2020 at 6:37 AM Russell Spitzer wrote: > +1 (non-binding) ran the new SCC DSV2 suite and all other tests, no issues > > On Sun, Jun 7, 2020 at 11:12 PM Yin Huai wrote: > >> Hello everyone, >> >> I am wondering if it makes more sense to not count Saturday

Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-06-03 Thread Dongjoon Hyun
+1 Bests, Dongjoon On Wed, Jun 3, 2020 at 5:59 AM Tom Graves wrote: > +1 > > Tom > > On Sunday, May 31, 2020, 06:47:09 PM CDT, Holden Karau < > hol...@pigscanfly.ca> wrote: > > > Please vote on releasing the following candidate as Apache Spark > version 2.4.6. > > The vote is open until June

Re: [ANNOUNCE] Apache Spark 2.4.6 released

2020-06-10 Thread Dongjoon Hyun
Thank you so much, Holden! :) On Wed, Jun 10, 2020 at 6:59 PM Hyukjin Kwon wrote: > Yay! > > On Thu, Jun 11, 2020 at 10:38 AM, Holden Karau wrote: > >> We are happy to announce the availability of Spark 2.4.6! >> >> Spark 2.4.6 is a maintenance release containing stability, correctness, >> and

Re: Starting work on last Scala 2.13 updates

2020-07-24 Thread Dongjoon Hyun
Thank you so much, Sean! Bests, Dongjoon. On Fri, Jul 24, 2020 at 8:56 AM Sean Owen wrote: > Status update - we should have Scala 2.13 compiling, with the > exception of the REPL. > Looks like 99% or so of tests pass too, but the remaining ones might > be hard to debug. I haven't looked hard

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-11 Thread Dongjoon Hyun
Thank you, Hyukjin! Bests, Dongjoon. On Mon, Jan 11, 2021 at 7:24 AM Hyukjin Kwon wrote: > I had a response from the INFRA team and Sonatype. Just to share, the > removal is possible as an exception, but it's best to go ahead for 3.1.1 > for safety as we all discussed. > There are several

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-06 Thread Dongjoon Hyun
Thank you, Jacek, Sean, and Hyukjin. The release is a human-driven process. Everyone can make mistakes. For example, I released Apache Spark 2.2.3 with a missing pandoc, but we didn't touch it because it's a community-blessed official version. https://pypi.org/project/pyspark/2.2.3/ For

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-06 Thread Dongjoon Hyun
Before we discovered the pre-uploaded artifacts, both Jungtaek and Hyukjin had already shared two blockers here. IIUC, it implicitly meant RC1 failure at that time. In addition to that, there are two correctness issues. So, I made up my mind to cast -1 for this RC1 before joining this thread.

Re: [build system] WE'RE LIVE!

2020-12-01 Thread Dongjoon Hyun
Yay! Thanks! Bests, Dongjoon On Tue, Dec 1, 2020 at 5:31 PM Takeshi Yamamuro wrote: > Many thanks, guys! > I've checked I can re-trigger Jenkins tests. > > Bests, > Takeshi > > On Wed, Dec 2, 2020 at 9:55 AM shane knapp ☠ wrote: > >> https://amplab.cs.berkeley.edu/jenkins/ >> >> i cleared the

Re: Apache ORC 1.6.6 Release

2020-12-03 Thread Dongjoon Hyun
etStripeStatistics back for backward compatibility > ORC-669. Reduce breaking changes in ReaderImpl.java > > As of today, the snapshot release passed Apache Spark and Apache Iceberg > UTs. > > https://github.com/dongjoon-hyun/spark/pull/41 > https://github.com/dongjoon-hyun/iceberg/pull/1

Apache ORC 1.6.6 Release

2020-12-03 Thread Dongjoon Hyun
changes in ReaderImpl.java As of today, the snapshot release passed Apache Spark and Apache Iceberg UTs. https://github.com/dongjoon-hyun/spark/pull/41 https://github.com/dongjoon-hyun/iceberg/pull/1 I start to roll 1.6.6-rc0. After 1.6.6 release, 1.6.7 will focus on Apache Hive. Thanks, Dongjoon.

Re: Spark branch-3.1

2020-12-04 Thread Dongjoon Hyun
Thank you so much, Hyukjin Kwon. I made a PR for updating the `master` branch to 3.2.0-SNAPSHOT. https://github.com/apache/spark/pull/30606 [SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT Bests, Dongjoon. On Fri, Dec 4, 2020 at 7:05 AM Tom Graves wrote: > Can we update the

Re: [build system] jenkins downtime today/tomorrow

2020-11-30 Thread Dongjoon Hyun
Thank you, Shane. :) Bests, Dongjoon. On Mon, Nov 30, 2020 at 10:05 AM shane knapp ☠ wrote: > hey all! > > the Great Jenkins Migration[tm] is well under way, and we will be > sunsetting the old amp-jenkins-master server and moving to a new one. > > i've put jenkins in to quiet mode so that it
