[Apache Spark][Streaming Job][Checkpoint] Spark job failed on checkpoint recovery with a "batch not found" error

2020-05-28 Thread taylorwu
-7edb28361c5d] terminated with error java.lang.IllegalStateException: batch 946 doesn't exist at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply$mcV$sp

Re: [apache-spark]-spark-shuffle

2020-05-24 Thread vijay.bvp
An explanation of how shuffle works: https://stackoverflow.com/questions/37528047/how-are-stages-split-into-tasks-in-spark A sample of the code and job configuration, the DAG, and the underlying source (HDFS or other) would help explain this. Thanks, VP -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com

[apache-spark]-spark-shuffle

2020-05-22 Thread Vijay Kumar
Hi, I am trying to thoroughly understand the following in Spark. 1. A job reads 2 files and performs a cartesian join. 2. The input sizes are 55.7 MB and 67.1 MB. 3. After reading the input files, Spark shuffled; for both inputs the shuffle was in KB. I want to understand why this size is

Re: Cross Region Apache Spark Setup

2020-04-23 Thread Stone Zhong
t. > > Just my 2 cents > > --- > Cheers, > -z > > > From: Stone Zhong > Sent: Wednesday, April 15, 2020 4:31 > To: user@spark.apache.org > Subject: Cross Region Apache Spark Setup > > Hi, > > I am trying to setup a cross regi

Re: Cross Region Apache Spark Setup

2020-04-20 Thread ZHANG Wei
--- Cheers, -z From: Stone Zhong Sent: Wednesday, April 15, 2020 4:31 To: user@spark.apache.org Subject: Cross Region Apache Spark Setup Hi, I am trying to setup a cross region Apache Spark cluster. All my data are stored in Amazon S3 and well partitioned

Cross Region Apache Spark Setup

2020-04-14 Thread Stone Zhong
Hi, I am trying to set up a cross-region Apache Spark cluster. All my data are stored in Amazon S3 and well partitioned by region. For example, I have parquet files at s3://mybucket/sales_fact.parquet/us-west s3://mybucket/sales_fact.parquet/us-east s3://mybucket/sales_fact.parquet/uk
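
A hedged sketch of how a job might exploit that layout: only the bucket and dataset names below come from the post; the helper itself is hypothetical. Each regional cluster builds just its local prefix, so reads stay within one region and avoid cross-region S3 transfer.

```python
# Hypothetical helper: build per-region S3 paths for a dataset laid out as
# s3://mybucket/sales_fact.parquet/<region>. A cluster in a given region
# would read only its own prefix.

def region_paths(bucket, dataset, regions):
    """Return one S3 path per region partition of the dataset."""
    return ["s3://{}/{}/{}".format(bucket, dataset, r) for r in regions]

paths = region_paths("mybucket", "sales_fact.parquet", ["us-west"])
print(paths[0])
# A Spark job would then pass these paths to spark.read.parquet(*paths);
# only the path construction is shown here.
```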

Re: Apache Arrow support for Apache Spark

2020-02-17 Thread Chris Teoh
1. I'd also consider how you're structuring the data before applying the join, naively doing the join could be expensive so doing a bit of data preparation may be necessary to improve join performance. Try to get a baseline as well. Arrow would help improve it. 2. Try storing it back as Parquet

Apache Arrow support for Apache Spark

2020-02-16 Thread Subash Prabakar
Hi Team, I have two questions regarding Arrow and Spark integration, 1. I am joining two huge tables (1PB) each - will the performance be huge when I use Arrow format before shuffling ? Will the serialization/deserialization cost have significant improvement? 2. Can we store the final data in

Re: [ANNOUNCE] Announcing Apache Spark 2.4.5

2020-02-10 Thread Jungtaek Lim
Nice work, Dongjoon! Thanks for the huge effort on sorting out the correctness issues as well. On Tue, Feb 11, 2020 at 12:40 PM Wenchen Fan wrote: > Great job, Dongjoon! > > On Mon, Feb 10, 2020 at 4:18 PM Hyukjin Kwon wrote: > >> Thanks Dongjoon! >> >> On Sun, Feb 9, 2020 at 10:49 AM, Takeshi

Re: [ANNOUNCE] Announcing Apache Spark 2.4.5

2020-02-10 Thread Wenchen Fan
Great job, Dongjoon! On Mon, Feb 10, 2020 at 4:18 PM Hyukjin Kwon wrote: > Thanks Dongjoon! > > On Sun, Feb 9, 2020 at 10:49 AM, Takeshi Yamamuro wrote: > >> Happy to hear the release news! >> >> Bests, >> Takeshi >> >> On Sun, Feb 9, 2020 at 10:28 AM Dongjoon Hyun >> wrote: >> >>> There was a typo

Re: [ANNOUNCE] Announcing Apache Spark 2.4.5

2020-02-10 Thread Hyukjin Kwon
Thanks Dongjoon! On Sun, Feb 9, 2020 at 10:49 AM, Takeshi Yamamuro wrote: > Happy to hear the release news! > > Bests, > Takeshi > > On Sun, Feb 9, 2020 at 10:28 AM Dongjoon Hyun > wrote: > >> There was a typo in one URL. The correct release note URL is here. >> >>

Re: [ANNOUNCE] Announcing Apache Spark 2.4.5

2020-02-08 Thread Takeshi Yamamuro
Happy to hear the release news! Bests, Takeshi On Sun, Feb 9, 2020 at 10:28 AM Dongjoon Hyun wrote: > There was a typo in one URL. The correct release note URL is here. > > https://spark.apache.org/releases/spark-release-2-4-5.html > > > > On Sat, Feb 8, 2020 at 5:22 PM Dongjoon Hyun > wrote:

Re: [ANNOUNCE] Announcing Apache Spark 2.4.5

2020-02-08 Thread Dongjoon Hyun
There was a typo in one URL. The correct release note URL is here. https://spark.apache.org/releases/spark-release-2-4-5.html On Sat, Feb 8, 2020 at 5:22 PM Dongjoon Hyun wrote: > We are happy to announce the availability of Spark 2.4.5! > > Spark 2.4.5 is a maintenance release containing

[ANNOUNCE] Announcing Apache Spark 2.4.5

2020-02-08 Thread Dongjoon Hyun
We are happy to announce the availability of Spark 2.4.5! Spark 2.4.5 is a maintenance release containing stability fixes. This release is based on the branch-2.4 maintenance branch of Spark. We strongly recommend all 2.4 users to upgrade to this stable release. To download Spark 2.4.5, head

Re: Why Apache Spark doesn't use Calcite?

2020-01-15 Thread Debajyoti Roy
ve or quantitative benchmark done before a >>> design >>> >>> decision was made not to use Calcite? >>> >>> >>> >>> Are there limitations (for heuristic based, cost based, * aware >>> optimizer) >>> >

Re: Why Apache Spark doesn't use Calcite?

2020-01-15 Thread Xiao Li
limitations (for heuristic based, cost based, * aware >> optimizer) >> >>> in Calcite, and frameworks built on top of Calcite? In the context of >> big >> >>> data / TCPH benchmarks. >> >>> >> >>> I was unable to d

Re: Why Apache Spark doesn't use Calcite?

2020-01-15 Thread Debajyoti Roy
f > big > >>> data / TCPH benchmarks. > >>> > >>> I was unable to dig up anything concrete from user group / Jira. > Appreciate > >>> if any Catalyst veteran here can give me pointers. Trying to defend > >>> Spark/Catalyst. > >>

Re: Why Apache Spark doesn't use Calcite?

2020-01-13 Thread Matei Zaharia
stic based, cost based, * aware optimizer) >>> in Calcite, and frameworks built on top of Calcite? In the context of big >>> data / TCPH benchmarks. >>> >>> I was unable to dig up anything concrete from user group / Jira. Appreciate >>> if any Catalyst veter

Re: Why Apache Spark doesn't use Calcite?

2020-01-13 Thread Michael Mior
up anything concrete from user group / Jira. Appreciate >> if any Catalyst veteran here can give me pointers. Trying to defend >> Spark/Catalyst. >> >> >> >> >> >> -- >> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ >&g

Re: Why Apache Spark doesn't use Calcite?

2020-01-13 Thread Jason Nerothin
e from user group / Jira. Appreciate > if any Catalyst veteran here can give me pointers. Trying to defend > Spark/Catalyst. > > > > > > -- > Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ > > ---

Why Apache Spark doesn't use Calcite?

2020-01-13 Thread newroyker
to dig up anything concrete from user group / Jira. Appreciate if any Catalyst veteran here can give me pointers. Trying to defend Spark/Catalyst. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com

Re: Issues with Apache Spark tgz file

2019-12-30 Thread rsinghania
Ah ok - yes, that worked for me as well. Thank you! Rajat -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: Issues with Apache Spark tgz file

2019-12-30 Thread Marcelo Vanzin
ing now > > It looks like there's something wrong with the original tgz file; its size > is only 32 KB. > > Could one of the developers please have a look? > > Thanks very much, > Rajat > > > >

Issues with Apache Spark tgz file

2019-12-30 Thread rsinghania
is not recoverable: exiting now It looks like there's something wrong with the original tgz file; its size is only 32 KB. Could one of the developers please have a look? Thanks very much, Rajat -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com

Re: [ANNOUNCE] Announcing Apache Spark 3.0.0-preview2

2019-12-26 Thread infa elance
e wide-scale community testing of the upcoming Spark 3.0 > > release, the Apache Spark community has posted a new preview release > > of Spark 3.0. This preview is not a stable release in terms of either > > API or functionality, but it is meant to give the community early > > access to

Apache Spark Use cases - my first talk

2019-12-25 Thread Chetan Khatri
Hi Spark Users, Thank you for all the support over the mailing list. Contributors - thanks for all your contributions. This is my first 5-minute talk on Apache Spark - https://youtu.be/bBqItpgT8xQ Thanks.

Re: [ANNOUNCE] Announcing Apache Spark 3.0.0-preview2

2019-12-24 Thread Star
Awesome work. Thanks and happy holidays~! On 2019-12-25 04:52, Yuming Wang wrote: Hi all, To enable wide-scale community testing of the upcoming Spark 3.0 release, the Apache Spark community has posted a new preview release of Spark 3.0. This preview is not a stable release in terms of either

Re: [ANNOUNCE] Announcing Apache Spark 3.0.0-preview2

2019-12-24 Thread Jungtaek Lim
Bests, >> Takeshi >> >> On Wed, Dec 25, 2019 at 6:00 AM Xiao Li wrote: >> >>> Thank you all. Happy Holidays! >>> >>> Xiao >>> >>> On Tue, Dec 24, 2019 at 12:53 PM Yuming Wang wrote: >>> >>>> Hi all, >&

Re: [ANNOUNCE] Announcing Apache Spark 3.0.0-preview2

2019-12-24 Thread Dongjoon Hyun
>> >> On Tue, Dec 24, 2019 at 12:53 PM Yuming Wang wrote: >> >>> Hi all, >>> >>> To enable wide-scale community testing of the upcoming Spark 3.0 >>> release, the Apache Spark community has posted a new preview release of >>> Spark 3.

Re: [ANNOUNCE] Announcing Apache Spark 3.0.0-preview2

2019-12-24 Thread Takeshi Yamamuro
Great work, Yuming! Bests, Takeshi On Wed, Dec 25, 2019 at 6:00 AM Xiao Li wrote: > Thank you all. Happy Holidays! > > Xiao > > On Tue, Dec 24, 2019 at 12:53 PM Yuming Wang wrote: > >> Hi all, >> >> To enable wide-scale community testing of the upcoming Sp

Re: [ANNOUNCE] Announcing Apache Spark 3.0.0-preview2

2019-12-24 Thread Xiao Li
Thank you all. Happy Holidays! Xiao On Tue, Dec 24, 2019 at 12:53 PM Yuming Wang wrote: > Hi all, > > To enable wide-scale community testing of the upcoming Spark 3.0 release, > the Apache Spark community has posted a new preview release of Spark 3.0. > This preview is *not a

[ANNOUNCE] Announcing Apache Spark 3.0.0-preview2

2019-12-24 Thread Yuming Wang
Hi all, To enable wide-scale community testing of the upcoming Spark 3.0 release, the Apache Spark community has posted a new preview release of Spark 3.0. This preview is *not a stable release in terms of either API or functionality*, but it is meant to give the community early access to try

[ANNOUNCE] Announcing Apache Spark 3.0.0-preview

2019-11-07 Thread Xingbo Jiang
Hi all, To enable wide-scale community testing of the upcoming Spark 3.0 release, the Apache Spark community has posted a preview release of Spark 3.0. This preview is *not a stable release in terms of either API or functionality*, but it is meant to give the community early access to try

Re: apache-spark Structured Stateful Streaming with window / SPARK-21641

2019-10-16 Thread Jungtaek Lim
time session window): https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredSessionization.scala There're two types of APIs in Spark Dataset - "typed" and "untyped". Most of features are available in unt

apache-spark Structured Stateful Streaming with window / SPARK-21641

2019-10-15 Thread Richard Reitmeyer
What’s the right way use Structured Streaming with both state and windows? Looking at the slides from https://www.slideshare.net/databricks/arbitrary-stateful-aggregations-using-structured-streaming-in-apache-spark slides 26 and 31, it looks like stateful processing events for every device

Re: Announcing .NET for Apache Spark 0.5.0

2019-09-30 Thread Holden Karau
Congratulations on the release :) On Mon, Sep 30, 2019 at 9:38 AM Terry Kim wrote: > We are thrilled to announce that .NET for Apache Spark 0.5.0 has been just > released <https://github.com/dotnet/spark/releases/tag/v0.5.0>! > > > > Some of the highlights

Announcing .NET for Apache Spark 0.5.0

2019-09-30 Thread Terry Kim
We are thrilled to announce that .NET for Apache Spark 0.5.0 has just been released <https://github.com/dotnet/spark/releases/tag/v0.5.0>! Some of the highlights of this release include: - Delta Lake <https://github.com/delta-io/delta>'s *DeltaTable *APIs - UDF improvements

[ANNOUNCE] Announcing Apache Spark 2.3.4

2019-09-09 Thread Kazuaki Ishizaki
We are happy to announce the availability of Spark 2.3.4! Spark 2.3.4 is a maintenance release containing stability fixes. This release is based on the branch-2.3 maintenance branch of Spark. We strongly recommend all 2.3.x users to upgrade to this stable release. To download Spark 2.3.4, head

Re: [ANNOUNCE] Announcing Apache Spark 2.4.4

2019-09-02 Thread Wesley Peng
on 2019/9/2 5:54, Dongjoon Hyun wrote: We are happy to announce the availability of Spark 2.4.4! Spark 2.4.4 is a maintenance release containing stability fixes. This release is based on the branch-2.4 maintenance branch of Spark. We strongly recommend all 2.4 users to upgrade to this stable

Re: [ANNOUNCE] Announcing Apache Spark 2.4.4

2019-09-01 Thread Hyukjin Kwon
YaY! On Mon, Sep 2, 2019 at 1:27 PM, Wenchen Fan wrote: > Great! Thanks! > > On Mon, Sep 2, 2019 at 5:55 AM Dongjoon Hyun > wrote: > >> We are happy to announce the availability of Spark 2.4.4! >> >> Spark 2.4.4 is a maintenance release containing stability fixes. This >> release is based on the

Re: [ANNOUNCE] Announcing Apache Spark 2.4.4

2019-09-01 Thread Wenchen Fan
Great! Thanks! On Mon, Sep 2, 2019 at 5:55 AM Dongjoon Hyun wrote: > We are happy to announce the availability of Spark 2.4.4! > > Spark 2.4.4 is a maintenance release containing stability fixes. This > release is based on the branch-2.4 maintenance branch of Spark. We strongly > recommend all

[ANNOUNCE] Announcing Apache Spark 2.4.4

2019-09-01 Thread Dongjoon Hyun
We are happy to announce the availability of Spark 2.4.4! Spark 2.4.4 is a maintenance release containing stability fixes. This release is based on the branch-2.4 maintenance branch of Spark. We strongly recommend all 2.4 users to upgrade to this stable release. To download Spark 2.4.4, head

Re: JDK11 Support in Apache Spark

2019-08-25 Thread Hichame El Khalfi
Subject: Re: JDK11 Support in Apache Spark Great work! On Sun, Aug 25, 2019 at 6:03 AM Xiao Li wrote: Thank you for your contributions! This is a great feature for Spark 3.0! We finally achieved it! Xiao On Sat, Aug 24, 2019 at 12:18 PM Felix

Re: JDK11 Support in Apache Spark

2019-08-25 Thread Wenchen Fan
t; -- >> *From:* ☼ R Nair >> *Sent:* Saturday, August 24, 2019 10:57:31 AM >> *To:* Dongjoon Hyun >> *Cc:* d...@spark.apache.org ; user @spark/'user >> @spark'/spark users/user@spark >> *Subject:* Re: JDK11 Support in Apache Spar

Re: JDK11 Support in Apache Spark

2019-08-24 Thread Xiao Li
To:* Dongjoon Hyun > *Cc:* d...@spark.apache.org ; user @spark/'user > @spark'/spark users/user@spark > *Subject:* Re: JDK11 Support in Apache Spark > > Finally!!! Congrats > > On Sat, Aug 24, 2019, 11:11 AM Dongjoon Hyun > wrote: > >> Hi, All. >> >>

Re: JDK11 Support in Apache Spark

2019-08-24 Thread Felix Cheung
That’s great! From: ☼ R Nair Sent: Saturday, August 24, 2019 10:57:31 AM To: Dongjoon Hyun Cc: d...@spark.apache.org ; user @spark/'user @spark'/spark users/user@spark Subject: Re: JDK11 Support in Apache Spark Finally!!! Congrats On Sat, Aug 24, 2019, 11:11

Re: JDK11 Support in Apache Spark

2019-08-24 Thread ☼ R Nair
Finally!!! Congrats On Sat, Aug 24, 2019, 11:11 AM Dongjoon Hyun wrote: > Hi, All. > > Thanks to your many many contributions, > Apache Spark master branch starts to pass on JDK11 as of today. > (with `hadoop-3.2` profile: Apache Hadoop 3.2 and Hive 2.3.6) > > > https:

Re: JDK11 Support in Apache Spark

2019-08-24 Thread DB Tsai
Congratulations on the great work! Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Sat, Aug 24, 2019 at 8:11 AM Dongjoon Hyun wrote: > > Hi, All. > > Thanks to your many many contributions, &g

JDK11 Support in Apache Spark

2019-08-24 Thread Dongjoon Hyun
Hi, All. Thanks to your many many contributions, Apache Spark master branch starts to pass on JDK11 as of today. (with `hadoop-3.2` profile: Apache Hadoop 3.2 and Hive 2.3.6) https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk

Re: Release Apache Spark 2.4.4

2019-08-14 Thread Dongjoon Hyun
][SS][PYTHON] Use InheritableThreadLocal for current epoch >>> in EpochTracker (to support Python UDFs) >>> <https://github.com/apache/spark/pull/24946> >>> >>> Thanks, >>> Terry >>> >>> On Tue, Aug 13, 2019 at 10:24 PM Wenchen Fan

Re: Release Apache Spark 2.4.4

2019-08-14 Thread Holden Karau
t;> [SPARK-27234][SS][PYTHON] Use InheritableThreadLocal for current epoch in >> EpochTracker (to support Python UDFs) >> <https://github.com/apache/spark/pull/24946> >> >> Thanks, >> Terry >> >> On Tue, Aug 13, 2019 at 10:24 PM Wenchen Fan wrote: >> >

Re: Release Apache Spark 2.4.4

2019-08-14 Thread Hyukjin Kwon
Adding Shixiong. WDYT? On Wed, Aug 14, 2019 at 2:30 PM, Terry Kim wrote: > Can the following be included? > > [SPARK-27234][SS][PYTHON] Use InheritableThreadLocal for current epoch in > EpochTracker (to support Python UDFs) > <https://github.com/apache/spark/pull/24946> > >

Re: Release Apache Spark 2.4.4

2019-08-13 Thread Terry Kim
Can the following be included? [SPARK-27234][SS][PYTHON] Use InheritableThreadLocal for current epoch in EpochTracker (to support Python UDFs) <https://github.com/apache/spark/pull/24946> Thanks, Terry On Tue, Aug 13, 2019 at 10:24 PM Wenchen Fan wrote: > +1 > > On Wed, Aug 14

Re: Release Apache Spark 2.4.4

2019-08-13 Thread Wenchen Fan
+1 On Wed, Aug 14, 2019 at 12:52 PM Holden Karau wrote: > +1 > Does anyone have any critical fixes they’d like to see in 2.4.4? > > On Tue, Aug 13, 2019 at 5:22 PM Sean Owen wrote: > >> Seems fine to me if there are enough valuable fixes to justify another >> release. If there are any other

Re: Release Apache Spark 2.4.4

2019-08-13 Thread Holden Karau
+1 Does anyone have any critical fixes they’d like to see in 2.4.4? On Tue, Aug 13, 2019 at 5:22 PM Sean Owen wrote: > Seems fine to me if there are enough valuable fixes to justify another > release. If there are any other important fixes imminent, it's fine to > wait for those. > > > On Tue,

RE: Release Apache Spark 2.4.4

2019-08-13 Thread Kazuaki Ishizaki
Thanks, Dongjoon! +1 Kazuaki Ishizaki, From: Hyukjin Kwon To: Takeshi Yamamuro Cc: Dongjoon Hyun , dev , User Date: 2019/08/14 09:21 Subject: [EXTERNAL] Re: Release Apache Spark 2.4.4 +1 On Wed, Aug 14, 2019 at 9:13 AM, Takeshi Yamamuro wrote: Hi, Thanks for your

Re: Release Apache Spark 2.4.4

2019-08-13 Thread Sean Owen
Seems fine to me if there are enough valuable fixes to justify another release. If there are any other important fixes imminent, it's fine to wait for those. On Tue, Aug 13, 2019 at 6:16 PM Dongjoon Hyun wrote: > > Hi, All. > > Spark 2.4.3 was released three months ago (8th May). > As of today

Re: Release Apache Spark 2.4.4

2019-08-13 Thread Hyukjin Kwon
+1 On Wed, Aug 14, 2019 at 9:13 AM, Takeshi Yamamuro wrote: > Hi, > > Thanks for your notification, Dongjoon! > I put some links for the other committers/PMCs to access the info easily: > > A commit list in github from the last release: > https://github.com

Re: Release Apache Spark 2.4.4

2019-08-13 Thread Takeshi Yamamuro
Hi, Thanks for your notification, Dongjoon! I put some links for the other committers/PMCs to access the info easily: A commit list in github from the last release: https://github.com/apache/spark/compare/5ac2014e6c118fbeb1fe8e5c8064c4a8ee9d182a...branch-2.4 A issue list in jira: https

Re: Release Apache Spark 2.4.4

2019-08-13 Thread DB Tsai
+1 On Tue, Aug 13, 2019 at 4:16 PM Dongjoon Hyun wrote: > > Hi, All. > > Spark 2.4.3 was released three months ago (8th May). > As of today (13th August), there are 112 commits (75 JIRAs) in `branch-24` > since 2.4.3. > > It would be great if we can have Spark 2.4.4. > Shall we start `2.4.4

Release Apache Spark 2.4.4

2019-08-13 Thread Dongjoon Hyun
Hi, All. Spark 2.4.3 was released three months ago (8th May). As of today (13th August), there are 112 commits (75 JIRAs) in `branch-24` since 2.4.3. It would be great if we can have Spark 2.4.4. Shall we start `2.4.4 RC1` next Monday (19th August)? Last time, there was a request for K8s issue

CVE-2019-10099: Apache Spark unencrypted data on local disk

2019-08-06 Thread Imran Rashid
Severity: Important Vendor: The Apache Software Foundation Versions affected: All Spark 1.x, Spark 2.0.x, Spark 2.1.x, and 2.2.x versions Spark 2.3.0 to 2.3.2 Description: Prior to Spark 2.3.3, in certain situations Spark would write user data to local disk unencrypted, even if

Announcing .NET for Apache Spark 0.4.0

2019-07-31 Thread Terry Kim
We are thrilled to announce that .NET for Apache Spark 0.4.0 has just been released <https://github.com/dotnet/spark/releases/tag/v0.4.0>! Some of the highlights of this release include: - Apache Arrow backed UDFs (Vector UDF, Grouped Map UDF) - Robust UDF-related assembly l

Apache Spark Log4j logging applicationId

2019-07-23 Thread Luca Borin
Hi, I would like to add the applicationId to all logs produced by Spark through Log4j. Consider that I have a cluster with several jobs running in it, so the presence of the applicationId would be useful to logically divide them. I have found a partial solution. If I change the layout of the
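
One way the "partial solution" could look — a sketch, not the poster's actual config — is a Log4j 1.x pattern layout that pulls the application ID from the MDC via %X{appId}. The appId key is an assumption here, and the application has to populate it itself, e.g. with org.apache.log4j.MDC.put("appId", sc.applicationId) early in the driver; executor-side logs would need the same treatment, which is likely part of why any such solution stays partial.

```properties
# Hypothetical log4j.properties fragment: %X{appId} prints the "appId"
# value from the MDC, which the application is assumed to have set.
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p [%X{appId}] %c: %m%n
```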

Re: Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-16 Thread Dongjoon Hyun
at building consensus (even for 3.0.0). In any case, could you ping the reviewers once more on the PRs you have concerns about? If it is merged into `branch-2.4`, it will be in Apache Spark 2.4.4, of course. Bests, Dongjoon. On Tue, Jul 16, 2019 at 4:00 AM Kazuaki Ishizaki wrote: > Thank you Dongj

Re: Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-16 Thread Kazuaki Ishizaki
Thank you Dongjoon for being a release manager. If the assumed dates are OK, I would like to volunteer as the 2.3.4 release manager. Best Regards, Kazuaki Ishizaki, From: Dongjoon Hyun To: dev , "user @spark" , Apache Spark PMC Date: 2019/07/13 07:18 Subject: [EX

Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-16 Thread Stavros Kontopoulos
Hi Dongjoon, Should we also consider fixing https://issues.apache.org/jira/browse/SPARK-27812 before the cut? Best, Stavros On Mon, Jul 15, 2019 at 7:04 PM Dongjoon Hyun wrote: > Hi, Apache Spark PMC members. > > Can we cut Apache Spark 2.4.4 next Monday (22nd July)? > > Be

Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-15 Thread Dongjoon Hyun
Hi, Apache Spark PMC members. Can we cut Apache Spark 2.4.4 next Monday (22nd July)? Bests, Dongjoon. On Fri, Jul 12, 2019 at 3:18 PM Dongjoon Hyun wrote: > Thank you, Jacek. > > BTW, I added `@private` since we need PMC's help to make an Apache Spark > release. > > Can I

Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-12 Thread Dongjoon Hyun
Thank you, Jacek. BTW, I added `@private` since we need the PMC's help to make an Apache Spark release. Can I get more feedback from the other PMC members? Please let me know if you have any concerns (e.g. release date or release manager?) As one of the community members, I assumed the following

Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-11 Thread Jacek Laskowski
Hi, Thanks Dongjoon Hyun for stepping up as a release manager! Much appreciated. If there's a volunteer to cut a release, I'm always happy to support it. In addition, the more frequent the releases, the better for end users, so they have a choice to upgrade and have all the latest fixes or wait. It's their

Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-11 Thread Dongjoon Hyun
Additionally, one more correctness patch landed yesterday. - SPARK-28015 Check stringToDate() consumes entire input for the and -[m]m formats Bests, Dongjoon. On Tue, Jul 9, 2019 at 10:11 AM Dongjoon Hyun wrote: > Thank you for the reply, Sean. Sure. 2.4.x should be a LTS

Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-09 Thread Dongjoon Hyun
Thank you for the reply, Sean. Sure, 2.4.x should be an LTS version. The main reason for a 2.4.4 release (before 3.0.0) is to have a better basis for comparison to 3.0.0. For example, SPARK-27798 had an old bug, but its correctness issue is only exposed in Spark 2.4.3. It would be great if we can

Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-09 Thread Sean Owen
We will certainly want a 2.4.4 release eventually. In fact I'd expect 2.4.x gets maintained for longer than the usual 18 months, as it's the last 2.x branch. It doesn't need to happen before 3.0, but could. Usually maintenance releases happen 3-4 months apart and the last one was 2 months ago. If

Release Apache Spark 2.4.4 before 3.0.0

2019-07-09 Thread Dongjoon Hyun
Hi, All. Spark 2.4.3 was released two months ago (8th May). As of today (9th July), there exist 45 fixes in `branch-2.4` including the following correctness or blocker issues. - SPARK-26038 Decimal toScalaBigInt/toJavaBigInteger not work for decimals not fitting in long - SPARK-26045

Re: RE - Apache Spark compatibility with Hadoop 2.9.2

2019-06-24 Thread Bipul kumar
8:06 PM Mark Bidewell wrote: >> >>> I have done a setup with Hadoop 2.9.2 and Spark 2.2.2. Apache Zeppelin >>> is fine but some our internally developed apps need work on dependencies >>> >>> On Sun, Jun 23, 2019, 07:50 Bipul kumar >>> wrote: >

Re: RE - Apache Spark compatibility with Hadoop 2.9.2

2019-06-23 Thread Mark Bidewell
up with Hadoop 2.9.2 and Spark 2.2.2. Apache Zeppelin >> is fine but some our internally developed apps need work on dependencies >> >> On Sun, Jun 23, 2019, 07:50 Bipul kumar >> wrote: >> >>> Hello People ! >>> >>> I am new to Apa

Re: RE - Apache Spark compatibility with Hadoop 2.9.2

2019-06-23 Thread Bipul kumar
:06 PM Mark Bidewell wrote: > I have done a setup with Hadoop 2.9.2 and Spark 2.2.2. Apache Zeppelin is > fine but some our internally developed apps need work on dependencies > > On Sun, Jun 23, 2019, 07:50 Bipul kumar > wrote: > >> Hello People ! >> >> I

Re: RE - Apache Spark compatibility with Hadoop 2.9.2

2019-06-23 Thread Mark Bidewell
I have done a setup with Hadoop 2.9.2 and Spark 2.2.2. Apache Zeppelin is fine but some of our internally developed apps need work on dependencies On Sun, Jun 23, 2019, 07:50 Bipul kumar wrote: > Hello People ! > > I am new to Apache Spark , and just started learning it. > Few ques

RE - Apache Spark compatibility with Hadoop 2.9.2

2019-06-23 Thread Bipul kumar
Hello People! I am new to Apache Spark and just started learning it. A few questions I have in mind which I am seeking answers to here: 1. Is there any compatibility requirement for Apache Spark while using Hadoop? Let's say I am running Hadoop 2.9.2; which Apache Spark version should I use? 2. As mentioned, i

Phrase Search using Apache Spark in huge amount of text in files

2019-05-28 Thread Sandeep Giri
Dear Spark Users, If you want to search for a list of phrases, approx. 10,000, each having between 1 and 6 words, in a large amount of text (approximately 10GB), how do you go about it? I ended up writing a small RDD-based library: https://github.com/cloudxlab/phrasesearch I would like to get feedback
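
As a rough illustration of the matching step such a library might perform on each partition (this is a hypothetical sketch, not the cloudxlab/phrasesearch implementation): index the phrases by their first word, then slide over the token stream and verify only the candidate phrases.

```python
# Hypothetical sketch: match a fixed phrase list against tokenized text.
# Indexing ~10,000 phrases by their first word keeps the per-token work small.
from collections import defaultdict

def find_phrases(text, phrases):
    # Map first word -> list of full phrases (as word lists).
    by_first = defaultdict(list)
    for p in phrases:
        words = p.lower().split()
        by_first[words[0]].append(words)
    tokens = text.lower().split()
    hits = []
    for i, tok in enumerate(tokens):
        # Check only phrases whose first word matches this token.
        for words in by_first.get(tok, []):
            if tokens[i:i + len(words)] == words:
                hits.append(" ".join(words))
    return hits

print(find_phrases("Apache Spark runs on the JVM",
                   ["apache spark", "the jvm", "big data"]))
```

In a Spark job, the phrase index would typically be broadcast to executors and the matching applied per partition of the 10GB of text.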

[ANNOUNCE] Announcing Apache Spark 2.4.3

2019-05-09 Thread Xiao Li
We are happy to announce the availability of Spark 2.4.3! Spark 2.4.3 is a maintenance release containing stability fixes. This release is based on the branch-2.4 maintenance branch of Spark. We strongly recommend all 2.4 users to upgrade to this stable release. Note that 2.4.3 switched the

Re: Best notebook for developing for apache spark using scala on Amazon EMR Cluster

2019-05-01 Thread Jeff Zhang
You can configure zeppelin to store your notes in S3 http://zeppelin.apache.org/docs/0.8.1/setup/storage/storage.html#notebook-storage-in-s3 V0lleyBallJunki3 于2019年5月1日周三 上午5:26写道: > Hello. I am using Zeppelin on Amazon EMR cluster while developing Apache > Spark programs in
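
The linked storage page boils down to a few zeppelin-env.sh settings; a sketch with placeholder bucket/user values (per the Zeppelin 0.8.1 docs):

```shell
# Sketch of zeppelin-env.sh settings to keep notebooks in S3 so they
# survive EMR cluster teardown. "my-bucket" and "my-user" are placeholders.
export ZEPPELIN_NOTEBOOK_S3_BUCKET=my-bucket
export ZEPPELIN_NOTEBOOK_S3_USER=my-user
export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.S3NotebookRepo
# Notes are then stored under s3://my-bucket/my-user/notebook/
```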

Best notebook for developing for apache spark using scala on Amazon EMR Cluster

2019-04-30 Thread V0lleyBallJunki3
Hello. I am using Zeppelin on an Amazon EMR cluster while developing Apache Spark programs in Scala. The problem is that once the cluster is destroyed I lose all the notebooks on it. So over a period of time I have a lot of notebooks that need to be manually exported to my local disk and from

[ANNOUNCE] Announcing Apache Spark 2.4.1

2019-04-04 Thread DB Tsai
+user list We are happy to announce the availability of Spark 2.4.1! Apache Spark 2.4.1 is a maintenance release, based on the branch-2.4 maintenance branch of Spark. We strongly recommend all 2.4.0 users to upgrade to this stable release. In Apache Spark 2.4.1, Scala 2.12 support is GA

Re: Qn about decision tree apache spark java

2019-04-04 Thread Abdeali Kothari
tion example. I found the dataset at > > https://github.com/apache/spark/blob/master/data/mllib/sample_libsvm_data.txt > and I have some trouble understanding its format. Is the first column > the label? Why are there indices and a colon in front of other number > values and what do the

Qn about decision tree apache spark java

2019-04-04 Thread Serena S Yuan
Hi, I am trying to use apache spark's decision tree classifier. I am trying to implement the method found in https://spark.apache.org/docs/1.5.1/ml-decision-tree.html 's classification example. I found the dataset at https://github.com/apache/spark/blob/master/data/mllib/sample_libsvm_data.txt
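
On the format question: a LIBSVM file stores one example per line as "label index1:value1 index2:value2 ...", i.e. the first column is indeed the label, and each index:value pair is one non-zero entry of a sparse feature vector. A minimal illustrative parser (a hypothetical helper, not Spark's loader):

```python
# Parse one line of LIBSVM-formatted text: a label followed by sparse
# index:value feature pairs.

def parse_libsvm_line(line):
    label, *pairs = line.split()
    features = {}
    for pair in pairs:
        idx, val = pair.split(":")
        features[int(idx)] = float(val)
    return float(label), features

print(parse_libsvm_line("1 128:51.0 129:159.0"))
```

Spark itself can read such files directly, e.g. spark.read.format("libsvm").load(path) in the 2.x DataFrame API.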

Apache Spark Newsletter Issue 2

2019-03-23 Thread Ankur Gupta
Hello, Issue two of the newsletter https://newsletterspot.com/apache-spark/2/ Feel free to submit articles to the newsletter https://newsletterspot.com/apache-spark/submit/ Next issue onwards will be adding * Spark Events / User Meetups * Tags to identifying content e.g. videos

Re: Why does Apache Spark Master shutdown when Zookeeper expires the session

2019-03-06 Thread Jungtaek Lim
s I understand, Apache Spark Master can be run in high availability mode > using Zookeeper. That is, multiple Spark masters can run in Leader/Follower > mode and these modes are registered with Zookeeper. > > In our scenario Zookeeper is expiring the Spark Master's session which is >

Why does Apache Spark Master shutdown when Zookeeper expires the session

2019-03-05 Thread lokeshkumar
As I understand, the Apache Spark Master can be run in high-availability mode using Zookeeper. That is, multiple Spark masters can run in Leader/Follower mode, and these are registered with Zookeeper. In our scenario, Zookeeper is expiring the session of the Spark Master which is acting as Leader. So

Re: Looking for an apache spark mentor

2019-02-19 Thread Robert Kaye
> On Feb 19, 2019, at 2:26 PM, Shyam P wrote: > > What IRC channel we should join? I should’ve included info in the first place, heh. Sorry: #metabrainz on freenode, please. I am ruaok, but pristine and iliekcomputers are also very much interested in learning more about Spark. Thanks! --

Re: Looking for an apache spark mentor

2019-02-19 Thread Shyam P
rainz is aiming to re-create what > last.fm used to be — we’ve already got 200M listens (AKA scrobbles) from > our users (which is not a lot, really). We’ve set up an Apache Spark cluster > and are starting to build user listening statistics using this setup. > > While our setup is

Looking for an apache spark mentor

2019-02-19 Thread Robert Kaye
ast.fm <http://last.fm/> used to be — we’ve already got 200M listens (AKA scrobbles) from our users (which is not a lot, really). We’ve set up an Apache Spark cluster and are starting to build user listening statistics using this setup. While our setup is working, we can see that we’re not goin

Re: [ANNOUNCE] Announcing Apache Spark 2.3.3

2019-02-18 Thread Wenchen Fan
great job! On Mon, Feb 18, 2019 at 4:24 PM Hyukjin Kwon wrote: > Yay! Good job Takeshi! > > On Mon, 18 Feb 2019, 14:47 Takeshi Yamamuro >> We are happy to announce the availability of Spark 2.3.3! >> >> Apache Spark 2.3.3 is a maintenance release, based on the branc

Re: [ANNOUNCE] Announcing Apache Spark 2.3.3

2019-02-18 Thread Hyukjin Kwon
Yay! Good job Takeshi! On Mon, 18 Feb 2019, 14:47 Takeshi Yamamuro We are happy to announce the availability of Spark 2.3.3! > > Apache Spark 2.3.3 is a maintenance release, based on the branch-2.3 > maintenance branch of Spark. We strongly recommend all 2.3.x users to > upgrade to

[ANNOUNCE] Announcing Apache Spark 2.3.3

2019-02-17 Thread Takeshi Yamamuro
We are happy to announce the availability of Spark 2.3.3! Apache Spark 2.3.3 is a maintenance release, based on the branch-2.3 maintenance branch of Spark. We strongly recommend all 2.3.x users to upgrade to this stable release. To download Spark 2.3.3, head over to the download page: http

Re: CVE-2018-11760: Apache Spark local privilege escalation vulnerability

2019-01-31 Thread Imran Rashid
I received some questions about what the exact change was which fixed the issue, and the PMC decided to post info in jira to make it easier for the community to track. The relevant details are all on https://issues.apache.org/jira/browse/SPARK-26802 On Mon, Jan 28, 2019 at 1:08 PM Imran Rashid

CVE-2018-11760: Apache Spark local privilege escalation vulnerability

2019-01-28 Thread Imran Rashid
Severity: Important Vendor: The Apache Software Foundation Versions affected: All Spark 1.x, Spark 2.0.x, and Spark 2.1.x versions Spark 2.2.0 to 2.2.2 Spark 2.3.0 to 2.3.1 Description: When using PySpark, it's possible for a different local user to connect to the Spark application and

Re: [ANNOUNCE] Announcing Apache Spark 2.2.3

2019-01-16 Thread Takeshi Yamamuro
Thanks, Dongjoon! On Wed, Jan 16, 2019 at 5:23 PM Hyukjin Kwon wrote: > Nice! > > On Wed, Jan 16, 2019 at 11:55 AM, Jiaan Geng wrote: > >> Glad to hear this. >> >> -- >> Sent from: htt

Re: [ANNOUNCE] Announcing Apache Spark 2.2.3

2019-01-16 Thread Hyukjin Kwon
Nice! On Wed, Jan 16, 2019 at 11:55 AM, Jiaan Geng wrote: > Glad to hear this. > > -- > Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: [ANNOUNCE] Announcing Apache Spark 2.2.3

2019-01-15 Thread Jiaan Geng
Glad to hear this. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
