Re: [ANNOUNCE] New Celeborn PMC Member: Nicholas Jiang

2024-07-23 Thread Mridul Muralidharan
Congratulations ! Regards, Mridul On Tue, Jul 23, 2024 at 12:37 PM Fei Wang wrote: > Congrats! > > Regards, > Fei Wang > > On 2024/07/23 10:19:37 Keyong Zhou wrote: > > Congrats! > > > > Regards, > > Keyong Zhou > > > > angers zhu wrote on Tue, Jul 23, 2024 at 18:09: > > > > > Congrats! > > > > > > > > >

Re: Re: [ANNOUNCE] New Celeborn Committer: Fei Wang

2024-07-23 Thread Mridul Muralidharan
Congratulations ! Regards, Mridul On Tue, Jul 23, 2024 at 12:54 AM rexxiong wrote: > Congratulations! > > Regards, > Jiashu Xiong > > Nicholas Jiang wrote on Tue, Jul 23, 2024 at 13:02: > > > Congratulations! Regards, > > > > Nicholas Jiang > > > > > > At 2024-07-23 12:21:19, "Yihe Li" wrote: > >

Re: Question regarding TLS support in Celeborn

2024-07-11 Thread Mridul Muralidharan
Hi, Yes, it is supported. Note that in addition to TLS communications specifically within Celeborn, the Apache Ratis message exchange (for Raft HA) requires use of grpc - TLS is not supported with raft rpc type = netty. Regards, Mridul On Thu, Jul 11, 2024 at 8:57 PM lohit wrote: > Hello
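The constraint described in this reply (TLS for the Ratis exchange needs the grpc RPC type, not netty) can be expressed as a small pre-flight check. This is only a sketch: the setting names `tls_enabled` and `raft_rpc_type` are hypothetical placeholders chosen for illustration, not real Celeborn configuration keys.

```python
def check_ha_tls(settings: dict) -> list:
    """Return a list of problems found in a (hypothetical) HA settings dict."""
    problems = []
    tls_enabled = settings.get("tls_enabled", False)  # hypothetical key
    raft_rpc_type = settings.get("raft_rpc_type")     # hypothetical key
    # Per the reply above: TLS is not supported when the raft rpc type is netty.
    if tls_enabled and raft_rpc_type != "grpc":
        problems.append(
            "TLS is enabled but raft rpc type is %r; grpc is required" % raft_rpc_type
        )
    return problems


if __name__ == "__main__":
    print(check_ha_tls({"tls_enabled": True, "raft_rpc_type": "netty"}))
```

A check of this kind would run before starting the masters, so that a TLS deployment with an unsupported Raft transport fails early instead of at connection time.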

Re: [DISCUSS] CIP-10: Introduce Celeborn Chaos Testing Framework

2024-07-10 Thread Mridul Muralidharan
Hi, This is a great idea - and would go a long way in flushing out bugs and issues - and improving the overall robustness of Celeborn ! It would also be good to have: a) Capture a (replay) log of all events which were triggered. b) Ability to 'replay' the log and deterministically reach the
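Points (a) and (b) above, capturing a log of triggered events and replaying it deterministically, could look roughly like the sketch below. Everything in it (the `ChaosLog` class, the `kill`/`restart` actions, the worker names) is a hypothetical illustration and not part of the CIP-10 proposal.

```python
import json
import random


class ChaosLog:
    """Records every injected event so a run can be replayed later."""

    def __init__(self):
        self.events = []

    def record(self, step, action, target):
        self.events.append({"step": step, "action": action, "target": target})

    def dumps(self):
        return json.dumps(self.events)


def apply_event(state, action, target):
    # Hypothetical fault model: a worker is either "up" or "down".
    state[target] = "down" if action == "kill" else "up"


def run_random(seed, steps, workers, log):
    """Inject random faults, recording each one in the log."""
    rng = random.Random(seed)
    state = {w: "up" for w in workers}
    for step in range(steps):
        target = rng.choice(workers)
        action = "kill" if state[target] == "up" else "restart"
        apply_event(state, action, target)
        log.record(step, action, target)
    return state


def replay(serialized_log, workers):
    """Re-apply a recorded log; no randomness is involved."""
    state = {w: "up" for w in workers}
    for event in json.loads(serialized_log):
        apply_event(state, event["action"], event["target"])
    return state
```

Because the replay consumes only the recorded events, it reaches the same final state as the original run regardless of the random seed that produced it.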

Re: Jira version update

2024-07-09 Thread Mridul Muralidharan
Thanks Keyong ! Regards, Mridul On Thu, Jul 4, 2024 at 9:22 PM Keyong Zhou wrote: > Thanks Mridul for pointing this out, I just modified 0.5.0 as released :) > > Regards, > Keyong Zhou > > Mridul Muralidharan wrote on Fri, Jul 5, 2024 at 03:13: > > > Hi, > > > >

Jira version update

2024-07-04 Thread Mridul Muralidharan
Hi, While updating an issue manually, I noticed that 0.5.0 is still listed as an unreleased version in Jira. Given the 0.5 release, shouldn't we get it updated ? Regards, Mridul

Re: [VOTE] Release Apache Celeborn 0.5.0-rc3

2024-06-24 Thread Mridul Muralidharan
Forgot to update here. Signatures, digests, etc. check out fine. Checked out the tag and built/tested with "-Pspark-3.1". I keep getting the following errors: - metrics/prometheus *** FAILED *** 200 did not equal 404 (ApiBaseResourceSuite.scala:90) - metrics/json *** FAILED *** 200 did not equal

Re: Re: [VOTE] Contribute Apache Celeborn CLI

2024-06-12 Thread Mridul Muralidharan
+1 Regards, Mridul On Wed, Jun 12, 2024 at 1:08 AM Shaoyun Chen wrote: > +1 > > Keyong Zhou wrote on Wed, Jun 12, 2024 at 13:47: > > > > +1 > > > > Thanks for the proposal! > > > > Regards, > > Keyong Zhou > > > > Nicholas Jiang wrote on Wed, Jun 12, 2024 at 13:02: > > > > > +1. Looking forward to Celeborn CLI. > > >

Re: Re: [Discussion] Proposal Management in Celeborn Community

2024-06-11 Thread Mridul Muralidharan
2024 at 12:33 PM Nicholas wrote: > > > > > Hi Jiashu, > > > > > > > > > > > > > > > +1 for me. According to my experience in the Flink community, the > > > discussion of the CIP is commented in dev maillist instead of commented > > in >

Re: [DISCUSS] Celeborn CLI Proposal

2024-06-10 Thread Mridul Muralidharan
Hi, Looks good to me as well, I had reviewed this proposal internally already :-) Regards, Mridul On Fri, Jun 7, 2024 at 11:32 PM Keyong Zhou wrote: > Hi Aravind, > > Thanks for the proposal! The proposal LGTM, I think it's very valuable. > > Regards, > Keyong Zhou > > Aravind Patnam

Re: [Discussion] Proposal Management in Celeborn Community

2024-05-29 Thread Mridul Muralidharan
Inline comments, discussions are invaluable for design docs - this is not yet supported in confluence right ? Another option would be to iterate and discuss through other means (like google docs), and before vote, move it to the wiki - so that the community is deciding/voting on artifacts which

Re: [VOTE] Release Apache Celeborn 0.4.1-rc1

2024-05-20 Thread Mridul Muralidharan
+1 Signatures, digests, etc. check out fine. Checked out the tag and built/tested with "-Pspark-3.1" Regards, Mridul On Sun, May 19, 2024 at 10:19 PM rexxiong wrote: > +1 (binding) > I checked > - Download links are valid. > - git commit hash is correct > - Checksums and signatures are valid. > -
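The digest part of the checks mentioned in these votes can be sketched with Python's standard `hashlib`. This is a minimal illustration, assuming the expected SHA-512 value is available as a plain hex string; it does not cover signature verification, and the sample bytes in the demo are made up.

```python
import hashlib
import io


def sha512_of_stream(stream, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a binary stream, reading in chunks."""
    digest = hashlib.sha512()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def digest_matches(stream, expected_hex):
    """Compare the computed digest with an expected hex string, ignoring case and spaces."""
    normalized = "".join(expected_hex.split()).lower()
    return sha512_of_stream(stream) == normalized


if __name__ == "__main__":
    artifact = b"example release artifact bytes"  # stand-in for a downloaded file
    expected = hashlib.sha512(artifact).hexdigest()
    print(digest_matches(io.BytesIO(artifact), expected))
```

For a real release candidate, the stream would be the downloaded artifact opened with `open(path, "rb")`, and the expected value would come from the published digest file.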

Re: [DRAFT] Celeborn Board Report

2024-05-04 Thread Mridul Muralidharan
> Regards, > Keyong Zhou > > Mridul Muralidharan wrote on Fri, May 3, 2024 at 23:38: > > > Hi, > > > > I meant call it out as part of the board report, so that it is captured > > in our updates to board. > > > > This is the first update post TLP, right ? >

Re: [DRAFT] Celeborn Board Report

2024-05-03 Thread Mridul Muralidharan
doJW-f3BZAvxciDbI3mTw > <https://mp.weixin.qq.com/s/DdoJW-f3BZAvxciDbI3mTw> > > It'll be great if we can call out louder, do you have any idea? : ) > > Regards, > Keyong Zhou > > Mridul Muralidharan wrote on Fri, May 3, 2024 at 07:40: > > > Hi, > > > > Do we want to call o

Re: [DRAFT] Celeborn Board Report

2024-05-02 Thread Mridul Muralidharan
ct graduated recently). > - Chandni Singh was added as committer on 2024-03-21. > - Mridul Muralidharan was added as committer on 2024-04-29. > > ## Project Activity: > Software development activity: > > - We are preparing to release 0.4.1 in May. > - We are preparing to r

Re: [ANNOUNCE] Add Mridul Muralidharan as new committer

2024-04-29 Thread Mridul Muralidharan
r 29, 2024, at 09:21, Keyong Zhou wrote: > > > > > > Hi Celeborn Community, > > > > > > The Project Management Committee (PMC) for Apache Celeborn > > > has invited Mridul Muralidharan to become a committer and we are > pleased > > > to announ

Re: [DISCUSS] Time for 0.4.1

2024-04-19 Thread Mridul Muralidharan
+1 Regards, Mridul On Thu, Apr 18, 2024 at 11:50 PM Ethan Feng wrote: > +1 > > Thanks, > Ethan Feng > > Yu Li wrote on Tue, Apr 16, 2024 at 17:20: > > > > +1, thanks for driving this and volunteering as our RM, Nicholas! > > > > Best Regards, > > Yu > > > > On Sat, 13 Apr 2024 at 10:31, Keyong Zhou

Re: [ANNOUNCE] Apache Celeborn is graduated to Top Level Project

2024-03-26 Thread Mridul Muralidharan
Congratulations !! Regards, Mridul On Tue, Mar 26, 2024 at 11:54 PM Nicholas Jiang wrote: > Congratulations! Witness the continuous development of the > community. Regards, > Nicholas Jiang > At 2024-03-25 20:49:36, "Ethan Feng" wrote: > >Hello Celeborn community, > > > >I am glad to share

Re: Maven 'stuck' in service test compilation ?

2024-03-21 Thread Mridul Muralidharan
al information, JDK version, etc. > > Regards, > Ethan Feng > > Mridul Muralidharan wrote on Thu, Mar 21, 2024 at 15:25: > > > > Hi, > > > > > > I am observing that a maven build gets 'stuck' when compiling > "services" > > for running tests. >

Re: [ANNOUNCE] Add Chandni Singh as new committer

2024-03-21 Thread Mridul Muralidharan
Congratulations Chandni ! Great job :-) Regards, Mridul On Thu, Mar 21, 2024 at 3:30 AM Keyong Zhou wrote: > Hi Celeborn Community, > > The Podling Project Management Committee (PPMC) for Apache Celeborn > has invited Chandni Singh to become a committer and we are pleased > to announce that

Maven 'stuck' in service test compilation ?

2024-03-21 Thread Mridul Muralidharan
Hi, I am observing that a maven build gets 'stuck' when compiling "services" for running tests. Without tests, this goes through:
$ ARGS="-Pspark-3.1"; ./build/mvn $ARGS clean 2>&1 | tee clean_output.txt && ./build/mvn -DskipTests $ARGS package 2>&1 | tee build_output.txt
This gets stuck

Re: [VOTE] Graduate Apache Celeborn (incubating) as a TLP - Community

2024-03-01 Thread Mridul Muralidharan
+1 Regards, Mridul On Fri, Mar 1, 2024 at 4:35 AM Nicholas wrote: > > +1. > > > Regards, > Nicholas Jiang > > > > > -- > Sent from my NetEase Mail mobile app > > > > - Original Message - > From: "Yu Li" > To: dev@celeborn.apache.org > Sent: Fri, 1 Mar 2024 16:52:10 +0800 > Subject: [VOTE] Graduate Apache

Re: [DISCUSS] Graduate Celeborn as TLP

2024-02-28 Thread Mridul Muralidharan
+1 Looking forward to Celeborn as a TLP ! Best wishes to the community :-) Regards, Mridul On Tue, Feb 27, 2024 at 5:23 AM Willem Jiang wrote: > Thanks for the clarification. Now we are good to go. > > Willem Jiang > > > > On Tue, Feb 27, 2024 at 7:15 PM Keyong Zhou wrote: > > > > Thanks

Re: [ANNOUNCE] New PPMC member: Fu Chen

2024-02-19 Thread Mridul Muralidharan
Congratulations ! Regards, Mridul On Mon, Feb 19, 2024 at 7:46 PM Cheng Pan wrote: > Congrats! > > Thanks, > Cheng Pan > > > > On Feb 20, 2024, at 08:22, Nicholas wrote: > > > > Congratulations to Fu Chen! Regards, > > Nicholas Jiang > > > > > > > > > > At 2024-02-20 00:23:06, "Shaoyun Chen"

Re: Large number of incubator-celeb...@noreply.github.com emails

2024-02-06 Thread Mridul Muralidharan
to choose JIRA at last. Seems > different projects have different preferences. Maybe > newer projects tend to use GitHub. > > To me, I'm actually fine with both. JIRA works well so far, will using > GitHub be more beneficial? Glad to hear about your opinion. > > Thanks, >

Re: Large number of incubator-celeb...@noreply.github.com emails

2024-02-06 Thread Mridul Muralidharan
Looks like I am wrong, github issues can be used [1]. Is Celeborn planning to use github issues going forward ? Regards, Mridul [1] https://www.apache.org/dev/#issues On Wed, Feb 7, 2024 at 12:00 AM Mridul Muralidharan wrote: > Hi, > > I received a fairly large number

Large number of incubator-celeb...@noreply.github.com emails

2024-02-06 Thread Mridul Muralidharan
Hi, I received a fairly large number of emails to incubator-celeb...@noreply.github.com, which typically are for PRs. They appear to be GitHub issues - are we trying to move to GitHub issues instead of Apache Jira ? IIRC there is a policy to use Jira for tracking bugs/improvements, right ?

Re: [VOTE] Release Apache Celeborn(Incubating) 0.3.2-incubating-rc0

2023-12-19 Thread Mridul Muralidharan
+1 Signatures, digests, license, etc. check out fine. Checked out the tag and built/tested with -Pspark-3.1 and -Pflink-1.17 Regards, Mridul On Tue, Dec 19, 2023 at 8:06 PM rexxiong wrote: > +1 (binding) > I checked > - Download links are valid. > - git commit hash is correct > - Checksums and

Re: [DISCUSS] Time for 0.3.2

2023-12-06 Thread Mridul Muralidharan
+1 on 0.3.2, thanks Nicholas ! Regards, Mridul On Thu, Dec 7, 2023 at 12:51 AM Cheng Pan wrote: > +1, thanks for volunteering. > > Feel free to ping me if you encounter permission issues during the release > phase. > > Thanks, > Cheng Pan > > > > On Dec 7, 2023, at 14:31, Nicholas wrote: > >

Re: [ANNOUNCE] Add Yihe Li as new committer

2023-11-22 Thread Mridul Muralidharan
Congratulations Yihe Li ! Regards, Mridul On Wed, Nov 22, 2023 at 2:08 AM Yu Li wrote: > Congratulations, Yihe! > > Best Regards, > Yu > > > On Fri, 17 Nov 2023 at 15:32, Shaoyun Chen wrote: > > > Congrats! > > > > Keyong Zhou wrote on Thu, Nov 16, 2023 at 20:25: > > > > > > Hi Celeborn Community, > > >

Re: [PROPOSAL] Spark stage resubmission for shuffle fetch failure

2023-11-03 Thread Mridul Muralidharan
apAndMergeOutput > > if the current ShuffleMapStage is Indeterminate. What if the current > stage > > is determinate, but its > > upstream stage is Indeterminate, and its upstream stage is rerun? > > > > Thanks, > > Keyong Zhou > > > > Mridul Muralidharan wrote on

Re: [PROPOSAL] Spark stage resubmission for shuffle fetch failure

2023-10-19 Thread Mridul Muralidharan
, and then retry these stages (same shuffle-id, a new stage attempt) Regards, Mridul On Thu, Oct 19, 2023 at 10:08 PM Mridul Muralidharan wrote: > > Good question, and ResultStage is actually special cased in spark as its > output could have already been consumed (for example collect()

Re: [PROPOSAL] Spark stage resubmission for shuffle fetch failure

2023-10-19 Thread Mridul Muralidharan
E ShuffleMapStage got entirely recomputed, the > >> corresponding ResultStage should be entirely recomputed as well, per my > >> understanding > >> > >> I found https://issues.apache.org/jira/browse/SPARK-25342 to rollback a > >> ResultStage but it was not merged >

Re: [PROPOSAL] Spark stage resubmission for shuffle fetch failure

2023-10-16 Thread Mridul Muralidharan
ompute and > remaining shuffle data needs a lot of work to do in Celeborn > I prefer to implement a simple whole stage recompute first with interface > defined with recomputeAll = true flag, and explore partial stage recompute > in separate ticket as future optimization > How do you th

Re: Question on Celeborn workers,

2023-10-16 Thread Mridul Muralidharan
With push-based shuffle in Apache Spark (magnet), both the map output and the reducer-oriented merged output are preserved - the reducer-oriented view is chosen by default for reads, with fallback to the mapper output when the reducer output is missing or fails to be fetched. That mitigates this specific issue for
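The read path described above (prefer the reducer-oriented merged output, fall back to the original mapper outputs) can be sketched as follows. This is a simplified illustration with in-memory dictionaries standing in for the two stores; the function and store names are hypothetical and this is not Spark's actual implementation.

```python
def read_reducer_input(reduce_id, merged_store, mapper_store, num_mappers):
    """Return the records for one reducer, preferring the merged output.

    merged_store: {reduce_id: [records]}              (reducer-oriented view)
    mapper_store: {(map_id, reduce_id): [records]}    (per-mapper view)
    """
    merged = merged_store.get(reduce_id)
    if merged is not None:
        return list(merged)
    # Fallback: the merged output is missing, so gather the blocks that
    # each mapper wrote for this reducer.
    records = []
    for map_id in range(num_mappers):
        records.extend(mapper_store[(map_id, reduce_id)])
    return records
```

Keeping both views is what makes the fallback possible: losing a merged block costs extra reads, not a recomputation of the mappers.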

Re: [PROPOSAL] Spark stage resubmission for shuffle fetch failure

2023-10-14 Thread Mridul Muralidharan
On Sat, Oct 14, 2023 at 3:49 AM Mridul Muralidharan wrote: > > A reducer oriented view of shuffle, especially without replication, could > indeed be susceptible to this issue you described (a single fetch failure > would require all mappers to need to be recomputed) - note, not neces

Re: [PROPOSAL] Spark stage resubmission for shuffle fetch failure

2023-10-14 Thread Mridul Muralidharan
utation could be (partially or totally) reused. > > Regards, > > --- Sungwoo > > On Sat, Oct 14, 2023 at 5:24 PM Mridul Muralidharan > wrote: > >> >> Hi, >> >> Spark will try to minimize the recomputation cost as much as possible. >> For example, if

Re: [PROPOSAL] Spark stage resubmission for shuffle fetch failure

2023-10-14 Thread Mridul Muralidharan
Hi, Spark will try to minimize the recomputation cost as much as possible. For example, if parent stage was DETERMINATE, it simply needs to recompute the missing (mapper) partitions (which resulted in fetch failure). Note, this by itself could require further recomputation in the DAG if the
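The DETERMINATE case described above can be sketched as a small decision function. The DETERMINATE branch follows the message; the INDETERMINATE branch (recompute every partition) reflects my understanding of Spark's behaviour and is an assumption here, as are the simplified two-value enum and the function name.

```python
from enum import Enum


class Determinism(Enum):
    # Simplified, hypothetical two-level model for illustration.
    DETERMINATE = "determinate"
    INDETERMINATE = "indeterminate"


def partitions_to_recompute(level, num_partitions, missing):
    """Pick which mapper partitions of the parent stage must be rerun."""
    if level is Determinism.DETERMINATE:
        # Output is reproducible, so only the lost partitions are needed.
        return sorted(set(missing))
    # Assumption: an indeterminate stage may produce different output on
    # rerun, so every partition is recomputed to stay consistent.
    return list(range(num_partitions))
```

For example, with 100 mapper partitions and a fetch failure on partitions 3 and 7, the determinate case reruns two tasks while the indeterminate case reruns all 100.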

Re: [PROPOSAL] Spark stage resubmission for shuffle fetch failure

2023-10-13 Thread Mridul Muralidharan
Hi, So there are a couple of things here based on whether the stages are DETERMINATE or INDETERMINATE. The exit I added to my example was to trigger some of these cases, and you can come up with more involved scenarios where this would apply :-) At a high level, we have the following: a) If

Re: [PROPOSAL] Spark stage resubmission for shuffle fetch failure

2023-09-23 Thread Mridul Muralidharan
Hi, I am not yet very familiar with Celeborn, so will restrict my notes on the proposal in context to Apache Spark: a) For Option 1, there is SPARK-25299 - which was started a few years back. Unfortunately, the work there has stalled: but if there is interest in pushing that forward, I can

Re: [DISCUSSION] Support memory file storage.

2023-09-21 Thread Mridul Muralidharan
n move existing shuffle files to different storage tiers. > > c) As mentioned above, the enhancement is intended to act as a storage > tier that's why I explained the details about how it is handled > internally. > > Thanks again for your email. Please let me know if you have any >

Re: [DISCUSSION] Support memory file storage.

2023-09-20 Thread Mridul Muralidharan
Hi, This should be a nontrivial improvement to Celeborn imo, thanks Ethan ! I had a few queries: a) Are we viewing this enhancement as a cache or as a tiered storage layer ? When going over it, I felt the proposal might be doing both - though leaning more as a cache, but wanted to get

Re: [DISCUSS] Support authentication in Celeborn

2023-09-18 Thread Mridul Muralidharan
To add to what Chandni mentioned, using self-signed certificates and trusting them is another (though less secure) practice some deployments leverage. This ensures encryption over the wire, but does not allow for clients to validate identity of the Celeborn server components (so potentially
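The trade-off described above, encryption on the wire without validation of the server's identity, can be illustrated with Python's standard `ssl` module. This is a generic sketch of the two client-side modes, not Celeborn's TLS configuration; the function names are hypothetical.

```python
import ssl


def encrypt_only_context():
    """Client context that encrypts traffic but skips identity checks."""
    ctx = ssl.create_default_context()
    # check_hostname must be turned off before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def verifying_context(cafile=None):
    """Client context that validates the server certificate and hostname.

    cafile may point to a trusted CA bundle (or a pinned self-signed
    certificate); None uses the system trust store.
    """
    return ssl.create_default_context(cafile=cafile)
```

With the first context, any server presenting any certificate is accepted, which is why the connection is encrypted but the peer could be an impostor. Explicitly trusting a specific self-signed certificate through `cafile` restores the identity check for that one certificate.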

Re: [VOTE] Release Apache Celeborn(Incubating) 0.3.1-incubating-rc0

2023-08-31 Thread Mridul Muralidharan
+1 Signatures, digests, license, etc. check out fine. Checked out the tag and built/tested with -Pspark-3.1 Regards, Mridul On Thu, Aug 31, 2023 at 11:35 AM Cheng Pan wrote: > Hi Celeborn community, > > This is a call for a vote to release Apache Celeborn (Incubating) > 0.3.1-incubating-rc0 > >

Re: [DISCUSS] Allow external contributors to run CI without approval

2023-06-16 Thread Mridul Muralidharan
Agree, +1 Regards, Mridul On Fri, Jun 16, 2023 at 9:16 AM Cheng Pan wrote: > +1 for "only requires approval first time" > > Keyong Zhou wrote on Fri, Jun 16, 2023 at 5:48 PM: > > > +1 > > > > Thanks, > > Keyong Zhou > > > > Ethan Feng wrote on Fri, Jun 16, 2023 at 16:27: > > > > > Recent moves by Apache Infra have