Re: Welcome Xinrong Meng as a Spark committer

2022-08-09 Thread Weichen Xu
Congrats! On Tue, Aug 9, 2022 at 5:55 PM Jungtaek Lim wrote: > Congrats Xinrong! Well deserved. > > On Tue, Aug 9, 2022 at 5:13 PM, Hyukjin Kwon wrote: > >> Hi all, >> >> The Spark PMC recently added Xinrong Meng as a committer on the project. >> Xinrong is the major contributor of PySpark especially

Re: Is RDD thread safe?

2019-11-25 Thread Weichen Xu
neously, > if both return None when they get the same block from BlockManager (i.e. #1 > above), then I guess the same data would be cached twice. > > If the later cache could override the previous data, and no memory is > wasted, then this is OK > > Thanks > Chang > > >

Re: Is RDD thread safe?

2019-11-24 Thread Weichen Xu
t lazy, so there is a race condition. > > Thanks > Chang > > > On Tue, Nov 12, 2019 at 1:22 PM, Weichen Xu wrote: > >> Hi Chang, >> >> RDDs and DataFrames are immutable and lazily computed. They are thread safe. >> >> Thanks! >> >> On Tue, Nov 12, 2019 a

Re: Is RDD thread safe?

2019-11-11 Thread Weichen Xu
Hi Chang, RDDs and DataFrames are immutable and lazily computed. They are thread safe. Thanks! On Tue, Nov 12, 2019 at 12:31 PM Chang Chen wrote: > Hi all > > I have a case where I need to cache a source RDD and then create different > DataFrames from it in different threads to accelerate queries. > > I

Re: Add spark dependency on on org.opencypher:okapi-shade.okapi

2019-10-18 Thread Weichen Xu
have support, I'd back out the >> existing changes. >> I was initially skeptical about how much this needs to be in Spark vs a >> third-party package, and that still stands. >> >> The addition of another dependency isn't that big a deal IMHO, but, yes, >> it

Re: Add spark dependency on on org.opencypher:okapi-shade.okapi

2019-10-15 Thread Weichen Xu
proposed PR: > https://github.com/apache/spark/pull/24851. > > Thank you > Mats > > > On Tue, Oct 15, 2019 at 10:38 AM Weichen Xu > wrote: > >> Hi everyone, >> >> I'd like to call a new vote on the issue: should we add dependency >> "org.opencyph

Add spark dependency on on org.opencypher:okapi-shade.okapi

2019-10-15 Thread Weichen Xu
Hi everyone, I'd like to call a new vote on the issue: should we add the dependency "org.opencypher:okapi-shade.okapi" to Spark? The background is: Spark is going to add a major feature, "Spark Graph"; the prototype implementation is here https://github.com/apache/spark/pull/24297 which

Re: Spark 3.0 preview release feature list and major changes

2019-10-10 Thread Weichen Xu
Wait... I have some additions: *New API:* SPARK-25097 Support prediction on single instance in KMeans/BiKMeans/GMM SPARK-28045 Add missing RankingEvaluator SPARK-29121 Support Dot Product for Vectors *Behavior change or new API with behavior change:* SPARK-23265 Update multi-column error

Re: [DISCUSS] Migrate development scripts under dev/ from Python2 to Python 3

2019-08-07 Thread Weichen Xu
All right, we can support both Python 2 and Python 3 for Spark 3.0. On Wed, Aug 7, 2019 at 6:10 PM Hyukjin Kwon wrote: > We didn't drop Python 2 yet although it's deprecated. So I think it should > support both Python 2 and Python 3 at the current status. > > On Wed, Aug 7, 2019 at 6

[DISCUSS] Migrate development scripts under dev/ from Python2 to Python 3

2019-08-07 Thread Weichen Xu
Hi all, I would like to discuss compatibility for the dev scripts. Because we have already decided to deprecate Python 2 in Spark 3.0, we have two choices for the development scripts under dev/: 1) Migrate from Python 2 to Python 3 2) Support both Python 2 and Python 3 I lean toward option (2), which is
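For readers unfamiliar with what option (2) involves, here is a minimal sketch of a helper written to behave the same on Python 2 and Python 3. The name `to_text` is hypothetical and is not taken from the scripts under dev/; it only illustrates the usual pattern of `__future__` imports plus explicit bytes-to-text handling.

```python
from __future__ import print_function, unicode_literals

import sys

# True only when running under a Python 2 interpreter.
PY2 = sys.version_info[0] == 2


def to_text(value, encoding="utf-8"):
    """Return `value` as text on both Python 2 and Python 3.

    Hypothetical helper for illustration. Command output arrives as
    bytes on Python 3 and as a byte string on Python 2, so decode it
    explicitly instead of relying on implicit conversion.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


if __name__ == "__main__":
    # print() as a function works on both versions thanks to the
    # __future__ import at the top of the file.
    print("Python 2" if PY2 else "Python 3", to_text(b"dev/run-tests"))
```

The cost of option (2) is this kind of explicit handling at every boundary where the two versions differ, which is the trade-off the thread weighs against a straight migration.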

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Weichen Xu
+1, nice feature! On Sat, Mar 2, 2019 at 6:11 AM Yinan Li wrote: > +1 > > On Fri, Mar 1, 2019 at 12:37 PM Tom Graves > wrote: > >> +1 for the SPIP. >> >> Tom >> >> On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang < >> jiangxb1...@gmail.com> wrote: >> >> >> Hi all, >> >> I want to call

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-20 Thread Weichen Xu
We need to merge this: https://github.com/apache/spark/pull/22492 Otherwise MLeap cannot build against Spark 2.4.0. Thanks! On Wed, Sep 19, 2018 at 1:16 PM Yinan Li wrote: > FYI: SPARK-23200 has been resolved. > > On Tue, Sep 18, 2018 at 8:49 AM Felix Cheung > wrote: > >> If we could work on

Re: [VOTE] [SPARK-24374] SPIP: Support Barrier Scheduling in Apache Spark

2018-06-03 Thread Weichen Xu
+1 On Fri, Jun 1, 2018 at 3:41 PM, Xiao Li wrote: > +1 > > 2018-06-01 15:41 GMT-07:00 Xingbo Jiang : > >> +1 >> >> 2018-06-01 9:21 GMT-07:00 Xiangrui Meng : >> >>> Hi all, >>> >>> I want to call for a vote of SPARK-24374 >>> . It introduces a

Re: [MLLib] Logistic Regression and standadization

2018-04-20 Thread Weichen Xu
> > Valeriy. > > On 04/17/2018 11:40 AM, Weichen Xu wrote: > > Not a bug. > > When standardization is disabled, MLlib LR still standardizes the > features, but it scales the coefficients back at the end (after > training finishes). So it will get the same

Re: [MLLib] Logistic Regression and standadization

2018-04-17 Thread Weichen Xu
Not a bug. When standardization is disabled, MLlib LR still standardizes the features, but it scales the coefficients back at the end (after training finishes). So it gets the same result as training without standardization. The purpose is to improve the rate of convergence. So
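The "scale the coefficients back" step can be checked with a small numeric sketch. It assumes each feature is divided by its sample standard deviation with no mean-centering (an assumption made here for illustration, not a statement about the exact MLlib internals), and the coefficient values are made up. It shows only that a coefficient vector fitted in the scaled space, divided by the same standard deviations, yields identical margins on the raw features; it does not train a model.

```python
import math


def column_std(rows):
    """Sample standard deviation of each feature column."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(d)
    ]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy data: two features on very different scales.
rows = [[1.0, 200.0], [2.0, 400.0], [4.0, 100.0]]
sigma = column_std(rows)

# Features as the optimizer would see them (assumed: x_j / sigma_j).
scaled_rows = [[x / s for x, s in zip(r, sigma)] for r in rows]

# Hypothetical coefficients learned in the scaled feature space.
w_scaled = [0.5, -1.5]

# Scale back: w_j = w'_j / sigma_j, so that w . x == w' . (x / sigma).
w_original = [w / s for w, s in zip(w_scaled, sigma)]

margins_scaled = [dot(w_scaled, r) for r in scaled_rows]
margins_original = [dot(w_original, r) for r in rows]
```

Because the margins agree, the predicted probabilities agree as well; the intercept is unaffected under this no-centering assumption.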

Re: Re: Welcome Zhenhua Wang as a Spark committer

2018-04-02 Thread Weichen Xu
Congrats Zhenhua! On Mon, Apr 2, 2018 at 5:32 PM, Gengliang wrote: > Congrats, Zhenhua! > > > > On Mon, Apr 2, 2018 at 5:19 PM, Marco Gaido > wrote: > >> Congrats Zhenhua! >> >> 2018-04-02 11:00 GMT+02:00 Saisai Shao : >> >>>

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-23 Thread Weichen Xu
+1 On Fri, Feb 23, 2018 at 5:40 PM, Gengliang wrote: > +1 > > On Fri, Feb 23, 2018 at 11:35 AM, Xingbo Jiang > wrote: > >> +1 >> >> 2018-02-23 11:26 GMT+08:00 Takuya UESHIN : >> >>> +1 >>> >>> On Fri, Feb 23, 2018 at 12:24 PM,

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Weichen Xu
+1 On Wed, Feb 21, 2018 at 10:07 AM, Marcelo Vanzin wrote: > Done, thanks! > > On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal > wrote: > > Sure, please feel free to backport. > > > > On 20 February 2018 at 18:02, Marcelo Vanzin >

Re: I Want to Help with MLlib Migration

2018-02-16 Thread Weichen Xu
>>The goal is to have these algorithms implemented using the Dataset API. Currently, the implementation of these classes/algorithms uses RDDs by wrapping the old (mllib) classes, which will eventually be deprecated (and deleted). Each algorithm needs discussion and testing before doing that.

Re: Hinge Gradient

2017-12-16 Thread Weichen Xu
Hi Deb, In which library or paper did you find this loss function used for SVM? I prefer the implementation in LIBLINEAR, which uses a coordinate descent optimizer. Thanks. On Sun, Dec 17, 2017 at 6:52 AM, Yanbo Liang wrote: > Hello Deb, > > To optimize non-smooth function

Re: [VOTE] Spark 2.2.1 (RC2)

2017-11-29 Thread Weichen Xu
+1 On Thu, Nov 30, 2017 at 6:27 AM, Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote: > +1 > > SHA, MD5 and signatures look fine. Built and ran Maven tests on my Macbook. > > Thanks > Shivaram > > On Wed, Nov 29, 2017 at 10:43 AM, Holden Karau > wrote: > >> +1

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-03 Thread Weichen Xu
+1. On Sat, Nov 4, 2017 at 8:04 AM, Matei Zaharia wrote: > +1 from me too. > > Matei > > > On Nov 3, 2017, at 4:59 PM, Wenchen Fan wrote: > > > > +1. > > > > I think this architecture makes a lot of sense to let executors talk to > source/sink

Re: [VOTE][SPIP] SPARK-22026 data source v2 write path

2017-10-11 Thread Weichen Xu
+1 On Thu, Oct 12, 2017 at 10:36 AM, Xiao Li wrote: > +1 > > Xiao > > On Mon, 9 Oct 2017 at 7:31 PM Reynold Xin wrote: > >> +1 >> >> One thing with MetadataSupport - It's a bad idea to call it that unless >> adding new functions in that trait wouldn't

Re: Welcoming Tejas Patil as a Spark committer

2017-09-30 Thread Weichen Xu
Congratulations Tejas! On Sat, Sep 30, 2017 at 4:05 PM, Liang-Chi Hsieh wrote: > > Congrats! > > > Matei Zaharia wrote > > Hi all, > > > > The Spark PMC recently added Tejas Patil as a committer on the > > project. Tejas has been contributing across several areas of Spark for