Re: [DISCUSS] Adaptive execution in Spark SQL

2018-07-30 Thread Wenchen Fan
Hi Carson and Yuanjian, Thanks for contributing to this project and sharing the production use cases! I believe the adaptive execution will be a very important feature of Spark SQL and will definitely benefit a lot of users. I went through the design docs and the high-level design totally makes s

Re: Data source V2

2018-07-30 Thread Wenchen Fan
Hi assaf, Thanks for trying data source v2! Data source v2 is still evolving (we marked all the data source v2 interfaces as @Evolving), and we've already made a lot of API changes in this release (some renaming, switching to InternalRow, etc.). So I wouldn't encourage people to use data source v2 in lo

Re: [DISCUSS] Adaptive execution in Spark SQL

2018-07-30 Thread Yuanjian Li
Thanks Carson, great note! Actually, Baidu has ported this patch to our internal fork. I collected some use cases and the performance improvements seen during Baidu's internal usage of this patch, summarized as the following 3 scenarios: 1. SortMergeJoin to BroadcastJoin The SortMergeJoin transform to BroadcastJo
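The first scenario rests on one idea: choose the join strategy after the shuffle map stages have run, using their actual output sizes rather than planner estimates. A minimal Python sketch, assuming a single 10 MB threshold (the default of spark.sql.autoBroadcastJoinThreshold) and ignoring join-type restrictions on which side may be broadcast. StageStats and choose_join are hypothetical names, not Spark APIs.

```python
from dataclasses import dataclass

# Assumption: the default value of spark.sql.autoBroadcastJoinThreshold (10 MB).
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass
class StageStats:
    """Hypothetical runtime statistics for one finished shuffle map stage."""
    name: str
    output_bytes: int


def choose_join(left: StageStats, right: StageStats,
                threshold: int = BROADCAST_THRESHOLD_BYTES) -> str:
    """Pick a join strategy from actual, not estimated, input sizes.

    Simplified: either side may be broadcast here, whereas a real planner
    restricts the build side depending on the join type (e.g. outer joins).
    """
    smaller = min(left, right, key=lambda s: s.output_bytes)
    if smaller.output_bytes <= threshold:
        return f"BroadcastHashJoin(build={smaller.name})"
    return "SortMergeJoin"
```

For example, choose_join(StageStats("orders", 8 * 1024**3), StageStats("dim", 2 * 1024**2)) returns "BroadcastHashJoin(build=dim)": a side that a static plan would have shuffled and sorted is broadcast once its real size is known.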

Re: Review notification bot

2018-07-30 Thread Holden Karau
The activeness is a thing that came up in the Beam project POC I'm doing for the same bot (filtered it down to contributors active in the last year only). On Mon, Jul 30, 2018 at 11:08 PM, Jungtaek Lim wrote: > Sorry to chime in, just 2 cents on this since it looks like interesting > topic. > >
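The activity filter mentioned for the Beam proof of concept (only contributors active in the last year) could look roughly like this. It assumes the bot already has a list of candidate reviewers and a hypothetical map from username to last-activity time; how that time is gathered (commits, reviews, comments) is left open.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def active_candidates(candidates: List[str],
                      last_activity: Dict[str, datetime],
                      now: datetime,
                      window: timedelta = timedelta(days=365)) -> List[str]:
    """Keep candidates whose last recorded activity is inside the window.

    Users with no recorded activity are dropped; input order is preserved.
    """
    cutoff = now - window
    return [user for user in candidates
            if user in last_activity and last_activity[user] >= cutoff]
```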

Re: Review notification bot

2018-07-30 Thread Jungtaek Lim
Sorry to chime in, just 2 cents on this since it looks like an interesting topic. Just to share my habit as one of the contributors (for various projects), I don't use "git history" or "git blame" to find the authors of a file and ping them for review. I just ping active committers who recently merged the pu
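As an alternative to git blame, the habit of pinging committers who recently merged changes could be approximated by ranking mergers. This is a sketch under assumptions: the Merge record, the rule that a merge counts only if it touched one of the new PR's files, the 180-day window, and the limit of 3 are all hypothetical choices, not part of any existing bot.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Set


@dataclass
class Merge:
    """Hypothetical record of one merged pull request."""
    merged_by: str
    merged_at: datetime
    files: Set[str]


def suggest_mergers(pr_files: Set[str], merges: Iterable[Merge], now: datetime,
                    window: timedelta = timedelta(days=180),
                    limit: int = 3) -> List[str]:
    """Rank committers by how many recent merges touched the PR's files."""
    cutoff = now - window
    counts = Counter(m.merged_by for m in merges
                     if m.merged_at >= cutoff and m.files & pr_files)
    # most_common orders equal counts by first appearance.
    return [user for user, _ in counts.most_common(limit)]
```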

Data source V2

2018-07-30 Thread assaf.mendelson
Hi all, I am currently in the middle of developing a new data source (for an internal tool) using data source V2. I noticed that SPARK-24882 is planned for 2.4 and includes interface changes. I was wondering if those are planned in addition to

Re: Review notification bot

2018-07-30 Thread Holden Karau
Another thing we could try (if folks would be down for it) is to have it not actually ping, but suggest the potential usernames to the user (e.g. say "suggested reviewers you _may wish to ping_" and then list them)? On Mon, Jul 30, 2018 at 10:45 PM, Holden Karau wrote: > > On Mon, Jul 30, 2
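The "suggest, don't ping" idea comes down to how the bot's comment is worded: GitHub notifies a user when a comment contains @username, so listing names without the @ prefix leaves the decision to the PR author. A minimal sketch; suggestion_comment is a hypothetical helper, and the backticks only set the names apart visually.

```python
from typing import List


def suggestion_comment(reviewers: List[str]) -> str:
    """Build a PR comment naming possible reviewers without @-mentions.

    Names are written without the @ prefix, so GitHub does not treat them
    as mentions; the code spans are purely cosmetic.
    """
    if not reviewers:
        return "No suggested reviewers found."
    names = ", ".join(f"`{user}`" for user in reviewers)
    return f"Suggested reviewers you may wish to ping: {names}"
```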

Re: Review notification bot

2018-07-30 Thread Holden Karau
On Mon, Jul 30, 2018 at 10:22 PM, Reynold Xin wrote: > I like the idea of this bot, but I'm somewhat annoyed by it. I have > touched a lot of files and wrote a lot of the original code. Everyday I > wake up I get a lot of emails from this bot. > We could blacklist the existing PMC (or add a rate

Re: Review notification bot

2018-07-30 Thread Reynold Xin
I like the idea of this bot, but I'm somewhat annoyed by it. I have touched a lot of files and written a lot of the original code. Every day I wake up I get a lot of emails from this bot. Also if we are going to use this, can we rename the bot to something like spark-bot, rather than holden's persona

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
> That being said the folks being pinged are not just committers. I doubt it, because the only people I see being pinged are all committers, and that's why I assumed the pinging is based on who committed the PR (which implies committers only). Do you maybe have some examples where non-committers were pinged? Loo

Re: Review notification bot

2018-07-30 Thread Holden Karau
So CODEOWNERS is limited to committers by GitHub. We can definitely modify the config file though and I'm happy to write some custom logic if it helps support our needs. We can also just turn it off if it's too noisy for folks in general. That being said the folks being pinged are not just commit

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
*reviewers: I mean the people who committed the PR, based on my observation. On Tue, Jul 31, 2018 at 11:50 AM, Hyukjin Kwon wrote: > I was wondering if we can leave the configuration open and accept some > custom configurations, IMHO, because I saw some people less related or less > active are consistently ping

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
I was wondering if we could leave the configuration open and accept some custom configurations, IMHO, because I see some less related or less active people consistently being pinged. I've started to worry that they may be annoyed by this. Also, some people may be interested in a few specific areas. Th
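One way an open configuration could cover both concerns (letting people opt out, and letting others restrict pings to a few areas) is a per-user preference applied after candidates are chosen. This is a sketch only: ReviewerPrefs, its fields, and path-prefix matching are hypothetical and do not describe mention-bot's actual configuration format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ReviewerPrefs:
    """Hypothetical per-user settings a custom bot config could accept."""
    opted_out: bool = False
    # Path prefixes the user wants to hear about; empty means any area.
    areas: List[str] = field(default_factory=list)


def filter_by_prefs(candidates: List[str], changed_files: Set[str],
                    prefs: Dict[str, ReviewerPrefs]) -> List[str]:
    """Drop users who opted out or whose areas match none of the changed files."""
    result = []
    for user in candidates:
        p = prefs.get(user, ReviewerPrefs())  # no entry: default preferences
        if p.opted_out:
            continue
        if p.areas and not any(f.startswith(a) for f in changed_files for a in p.areas):
            continue
        result.append(user)
    return result
```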

Re: Review notification bot

2018-07-30 Thread Holden Karau
The configuration file is optional, is there something you want to try and change? On Mon, Jul 30, 2018 at 7:30 PM Hyukjin Kwon wrote: > I see. Thanks. I was wondering if I can see the configuration file since > that looks needed (https://github.com/holdenk/mention-bot#configuration) > but I coul

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
I see. Thanks. I was wondering if I can see the configuration file since that looks needed (https://github.com/holdenk/mention-bot#configuration) but I couldn't find it (sorry if it's just something I simply missed). On Tue, Jul 31, 2018 at 1:48 AM, Holden Karau wrote: > So the one that is running is the t

Re: [Spark SQL] Future of CalendarInterval

2018-07-30 Thread Hyukjin Kwon
FYI, org.apache.spark.unsafe.types.CalendarInterval is undocumented in both scaladoc/javadoc (entire unsafe module) but org.apache.spark.sql.types.CalendarIntervalType is exposed ( https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.CalendarIntervalType ) +1 for st

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-30 Thread Wenchen Fan
I went through the open JIRA tickets and here is a list that we should consider for Spark 2.4: *High Priority*: SPARK-24374 : Support Barrier Execution Mode in Apache Spark This one is critical to the Spark ecosystem for deep learning. It only has

[build system] two workers will be reimaged w/ubuntu tomorrow

2018-07-30 Thread shane knapp
my testing is going really well, and i think we're --->this<--- close to porting all of the spark builds to ubuntu! TL;DR: i am NOT planning on moving all builds to centos until after august 8th. i WOULD like to move the PRB to ubuntu before then. anyways: once these two smoke test builds pas

Re: Review notification bot

2018-07-30 Thread Holden Karau
So the one that is running is the fork in my own repo (set up for K8s deployment) - http://github.com/holdenk/mention-bot On Mon, Jul 30, 2018 at 3:15 AM Hyukjin Kwon wrote: > Holden, so, is it a fork in https://github.com/facebookarchive/mention-bot? > Would you mind if I ask where I can se

Re: Why percentile and distinct are not done in one job?

2018-07-30 Thread Reynold Xin
Which API are you talking about? On Mon, Jul 30, 2018 at 7:03 AM 吴晓菊 wrote: > I noticed that in column analyzing, 2 jobs will run separately to > calculate percentiles and then distinct. Why not combine into one job since > HyperLogLog also supports merge? > > Chrysan Wu > Phone:+86 17717640807

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-30 Thread Sean Owen
In theory releases happen on a time-based cadence, so it's pretty much wrap up what's ready by the code freeze and ship it. In practice, the cadence slips frequently, and it's very much a negotiation about what features should push the code freeze out a few weeks every time. So, kind of a hybrid ap

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-30 Thread Tom Graves
Shouldn't this be a discuss thread? I'm also happy to see more release managers and agree the time is getting close, but we should see what features are in progress and see how close things are and propose a date based on that.  Cutting a branch too soon just creates more work for committers t

Why percentile and distinct are not done in one job?

2018-07-30 Thread 吴晓菊
I noticed that in column analysis, 2 jobs run separately to calculate percentiles and then distinct counts. Why not combine them into one job, since HyperLogLog also supports merging? Chrysan Wu Phone:+86 17717640807
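The question rests on mergeability: if each partition can build both a distinct-count sketch and a percentile state in one scan, and partition states can be merged, a single pass suffices. The plain-Python sketch below shows that shape; it is illustrative only and does not mirror Spark internals. Assumptions: precision P = 10, a SHA-1-based hash, and an exact percentile computed from retained values instead of the bounded approximate quantile summary a real implementation would likely use.

```python
import hashlib
import math
from typing import Iterable, List, Tuple

P = 10        # assumed precision: 2**P = 1024 registers
M = 1 << P


def _hash64(value) -> int:
    """Stable 64-bit hash: first 8 bytes of SHA-1 over the value's repr."""
    return int.from_bytes(hashlib.sha1(repr(value).encode()).digest()[:8], "big")


class HyperLogLog:
    def __init__(self) -> None:
        self.registers = [0] * M

    def add(self, value) -> None:
        h = _hash64(value)
        idx = h >> (64 - P)                      # top P bits pick a register
        rest = h & ((1 << (64 - P)) - 1)         # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Register-wise max: the result equals a sketch built over both inputs.
        out = HyperLogLog()
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / M)
        raw = alpha * M * M / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * M and zeros:
            return M * math.log(M / zeros)       # linear counting for small sets
        return raw


State = Tuple[HyperLogLog, List[int]]


def aggregate_partition(rows: Iterable[int]) -> State:
    """One scan of a partition feeds both the distinct and percentile state."""
    hll, values = HyperLogLog(), []
    for v in rows:
        hll.add(v)
        values.append(v)
    return hll, values


def merge_states(a: State, b: State) -> State:
    return a[0].merge(b[0]), a[1] + b[1]


def finish(state: State, q: float) -> Tuple[int, float]:
    """Return (nearest-rank q-th percentile, approximate distinct count)."""
    hll, values = state
    ordered = sorted(values)
    k = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[k], hll.estimate()
```

Keeping raw values makes the percentile exact but unbounded in memory; swapping the list for a mergeable quantile summary would keep the single-pass structure while bounding state size.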

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-30 Thread Wenchen Fan
Another two correctness bug fixes were merged to 2.3 today: https://issues.apache.org/jira/browse/SPARK-24934 https://issues.apache.org/jira/browse/SPARK-24957 On Mon, Jul 30, 2018 at 1:19 PM Xiao Li wrote: > Sounds good to me. Thanks! Today, we merged another correctness fix > https://github.co

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
Holden, so, is it a fork of https://github.com/facebookarchive/mention-bot? Would you mind if I ask where I can see the configurations for it? On Mon, Jul 23, 2018 at 10:16 AM, Holden Karau wrote: > Yeah so the issue with codeowners is it will only assign to committers on > the repo (the Beam project f