[ANNOUNCE] Announcing Apache Spark 2.2.3

2019-01-14 Thread Dongjoon Hyun
We are happy to announce the availability of Spark 2.2.3! Apache Spark 2.2.3 is a maintenance release, based on the branch-2.2 maintenance branch of Spark. We strongly recommend all 2.2.x users to upgrade to this stable release. To download Spark 2.2.3, head over to the download page:

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread Jungtaek Lim
Yes, I understand what Reynold stated (as Michael Armbrust stated earlier), and I agree it's a great thing that improvements to CORE/SQL also benefit SS. I'm just concerned that both SQL and SS are being impacted by DSv2, but things are going differently between SQL and SS. SQL is

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread JackyLee
Agree with rxin. Maybe we should consider these PRs, especially the large ones, after the DataSource V2 API is ready.

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread Nicholas Chammas
OK, good to know, and that all makes sense. Thanks for clearing up my concern. One of the great things about Spark is, as you pointed out, that improvements to core components benefit multiple features at once. On Mon, Jan 14, 2019 at 8:36 PM Reynold Xin wrote: > BTW the largest change to SS right

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread Reynold Xin
BTW the largest change to SS right now is probably the entire data source API v2 effort, which aims to unify streaming and batch from a data source perspective, and provide a reliable, expressive source/sink API. On Mon, Jan 14, 2019 at 5:34 PM, Reynold Xin < r...@databricks.com > wrote: > >

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread Reynold Xin
There are a few things to keep in mind: 1. Structured Streaming isn't an independent project. It actually (by design) depends on all the rest of Spark SQL, and virtually all improvements to Spark SQL benefit Structured Streaming. 2. The project as far as I can tell is relatively mature for

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread Nicholas Chammas
As an observer, this thread is interesting and concerning. Is there an emerging consensus that Structured Streaming is somehow not relevant anymore? Or is it just that folks consider it "complete enough"? Structured Streaming was billed as the replacement for DStreams. If committers, generally

[DISCUSS] SPIP SPARK-26257

2019-01-14 Thread tcondie
Dear Spark Community, I have posted a SPIP to JIRA: https://issues.apache.org/jira/browse/SPARK-26257 I look forward to your feedback on the JIRA ticket. Best regards, Tyson

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread Jungtaek Lim
Cody, I guess I already addressed your comments in the PR (#22138). The approach was changed to address your concern, and after that Gabor helped to review the PR. Please take another look when you have time to get into it. On Tue, Jan 15, 2019 at 1:01 AM, Cody Koeninger wrote: > I feel like I've already

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread Jungtaek Lim
Sad to hear that. While I understand such a thing can happen to any project, it feels to me like a bad sign that a non-experimental major feature with no alternative is losing interest. I also fully agree that there isn't a way to make people work on it (I have also encountered

Re: [build system] jenkins mildly wedged, needs a quick restart

2019-01-14 Thread shane knapp
alright, everything seems to be working as expected! :) On Mon, Jan 14, 2019 at 11:07 AM shane knapp wrote: > we're back up and building... things still seem a little flaky so i'll be > investigating a little bit deeper into what's going on. > > On Mon, Jan 14, 2019 at 10:55 AM shane knapp

Re: [build system] jenkins mildly wedged, needs a quick restart

2019-01-14 Thread shane knapp
we're back up and building... things still seem a little flaky so i'll be investigating a little bit deeper into what's going on. On Mon, Jan 14, 2019 at 10:55 AM shane knapp wrote: > this will kill a bunch of PRB builds, so i'll go and retrigger them once > jenkins is back up. > > -- > Shane

[build system] jenkins mildly wedged, needs a quick restart

2019-01-14 Thread shane knapp
this will kill a bunch of PRB builds, so i'll go and retrigger them once jenkins is back up. -- Shane Knapp UC Berkeley EECS Research / RISELab Staff Technical Lead https://rise.cs.berkeley.edu

Re: [mllib] Document frequency

2019-01-14 Thread Jatin Puri
Thanks. Created: https://issues.apache.org/jira/browse/SPARK-26616 On Mon, Jan 14, 2019 at 9:19 PM Sean Owen wrote: > Yes that seems OK to me. > > On Mon, Jan 14, 2019 at 9:40 AM Jatin Puri wrote: > > > > Thanks for the response. So do I go ahead and create a jira ticket? > > Can then send a

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread Cody Koeninger
I feel like I've already said my piece on https://github.com/apache/spark/pull/22138; let me know if you have more questions. As for SS in general, I don't have a production SS deployment, so I'm less comfortable with reviewing large changes to it. But if no other committers are working on it...

Re: [mllib] Document frequency

2019-01-14 Thread Sean Owen
Yes that seems OK to me. On Mon, Jan 14, 2019 at 9:40 AM Jatin Puri wrote: > > Thanks for the response. So do I go ahead and create a jira ticket? > Can then send a pull request for the same with the changes. > > On Mon, Jan 14, 2019 at 8:18 PM Sean Owen wrote: >> >> I think that's reasonable.

Re: [mllib] Document frequency

2019-01-14 Thread Jatin Puri
Thanks for the response. So do I go ahead and create a jira ticket? Can then send a pull request for the same with the changes. On Mon, Jan 14, 2019 at 8:18 PM Sean Owen wrote: > I think that's reasonable. The caller probably has the number of docs > already but sure, it's one long and is

Re: [mllib] Document frequency

2019-01-14 Thread Sean Owen
I think that's reasonable. The caller probably has the number of docs already but sure, it's one long and is already computed. This would have to be added to PySpark too. On Mon, Jan 14, 2019 at 7:56 AM Jatin Puri wrote: > > Hello. > > As part of `org.apache.spark.ml.feature.IDFModel`, I think

[mllib] Document frequency

2019-01-14 Thread Jatin Puri
Hello. As part of `org.apache.spark.ml.feature.IDFModel`, I think it is a good idea to also expose: 1. Document frequency vector 2. Number of documents We get the above for free currently and they just need to be exposed as public val. This avoids re-implementation for someone who needs to
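
For readers unfamiliar with the two quantities named in the proposal above, here is a minimal sketch in plain Python (not Spark) of how a document frequency vector and a document count relate to IDF weights. The function names `document_frequency` and `idf_from_doc_freq` are hypothetical and are not part of `IDFModel`; the smoothed formula log((m + 1) / (df + 1)) is assumed here to match the one documented for Spark's IDF.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Return (doc_freq, num_docs) for a list of tokenized documents.

    doc_freq maps each term to the number of documents containing it.
    Hypothetical helper for illustration only, not a Spark API.
    """
    doc_freq = Counter()
    for tokens in docs:
        # Count each term at most once per document.
        doc_freq.update(set(tokens))
    return dict(doc_freq), len(docs)


def idf_from_doc_freq(doc_freq, num_docs):
    """Compute smoothed IDF weights: log((m + 1) / (df + 1)).

    Assumed to mirror the formula in Spark's IDF documentation.
    """
    return {
        term: math.log((num_docs + 1) / (df + 1))
        for term, df in doc_freq.items()
    }


# Toy corpus: three tokenized documents.
docs = [
    ["spark", "sql"],
    ["spark", "streaming"],
    ["spark", "spark", "mllib"],
]

doc_freq, num_docs = document_frequency(docs)
idf = idf_from_doc_freq(doc_freq, num_docs)
```

Because the IDF weights are derived entirely from `doc_freq` and `num_docs`, a caller who can read both values can rebuild or adjust the weights without a second pass over the corpus, which seems to be the motivation behind exposing them.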