Re: Packages to release in 3.0.0-preview

2019-10-31 Thread Cody Koeninger
On Thu, Oct 31, 2019 at 4:30 PM Sean Owen wrote: > > . But it'd be cooler to call these major > releases! Maybe this is just semantics, but my point is the Scala project already does call 2.12 to 2.13 a major release e.g. from https://www.scala-lang.org/download/ "Note that different *major* r

Re: Packages to release in 3.0.0-preview

2019-10-31 Thread Cody Koeninger
On Wed, Oct 30, 2019 at 5:57 PM Sean Owen wrote: > Or, frankly, maybe Scala should reconsider the mutual incompatibility > between minor releases. These are basically major releases, and > indeed, it causes exactly this kind of headache. > Not saying binary incompatibility is fun, but 2.12 to 2

Re: Structured streaming from Kafka by timestamp

2019-02-05 Thread Cody Koeninger
To be more explicit, the easiest thing to do in the short term is use your own instance of KafkaConsumer to get the offsets for the timestamps you're interested in, using offsetsForTimes, and use those for the start / end offsets. See https://kafka.apache.org/10/javadoc/?org/apache/kafka/clients/c
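The lookup described above can be sketched without a broker. This is only a model of the behaviour, not the Kafka client: the real call is `offsetsForTimes` on the Java `KafkaConsumer`, which returns, per partition, the earliest offset whose timestamp is greater than or equal to the requested one (or nothing when no such record exists). The function names and the in-memory list of timestamps are hypothetical, and the sketch assumes timestamps never decrease within a partition, which a real topic does not guarantee.

```python
from bisect import bisect_left
from typing import Optional, Sequence, Tuple


def offset_for_time(timestamps: Sequence[int], target_ms: int) -> Optional[int]:
    """Return the earliest offset whose timestamp is >= target_ms, or None.

    timestamps[i] is the timestamp (ms) of the record at offset i in one
    partition. Assumption: the list is sorted in non-decreasing order.
    """
    idx = bisect_left(timestamps, target_ms)
    return idx if idx < len(timestamps) else None


def offset_range(timestamps: Sequence[int], start_ms: int, end_ms: int) -> Tuple[int, int]:
    """Translate a [start_ms, end_ms) time window into (start, end) offsets."""
    log_end = len(timestamps)
    start = offset_for_time(timestamps, start_ms)
    end = offset_for_time(timestamps, end_ms)
    # No record at or after the timestamp: fall back to the end of the log.
    return (log_end if start is None else start,
            log_end if end is None else end)
```

The resulting pair would then be supplied as the start / end offsets for that partition when defining the query.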

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread Cody Koeninger
I feel like I've already said my piece on https://github.com/apache/spark/pull/22138 let me know if you have more questions. As for SS in general, I don't have a production SS deployment, so I'm less comfortable with reviewing large changes to it. But if no other committers are working on it...

Re: Automated formatting

2018-11-26 Thread Cody Koeninger
, Nov 22, 2018 at 7:32 PM Matei Zaharia wrote: > > Can we start by just recommending to contributors that they do this manually? > Then if it seems to work fine, we can try to automate it. > > > On Nov 22, 2018, at 4:40 PM, Cody Koeninger wrote: > > > > I believe s

Re: Automated formatting

2018-11-22 Thread Cody Koeninger
On Thu, Nov 22, 2018 at 9:11 AM Cody Koeninger wrote: >> >> Plugin invocation is ./build/mvn mvn-scalafmt_2.12:format >> >> It takes about 5 seconds, and errors out on the first different file >> that doesn't match formatting. >> >> I made a shel

Re: Automated formatting

2018-11-22 Thread Cody Koeninger
ff, seems worth a shot. What's the invocation that Shane > could add (after this change goes in) > On Wed, Nov 21, 2018 at 3:27 PM Cody Koeninger wrote: > > > > There's a mvn plugin (sbt as well, but it requires sbt 1.0+) so it > > should be runnable from the PR builder

Re: Automated formatting

2018-11-21 Thread Cody Koeninger
ad strokes but not in the details. > Is this something that can be just run in the PR builder? if the rules > are simple and not too hard to maintain, seems like a win. > On Wed, Nov 21, 2018 at 2:26 PM Cody Koeninger wrote: > > > > Definitely not suggesting a mass reformat

Re: Automated formatting

2018-11-21 Thread Cody Koeninger
so it's inevitable. > > Is there a way to just check style on PR changes? that's fine. > On Wed, Nov 21, 2018 at 11:40 AM Cody Koeninger wrote: > > > > Is there any appetite for revisiting automating formatting? > > > > I know over the years various p

Automated formatting

2018-11-21 Thread Cody Koeninger
Is there any appetite for revisiting automating formatting? I know over the years various people have expressed opposition to it as unnecessary churn in diffs, but having every new contributor greeted with "nit: 4 space indentation for argument lists" isn't very welcoming.

Re: [Structured Streaming] Kafka group.id is fixed

2018-11-19 Thread Cody Koeninger
Anastasios it looks like you already identified the two lines that need to change, the string interpolation that depends on UUID.randomUUID and metadataPath.hashCode. I'd factor that out into a function that returns the group id. That function would also need to take the "parameters" variable (th
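One possible shape for such a function, as a rough sketch rather than the Spark code: it takes the source parameters and the metadata path and returns the group id, using a caller-supplied prefix when one is present. The option name `group.id.prefix`, the default prefix, and the use of a CRC in place of `metadataPath.hashCode` are all assumptions made for illustration.

```python
import uuid
import zlib
from typing import Mapping

# Hypothetical option name and default, used only for this illustration.
GROUP_ID_PREFIX_OPTION = "group.id.prefix"
DEFAULT_PREFIX = "spark-kafka-source"


def unique_group_id(parameters: Mapping[str, str], metadata_path: str) -> str:
    """Build a per-query consumer group id from the source parameters.

    The random UUID keeps concurrent queries in separate groups; the path
    checksum stands in for the hash of the metadata path.
    """
    prefix = parameters.get(GROUP_ID_PREFIX_OPTION, DEFAULT_PREFIX)
    path_hash = zlib.crc32(metadata_path.encode("utf-8"))
    return f"{prefix}-{uuid.uuid4()}-{path_hash}"
```

Keeping the two call sites behind one function means a secured cluster only has to authorize a known prefix instead of an unpredictable id.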

Re: DataSourceV2 sync tomorrow

2018-11-13 Thread Cody Koeninger
Am I the only one for whom the livestream link didn't work last time? Would like to be able to at least watch the discussion this time around. On Tue, Nov 13, 2018 at 6:01 PM Ryan Blue wrote: > > Hi everyone, > I just wanted to send out a reminder that there’s a DSv2 sync tomorrow at > 17:00 PST,

Re: [Structured Streaming] Kafka group.id is fixed

2018-11-09 Thread Cody Koeninger
That sounds reasonable to me On Fri, Nov 9, 2018 at 2:26 AM Anastasios Zouzias wrote: > > Hi all, > > I run in the following situation with Spark Structure Streaming (SS) using > Kafka. > > In a project that I work on, there is already a secured Kafka setup where ops > can issue an SSL certifica

Re: Nightly Builds in the docs (in spark-nightly/spark-master-bin/latest? Can't seem to find it)

2018-08-31 Thread Cody Koeninger
Just got a question about this on the user list as well. Worth removing that link to pwendell's directory from the docs? On Sun, Jan 21, 2018 at 12:13 PM, Jacek Laskowski wrote: > Hi, > > http://spark.apache.org/developer-tools.html#nightly-builds reads: > >> Spark nightly packages are available

Re: [discuss] replacing SPIP template with Heilmeier's Catechism?

2018-08-31 Thread Cody Koeninger
+1 to Sean's comment On Fri, Aug 31, 2018 at 2:48 PM, Reynold Xin wrote: > Yup all good points. One way I've done it in the past is to have an appendix > section for design sketch, as an expansion to the question "- What is new in > your approach and why do you think it will be successful?" > > O

Re: Migrating from kafka08 client to kafka010

2018-08-02 Thread Cody Koeninger
Short answer is it isn't necessary. Long answer is that you aren't just changing from 08 to 10, you're changing from the receiver based implementation to the direct stream. Read these: https://github.com/koeninger/kafka-exactly-once http://spark.apache.org/docs/latest/streaming-kafka-0-8-integrat

Re: [DISCUSS] SPIP: Standardize SQL logical plans

2018-07-17 Thread Cody Koeninger
According to http://spark.apache.org/improvement-proposals.html the shepherd should be a PMC member, not necessarily the person who proposed the SPIP On Tue, Jul 17, 2018 at 9:13 AM, Wenchen Fan wrote: > I don't know an official answer, but conventionally people who propose the > SPIP would cal

Re: Time for 2.3.1?

2018-05-11 Thread Cody Koeninger
Sounds good, I'd like to add SPARK-24067 today assuming there's no objections On Thu, May 10, 2018 at 1:22 PM, Henry Robinson wrote: > +1, I'd like to get a release out with SPARK-23852 fixed. The Parquet > community are about to release 1.8.3 - the voting period closes tomorrow - > and I've test

Process for backports?

2018-04-24 Thread Cody Koeninger
https://issues.apache.org/jira/browse/SPARK-24067 is asking to backport a change to the 2.3 branch. My questions - In general are there any concerns about what qualifies for backporting? This adds a configuration variable but shouldn't change default behavior. - Is a separate jira + pr actuall

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-02 Thread Cody Koeninger
Congrats! On Mon, Apr 2, 2018 at 12:28 AM, Wenchen Fan wrote: > Hi all, > > The Spark PMC recently added Zhenhua Wang as a committer on the project. > Zhenhua is the major contributor of the CBO project, and has been > contributing across several areas of Spark for a while, focusing especially >

Re: Welcoming some new committers

2018-03-02 Thread Cody Koeninger
er past work: > > - Anirudh Ramanathan (contributor to Kubernetes support) > - Bryan Cutler (contributor to PySpark and Arrow support) > - Cody Koeninger (contributor to streaming and Kafka support) > - Erik Erlandson (contributor to Kubernetes support) > - Matt Cheah (contributor to Kube

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-01 Thread Cody Koeninger
Was there any answer to my question around the effect of changes to the sink api regarding access to underlying offsets? On Wed, Nov 1, 2017 at 11:32 AM, Reynold Xin wrote: > Most of those should be answered by the attached design sketch in the JIRA > ticket. > > On Wed, Nov 1, 2017 at 5:29 PM De

Re: Spark Kafka API tries connecting to dead node for every batch, which increases the processing time

2017-10-16 Thread Cody Koeninger
23 AM, Suprith T Jain wrote: > Yes I tried that. But it's not that effective. > > In fact kafka SimpleConsumer tries to reconnect in case of socket error > (sendRequest method). So it ll always be twice the timeout for every window > and for every node that is down. > >

Re: Spark Kafka API tries connecting to dead node for every batch, which increases the processing time

2017-10-16 Thread Cody Koeninger
Have you tried adjusting the timeout? On Mon, Oct 16, 2017 at 8:08 AM, Suprith T Jain wrote: > Hi guys, > > I have a 3 node cluster and i am running a spark streaming job. consider the > below example > > /*spark-submit* --master yarn-cluster --class > com.huawei.bigdata.spark.examples.FemaleInfo

Re: Easy way to get offset metatada with Spark Streaming API

2017-09-11 Thread Cody Koeninger
https://issues-test.apache.org/jira/browse/SPARK-18258 On Mon, Sep 11, 2017 at 7:15 AM, Dmitry Naumenko wrote: > Hi all, > > It started as a discussion in > https://stackoverflow.com/questions/46153105/how-to-get-kafka-offsets-with-spark-structured-streaming-api. > > So the problem that there is

Re: Putting Kafka 0.8 behind an (opt-in) profile

2017-09-06 Thread Cody Koeninger
Is the Kafka 0.10 integration as stable as it is going to be, and worth > marking as such for 2.3.0? > > > On Tue, Sep 5, 2017 at 4:12 PM Cody Koeninger wrote: >> >> +1 to going ahead and giving a deprecation warning now >> >> On Tue, Sep 5, 2017 at 6:39 AM, Sean Ow

Re: Putting Kafka 0.8 behind an (opt-in) profile

2017-09-05 Thread Cody Koeninger
+1 to going ahead and giving a deprecation warning now On Tue, Sep 5, 2017 at 6:39 AM, Sean Owen wrote: > On the road to Scala 2.12, we'll need to make Kafka 0.8 support optional in > the build, because it is not available for Scala 2.12. > > https://github.com/apache/spark/pull/19134 adds that

Re: Spark streaming Kafka 0.11 integration

2017-09-05 Thread Cody Koeninger
Here's the jira for upgrading to a 0.10.x point release, which is effectively the discussion of upgrading to 0.11 now https://issues.apache.org/jira/browse/SPARK-18057 On Tue, Sep 5, 2017 at 1:27 AM, matus.cimerman wrote: > Hi guys, > > is there any plans to support Kafka 0.11 integration for Sp

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2

2017-08-28 Thread Cody Koeninger
Just wanted to point out that because the jira isn't labeled SPIP, it won't have shown up linked from http://spark.apache.org/improvement-proposals.html On Mon, Aug 28, 2017 at 2:20 PM, Wenchen Fan wrote: > Hi all, > > It has been almost 2 weeks since I proposed the data source V2 for > discussi

Re: SPARK-19547

2017-06-08 Thread Cody Koeninger
Can you explain in more detail what you mean by "distribute Kafka topics among different instances of same consumer group"? If you're trying to run multiple streams using the same consumer group, it's already documented that you shouldn't do that. On Thu, Jun 8, 2017 at 12:43 AM, Rastogi, Pankaj

Re: [KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?

2017-05-01 Thread Cody Koeninger
017 7:26 p.m., "Michael Armbrust" wrote: >> >> He's just suggesting that since the DataStreamWriter start() method can >> fill in an option named "path", we should make that a synonym for "topic". >> Then you could do something like. >> >>

Re: [KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?

2017-05-01 Thread Cody Koeninger
I'm confused about what you're suggesting. Are you saying that a Kafka sink should take a filesystem path as an option? On Mon, May 1, 2017 at 8:52 AM, Jacek Laskowski wrote: > Hi, > > I've just found out that KafkaSourceProvider supports topic option > that sets the Kafka topic to save a DataFr

Re: Question about upgrading Kafka client version

2017-03-10 Thread Cody Koeninger
There are existing tickets on the issues around kafka versions, e.g. https://issues.apache.org/jira/browse/SPARK-18057 that haven't gotten any committer weigh-in on direction. On Thu, Mar 9, 2017 at 12:52 PM, Oscar Batori wrote: > Guys, > > To change the subject from meta-voting... > > We are doi

Re: Spark Improvement Proposals

2017-03-10 Thread Cody Koeninger
pen ticket with the SPIP label show it should show up On Fri, Mar 10, 2017 at 11:19 AM, Reynold Xin wrote: > We can just start using spip label and link to it. > > > > On Fri, Mar 10, 2017 at 9:18 AM, Cody Koeninger wrote: >> >> So to be clear, if I translate that go

Re: Spark Improvement Proposals

2017-03-10 Thread Cody Koeninger
ins > can make a new issue type unfortunately. We may just have to mention a > convention involving title and label or something. > > On Fri, Mar 10, 2017 at 4:52 PM Cody Koeninger wrote: >> >> I think it ought to be its own page, linked from the more / community >> menu dr

Re: Spark Improvement Proposals

2017-03-10 Thread Cody Koeninger
I think it ought to be its own page, linked from the more / community menu dropdowns. We also need the jira tag, and for the page to clearly link to filters that show proposed / completed SPIPs On Fri, Mar 10, 2017 at 3:39 AM, Sean Owen wrote: > Alrighty, if nobody is objecting, and nobody calls

Re: Spark Improvement Proposals

2017-03-09 Thread Cody Koeninger
's a code/doc > change we can just review and merge as usual. > > On Tue, Mar 7, 2017 at 3:15 PM Cody Koeninger wrote: >> >> Another week, another ping. Anyone on the PMC willing to call a vote on >> this? - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: Spark Improvement Proposals

2017-03-07 Thread Cody Koeninger
> rb > > On Fri, Feb 24, 2017 at 8:28 PM, Joseph Bradley > wrote: > >> The current draft LGTM. I agree some of the various concerns may need to >> be addressed in the future, depending on how SPIPs progress in practice. >> If others agree, let's put it t

Re: Spark Improvement Proposals

2017-02-24 Thread Cody Koeninger
oc looks good to me. >>> >>> Ryan, the role of the shepherd is to make sure that someone >>> knowledgeable with Spark processes is involved: this person can advise >>> on technical and procedural considerations for people outside the >>> community. Also, if

Re: Spark Improvement Proposals

2017-02-16 Thread Cody Koeninger
ng what SPIP implies. It's just a process > document. > > Still, a fine step IMHO. > > On Thu, Feb 16, 2017 at 4:22 PM Reynold Xin wrote: >> >> Updated. Any feedback from other community members? >> >> >> On Wed, Feb 15, 2017 at 2:53 AM, Cody Koe

Re: Spark Improvement Proposals

2017-02-14 Thread Cody Koeninger
ure list was always above 100. Sometimes, the >> customers are feeling frustrated when we are unable to deliver them on time >> due to the resource limits and others. Even if they paid us billions, we >> still need to do it phase by phase or sometimes they have to accept the >>

Re: Spark Improvement Proposals

2017-02-11 Thread Cody Koeninger
>> up with a distracting long tail of half-hearted proposals. >> >> These rules are meant to be flexible, but the current document should be >> clear about who is in charge of a SPIP, and the state it is currently in. >> >> We have had long discussions over some very imp

Re: welcoming Burak and Holden as committers

2017-01-24 Thread Cody Koeninger
Congrats, glad to hear it On Jan 24, 2017 12:47 PM, "Shixiong(Ryan) Zhu" wrote: > Congrats Burak & Holden! > > On Tue, Jan 24, 2017 at 10:39 AM, Joseph Bradley > wrote: > >> Congratulations Burak & Holden! >> >> On Tue, Jan 24, 2017 at 10:33 AM, Dongjoon Hyun >> wrote: >> >>> Great! Congratula

Re: Feedback on MLlib roadmap process proposal

2017-01-24 Thread Cody Koeninger
Totally agree with most of what Sean said, just wanted to give an alternate take on the "maintainers" thing On Tue, Jan 24, 2017 at 10:23 AM, Sean Owen wrote: > There is no such list because there's no formal notion of ownership or > access to subsets of the project. Tracking an informal notion w

Re: Spark Improvement Proposals

2017-01-03 Thread Cody Koeninger
mentioned above + a new one >> w.r.t. Reynold's draft >> <https://docs.google.com/document/d/1-Zdi_W-wtuxS9hTK0P9qb2x-nRanvXmnZ7SUi4qMljg/edit#> >> : >> * Reinstate the "Where" section with links to current and past SIPs >> * Add field for stating

Re: Spark Improvement Proposals

2017-01-02 Thread Cody Koeninger
requirement for three +1 votes. Why > would we not want at least three committers to think something is a good > idea before adopting the proposal? > > rb > > On Tue, Nov 8, 2016 at 8:13 AM, Cody Koeninger wrote: >> >> So there are some minor things (the Where sec

Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-09 Thread Cody Koeninger
Agree that frequent topic deletion is not a very Kafka-esque thing to do On Fri, Dec 9, 2016 at 12:09 PM, Shixiong(Ryan) Zhu wrote: > Sean, "stress test for failOnDataLoss=false" is because Kafka consumer may > be thrown NPE when a topic is deleted. I added some logic to retry on such > failure,

Re: Back-pressure to Spark Kafka Streaming?

2016-12-05 Thread Cody Koeninger
If you want finer-grained max rate setting, SPARK-17510 got merged a while ago. There's also SPARK-18580 which might help address the issue of starting backpressure rate for the first batch. On Mon, Dec 5, 2016 at 4:18 PM, Liren Ding wrote: > Hey all, > > Does backpressure actually work on spark

Re: using Spark Streaming with Kafka 0.9/0.10

2016-11-15 Thread Cody Koeninger
s that can generate > RDDs from new data by running a service/thread only on the driver node (that > is, without running a receiver on worker nodes) > > Thanks and regards, > Aakash Pradeep > > > On Tue, Nov 15, 2016 at 2:55 PM, Cody Koeninger wrote: >> >> It'

Re: using Spark Streaming with Kafka 0.9/0.10

2016-11-15 Thread Cody Koeninger
It'd probably be worth no longer marking the 0.8 interface as experimental. I don't think it's likely to be subject to active development at this point. You can use the 0.8 artifact to consume from a 0.9 broker. Where are you reading documentation indicating that the direct stream only runs on th

Re: Connectors using new Kafka consumer API

2016-11-09 Thread Cody Koeninger
> I think they are open to others helping, in fact, more than one person has > worked on the JIRA so far. And, it's been crawling really slowly and that's > preventing adoption of Spark's new connector in secure Kafka environments. > > On Tue, Nov 8, 2016 at 7:59 PM, Cod

Re: Connectors using new Kafka consumer API

2016-11-08 Thread Cody Koeninger
Have you asked the assignee on the Kafka jira whether they'd be willing to accept help on it? On Tue, Nov 8, 2016 at 5:26 PM, Mark Grover wrote: > Hi all, > We currently have a new direct stream connector, thanks to work by Cody and > others on SPARK-12177. > > However, that can't be used in secu

Re: Spark Improvement Proposals

2016-11-08 Thread Cody Koeninger
t. >> >> >> On Monday, November 7, 2016, Cody Koeninger wrote: >>> >>> Thanks for picking up on this. >>> >>> Maybe I fail at google docs, but I can't see any edits on the document >>> you linked. >>> >>> Regarding la

Re: Odp.: Spark Improvement Proposals

2016-11-07 Thread Cody Koeninger
anzin >> wrote: >>> >>> The proposal looks OK to me. I assume, even though it's not explicitly >>> called, that voting would happen by e-mail? A template for the >>> proposal document (instead of just a bullet nice) would also be nice, >>>

Anyone want to weigh in on a Kafka DStreams api change?

2016-11-04 Thread Cody Koeninger
SPARK-17510 https://github.com/apache/spark/pull/15132 It's for allowing tweaking of rate limiting on a per-partition basis
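To make the idea of per-partition rate limiting concrete, here is a small sketch. It is not the code from the pull request: the function name, the argument names, and the dictionary-based representation of offsets are hypothetical. It caps how far each partition may advance in one batch, using a partition-specific rate when one is configured and a default rate otherwise.

```python
from typing import Dict, Mapping


def clamp_until_offsets(
    current: Mapping[str, int],
    latest: Mapping[str, int],
    max_rate_per_partition: Mapping[str, float],
    default_rate: float,
    batch_seconds: float,
) -> Dict[str, int]:
    """Pick the ending offset of the next batch for every partition.

    current / latest map a partition name to its consumed and newest offsets.
    Rates are in messages per second.
    """
    until = {}
    for partition, latest_offset in latest.items():
        rate = max_rate_per_partition.get(partition, default_rate)
        limit = current[partition] + int(rate * batch_seconds)
        # Never read past what the broker actually has.
        until[partition] = min(latest_offset, limit)
    return until
```

A slow or expensive partition can then be throttled on its own without lowering the limit for every other partition.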

Re: Continuous warning while consuming using new kafka-spark010 API

2016-11-04 Thread Cody Koeninger
I answered the duplicate post on the user mailing list, I'd say keep the discussion there. On Fri, Nov 4, 2016 at 12:14 PM, vonnagy wrote: > Nitin, > > I am getting the similar issues using Spark 2.0.1 and Kafka 0.10. I have two > jobs, one that uses a Kafka stream and one that uses just the Kafka

Re: Handling questions in the mailing lists

2016-11-02 Thread Cody Koeninger
So concrete things people could do - users could tag subject lines appropriately to the component they're asking about - contributors could monitor user@ for tags relating to components they've worked on. I'd be surprised if my miss rate for any mailing list questions well-labeled as Kafka was hi

Re: JIRA Components for Streaming

2016-10-31 Thread Cody Koeninger
Makes sense to me. I do wonder if e.g. [SPARK-12345][STRUCTUREDSTREAMING][KAFKA] is going to leave any room in the Github PR form for actual title content? On Mon, Oct 31, 2016 at 1:37 PM, Michael Armbrust wrote: > I'm planning to do a little maintenance on JIRA to hopefully improve the > visi

Re: Odp.: Spark Improvement Proposals

2016-10-31 Thread Cody Koeninger
y mail was just to show some > aspects from my side, so from the side of developer and person who is trying > to help others with Spark (via StackOverflow or other ways) > > > Pozdrawiam / Best regards, > > Tomasz > > > > From: Cody Koeninger >

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-25 Thread Cody Koeninger
I think only supporting 1 version of scala at any given time is not sufficient, 2 probably is ok. I.e. don't drop 2.10 before 2.12 is out + supported On Tue, Oct 25, 2016 at 10:56 AM, Sean Owen wrote: > The general forces are that new versions of things to support emerge, and > are valuable to s

Re: [PSA] TaskContext.partitionId != the actual logical partition index

2016-10-20 Thread Cody Koeninger
think that makes sense, I can start a ticket. On Thu, Oct 20, 2016 at 1:16 PM, Reynold Xin wrote: > Seems like a good new API to add? > > > On Thu, Oct 20, 2016 at 11:14 AM, Cody Koeninger wrote: >> >> Access to the partition ID is necessary for basically every single one

Re: [PSA] TaskContext.partitionId != the actual logical partition index

2016-10-20 Thread Cody Koeninger
Access to the partition ID is necessary for basically every single one of my jobs, and there isn't a foreachPartitionWithIndex equivalent. You can kind of work around it with empty foreach after the map, but it's really awkward to explain to people. On Thu, Oct 20, 2016 at 12:52 PM, Reynold Xin wr

Re: StructuredStreaming status

2016-10-19 Thread Cody Koeninger
n latency ? >> > I think that the fact that they serve as an output trigger is a problem, > but Structured Streaming seems to resolve this now. > >> >> Thanks >> Shivaram >> >> On Wed, Oct 19, 2016 at 1:29 PM, Michael Armbrust >> wrote: >>

Re: StructuredStreaming status

2016-10-19 Thread Cody Koeninger
Is anyone seriously thinking about alternatives to microbatches? On Wed, Oct 19, 2016 at 2:45 PM, Michael Armbrust wrote: > Anything that is actively being designed should be in JIRA, and it seems > like you found most of it. In general, release windows can be found on the > wiki. > > 2.1 has a

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-18 Thread Cody Koeninger
+1 to putting docs in one clear place. On Oct 18, 2016 6:40 AM, "Sean Owen" wrote: > I'm OK with that. The upside to the wiki is that it can be edited directly > outside of a release cycle. However, in practice I find that the wiki is > rarely changed. To me it also serves as a place for informa

Re: cutting 2.0.2?

2016-10-17 Thread Cody Koeninger
SPARK-17841 three line bugfix that has a week old PR SPARK-17812 being able to specify starting offsets is a must have for a Kafka mvp in my opinion, already has a PR SPARK-17813 I can put in a PR for this tonight if it'll be considered On Mon, Oct 17, 2016 at 12:28 AM, Reynold Xin wrote: > Si

Re: Spark Improvement Proposals

2016-10-17 Thread Cody Koeninger
for SIP. However I think that Spark should >> have real-time streaming support. Currently I see many posts/comments >> that "Spark has too big latency". Spark Streaming is doing very good >> jobs with micro-batches, however I think it is possible to add also more >> r

Re: StructuredStreaming Custom Sinks (motivated by Structured Streaming Machine Learning)

2016-10-13 Thread Cody Koeninger
I've always been confused as to why it would ever be a good idea to put any streaming query system on the critical path for synchronous < 100msec requests. It seems to make a lot more sense to have a streaming system do asynch updates of a store that has better latency and quality of service char

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
we have run into some trouble in the past > with some inside the ASF but essentially outside the Spark community who > didn't like the way we were doing things. > > On Mon, Oct 10, 2016 at 3:53 PM, Cody Koeninger wrote: >> >> Apache documents say lots of confusing stuf

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
it confusing and can reduce contributions. >> Although, as engineers, we believe that anything can be solved using >> mechanical rules, in practice software development is a social process that >> ultimately requires humans to tackle things on a case-by-case basis. >> >

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
nd I wouldn't want to move forward if up to half of the > community thinks it's an untenable idea. > > rb > > On Mon, Oct 10, 2016 at 12:07 PM, Cody Koeninger wrote: >> >> I think this is closer to a procedural issue than a code modification >> issue, henc

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
ess? I >> think restricting who can submit proposals would only undermine them by >> pushing contributors out. Maybe I'm missing something here? >> >> rb >> >> >> >> On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger >> wrote: >>> >

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
submit proposals would only undermine them by > pushing contributors out. Maybe I'm missing something here? > > rb > > > > On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger wrote: >> >> Yes, users suggesting SIPs is a good thing and is explicitly called

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
ave a large effect on the goal, we should > have it discussed when discussing the goals. In addition, while it is often > easy to throw out completely infeasible goals, it is often much harder to > figure out that the goals are unfeasible without fine tuning. > > > > >

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
signer of software, I always want to > give feedback on APIs, so I'd really like a culture of having those early. > People don't argue about prettiness when they discuss APIs, they argue about > the core concepts to expose in order to meet various goals, and then they

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
asible >> right now? If it's infeasible, that will be discovered later during design >> and implementation. Same thing with rejected strategies -- listing some of >> those is definitely useful sometimes, but if you make this a *required* >> section, people are just going

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
s infeasible, that will be discovered later > during design and implementation. Same thing with rejected strategies -- > listing some of those is definitely useful sometimes, but if you make this > a *required* section, people are just going to fill it in with bogus stuff > (I've see

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
Regarding name, if the SIP overlap is a concern, we can pick a different name. My tongue in cheek suggestion would be Spark Lightweight Improvement process (SPARKLI) On Sun, Oct 9, 2016 at 4:14 PM, Cody Koeninger wrote: > So to focus the discussion on the specific strategy I'm su

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
step for user feedback earlier? Or are you just trying to make > design docs for key features more visible (and their approval more formal)? > > BTW note that in either case, I'd like to have a template for design docs > too, which should also include goals. I think that would

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
want the SIPs to be >>> PRDs for getting some quick feedback on the goals of a feature before it is >>> designed, or something more like full-fledged design docs (just a more >>> visible design doc for bigger changes). I looked at Kafka's KIPs, and they >>> actu

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
entails, and then we can discuss this the specific proposal as well. > > > On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger wrote: > >> Yeah, in case it wasn't clear, I was talking about SIPs for major >> user-facing or cross-cutting changes, not minor feat

Re: PSA: JIRA resolutions and meanings

2016-10-09 Thread Cody Koeninger
That's awesome Sean, very clear. One minor thing, non-committers can't change the assigned field as far as I know. On Oct 9, 2016 3:40 AM, "Sean Owen" wrote: I added a variant on this text to https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingtoJ

Re: Improving governance / committers (split from Spark Improvement Proposals thread)

2016-10-08 Thread Cody Koeninger
It's not about technical design disagreement as to matters of taste, it's about familiarity with the domain. To make an analogy, it's as if a committer in MLlib was firmly intent on, I dunno, treating a collection of categorical variables as if it were an ordered range of continuous variables. It

Re: PSA: JIRA resolutions and meanings

2016-10-08 Thread Cody Koeninger
at 2:19 PM, Reynold Xin wrote: > I think so (at least I think it is socially acceptable). Of course, use good > judgement here :) > > > > On Sat, Oct 8, 2016 at 12:06 PM, Cody Koeninger wrote: >> >> So to be clear, can I go clean up the Kafka cruft? >> >>

Re: PSA: JIRA resolutions and meanings

2016-10-08 Thread Cody Koeninger
So to be clear, can I go clean up the Kafka cruft? On Sat, Oct 8, 2016 at 1:41 PM, Reynold Xin wrote: > > On Sat, Oct 8, 2016 at 2:09 AM, Sean Owen wrote: >> >> >> - Resolve as Fixed if there's a change you can point to that resolved the >> issue >> - If the issue is a proper subset of another i

Re: Improving volunteer management / JIRAs (split from Spark Improvement Proposals thread)

2016-10-08 Thread Cody Koeninger
tributions to the attention of committers. > > I dunno if people think this is perhaps too complex, but at our scale I > feel we need some kind of loose but automated system for funneling > contributions through some kind of lifecycle. The status quo is just not > that good (e.g

Re: PSA: JIRA resolutions and meanings

2016-10-08 Thread Cody Koeninger
That makes sense, thanks. One thing I've never been clear on is who should be allowed to resolve Jiras. Can I go clean up the backlog of Kafka Jiras that weren't created by me? If there's an informal policy here, can we update the wiki to reflect it? Maybe it's there already, but I didn't see it

Re: Improving volunteer management / JIRAs (split from Spark Improvement Proposals thread)

2016-10-07 Thread Cody Koeninger
neling contributions > through some kind of lifecycle. The status quo is just not that good (e.g. > 474 open PRs against Spark as of this moment). > > Nick > > > On Fri, Oct 7, 2016 at 4:48 PM Cody Koeninger wrote: >> >> Matei asked: >> >> >> >

Re: Spark Improvement Proposals

2016-10-07 Thread Cody Koeninger
s, missing features, slow reviews > which is understandable to some extent... it is not only about Spark but > things can be improved for sure for this project in particular as already > stated. > > On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger > wrote: > >> +1 to addin

Re: Kafaka 0.8, 0.9 in Structured Streaming

2016-10-07 Thread Cody Koeninger
The main thing is picking up new partitions. You can't do that without reimplementing portions of the consumer rebalance. The low-level consumer is really low level, and the old high-level consumer is basically broken (it might have been fixed by the time they abandoned it, I dunno) On Fri, Oct

Improving governance / committers (split from Spark Improvement Proposals thread)

2016-10-07 Thread Cody Koeninger
So concrete problems / potential solutions: - Technical discussion needs to be public, or you don't hear use cases and alternative viewpoints. Yet email communication is low-bandwidth and makes it hard to read people's emotions, so committers who are colocated talk and decide things. A possible alternativ

Improving volunteer management / JIRAs (split from Spark Improvement Proposals thread)

2016-10-07 Thread Cody Koeninger
Matei asked: > I agree about empowering people interested here to contribute, but I'm > wondering, do you think there are technical things that people don't want to > work on, or is it a matter of what there's been time to do? It's a matter of mismanagement and miscommunication. The structur

Re: Kafaka 0.8, 0.9 in Structured Streaming

2016-10-07 Thread Cody Koeninger
Without a hell of a lot more work, Assign would be the only strategy usable. On Fri, Oct 7, 2016 at 3:25 PM, Michael Armbrust wrote: >> The implementation is totally and completely different however, in ways >> that leak to the end user. > > > Can you elaborate? Especially in the context of the i

Re: Kafaka 0.8, 0.9 in Structured Streaming

2016-10-07 Thread Cody Koeninger
0.10 consumers won't work on an earlier broker. Earlier consumers will (should?) work on a 0.10 broker. The main things earlier consumers lack from a user perspective are support for SSL and pre-fetching of messages. The implementation is totally and completely different, however, in ways that leak

Re: Spark Improvement Proposals

2016-10-07 Thread Cody Koeninger
+1 to adding an SIP label and linking it from the website. I think it needs - template that focuses it towards soliciting user goals / non goals - clear resolution as to which strategy was chosen to pursue. I'd recommend a vote. Matei asked me to clarify what I meant by changing interfaces, I t

Re: Spark Improvement Proposals

2016-10-07 Thread Cody Koeninger
Sean, that was very eloquently put, and I 100% agree. If I ever meet you in person, I'll buy you multiple rounds of beverages of your choice ;) This is probably reiterating some of what you said in a less clear manner, but I'll throw more of my 2 cents in. - Design. Yes, design by committee doesn

Spark Improvement Proposals

2016-10-06 Thread Cody Koeninger
I love Spark. 3 or 4 years ago it was the first distributed computing environment that felt usable, and the community was welcoming. But I just got back from the Reactive Summit, and this is what I observed: - Industry leaders on stage making fun of Spark's streaming model - Open source project

Re: Spark SQL JSON Column Support

2016-09-29 Thread Cody Koeninger
Totally agree that specifying the schema manually should be the baseline. LGTM, thanks for working on it. Seems like it looks good to others too judging by the comment on the PR that it's getting merged to master :) On Thu, Sep 29, 2016 at 2:13 PM, Michael Armbrust wrote: >> Will this be able t

Re: Spark SQL JSON Column Support

2016-09-29 Thread Cody Koeninger
Will this be able to handle projection pushdown if a given job doesn't utilize all the columns in the schema? Or should people have a per-job schema? On Wed, Sep 28, 2016 at 2:17 PM, Michael Armbrust wrote: > Burak, you can configure what happens with corrupt records for the > datasource using t

Re: [discuss] Spark 2.x release cadence

2016-09-29 Thread Cody Koeninger
Regarding documentation debt, is there a reason not to deploy documentation updates more frequently than releases? I recall this used to be the case. On Wed, Sep 28, 2016 at 3:35 PM, Joseph Bradley wrote: > +1 for 4 months. With QA taking about a month, that's very reasonable. > > My main ask (
