Cody Koeninger created SPARK-30570:
--
Summary: Update scalafmt to 1.0.3 with onlyChangedFiles feature
Key: SPARK-30570
URL: https://issues.apache.org/jira/browse/SPARK-30570
Project: Spark
On Thu, Oct 31, 2019 at 4:30 PM Sean Owen wrote:
>
> . But it'd be cooler to call these major
> releases!
Maybe this is just semantics, but my point is the Scala project
already does call 2.12 to 2.13 a major release
e.g. from https://www.scala-lang.org/download/
"Note that different *major* r
On Wed, Oct 30, 2019 at 5:57 PM Sean Owen wrote:
> Or, frankly, maybe Scala should reconsider the mutual incompatibility
> between minor releases. These are basically major releases, and
> indeed, it causes exactly this kind of headache.
>
Not saying binary incompatibility is fun, but 2.12 to 2
To be more explicit, the easiest thing to do in the short term is use
your own instance of KafkaConsumer to get the offsets for the
timestamps you're interested in, using offsetsForTimes, and use those
for the start / end offsets. See
https://kafka.apache.org/10/javadoc/?org/apache/kafka/clients/c
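For the structured streaming Kafka source, the offsets that come back from offsetsForTimes can be rendered into the JSON that the startingOffsets / endingOffsets options accept. A minimal sketch in Python; the offset values are hypothetical stand-ins for what a real offsetsForTimes call against a broker would return:

```python
import json

def offsets_to_json(offsets):
    """Convert {(topic, partition): offset} -- e.g. the result of a
    KafkaConsumer offsetsForTimes lookup -- into the JSON string the
    Spark Kafka source accepts for startingOffsets / endingOffsets."""
    grouped = {}
    for (topic, partition), offset in offsets.items():
        grouped.setdefault(topic, {})[str(partition)] = offset
    return json.dumps(grouped, sort_keys=True)

# Hypothetical offsets, standing in for a real offsetsForTimes lookup.
start = offsets_to_json({("events", 0): 1234, ("events", 1): 5678})
print(start)  # {"events": {"0": 1234, "1": 5678}}
```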
That article is pretty old. If you click through the link to the jira
mentioned in it, https://issues.apache.org/jira/browse/SPARK-18580 ,
it's been resolved.
On Wed, Jan 2, 2019 at 12:42 AM JF Chen wrote:
>
> yes, 10 is a very low value for testing initial rate.
> And from this article
> https:
I feel like I've already said my piece on
https://github.com/apache/spark/pull/22138 let me know if you have
more questions.
As for SS in general, I don't have a production SS deployment, so I'm
less comfortable with reviewing large changes to it. But if no other
committers are working on it...
On Thu, Nov 22, 2018 at 7:32 PM Matei Zaharia wrote:
>
> Can we start by just recommending to contributors that they do this manually?
> Then if it seems to work fine, we can try to automate it.
>
> > On Nov 22, 2018, at 4:40 PM, Cody Koeninger wrote:
> >
> > I believe s
Cody Koeninger created SPARK-26177:
--
Summary: Automated formatting for Scala code
Key: SPARK-26177
URL: https://issues.apache.org/jira/browse/SPARK-26177
Project: Spark
Issue Type
[
https://issues.apache.org/jira/browse/SPARK-26121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger resolved SPARK-26121.
Resolution: Fixed
Assignee: Anastasios Zouzias
Fix Version/s: 3.0.0
On Thu, Nov 22, 2018 at 9:11 AM Cody Koeninger wrote:
>>
>> Plugin invocation is ./build/mvn mvn-scalafmt_2.12:format
>>
>> It takes about 5 seconds, and errors out on the first different file
>> that doesn't match formatting.
>>
>> I made a shel
ff, seems worth a shot. What's the invocation that Shane
> could add (after this change goes in)
> On Wed, Nov 21, 2018 at 3:27 PM Cody Koeninger wrote:
> >
> > There's a mvn plugin (sbt as well, but it requires sbt 1.0+) so it
> > should be runnable from the PR builder
ad strokes but not in the details.
> Is this something that can be just run in the PR builder? if the rules
> are simple and not too hard to maintain, seems like a win.
> On Wed, Nov 21, 2018 at 2:26 PM Cody Koeninger wrote:
> >
> > Definitely not suggesting a mass reformat
so it's inevitable.
>
> Is there a way to just check style on PR changes? that's fine.
> On Wed, Nov 21, 2018 at 11:40 AM Cody Koeninger wrote:
> >
> > Is there any appetite for revisiting automating formatting?
> >
> > I know over the years various p
Is there any appetite for revisiting automating formatting?
I know over the years various people have expressed opposition to it
as unnecessary churn in diffs, but having every new contributor
greeted with "nit: 4 space indentation for argument lists" isn't very
welcoming.
---
Anastasios it looks like you already identified the two lines that
need to change, the string interpolation that depends on
UUID.randomUUID and metadataPath.hashCode.
I'd factor that out into a function that returns the group id. That
function would also need to take the "parameters" variable (th
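A sketch of that refactor, in Python for brevity (the real change would be in the Scala source; the parameter key and prefix below are illustrative, not the exact names):

```python
import uuid

def consumer_group_id(parameters, metadata_path):
    """Return the Kafka consumer group id: honor one supplied in the
    source's parameters if present, otherwise fall back to the unique
    default generated from UUID.randomUUID and metadataPath.hashCode."""
    supplied = parameters.get("kafka.group.id")  # hypothetical key name
    if supplied:
        return supplied
    return f"spark-kafka-source-{uuid.uuid4()}-{hash(metadata_path)}"
```

The point of the function is that every call site goes through one place, so a user-supplied group id wins over the generated one.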
Am I the only one for whom the livestream link didn't work last time?
Would like to be able to at least watch the discussion this time
around.
On Tue, Nov 13, 2018 at 6:01 PM Ryan Blue wrote:
>
> Hi everyone,
> I just wanted to send out a reminder that there’s a DSv2 sync tomorrow at
> 17:00 PST,
[
https://issues.apache.org/jira/browse/SPARK-25983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682464#comment-16682464
]
Cody Koeninger commented on SPARK-25983:
Looks like we need an equiva
[
https://issues.apache.org/jira/browse/SPARK-25983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682461#comment-16682461
]
Cody Koeninger commented on SPARK-25983:
Documentation already explicitly
That sounds reasonable to me
On Fri, Nov 9, 2018 at 2:26 AM Anastasios Zouzias wrote:
>
> Hi all,
>
> I run in the following situation with Spark Structure Streaming (SS) using
> Kafka.
>
> In a project that I work on, there is already a secured Kafka setup where ops
> can issue an SSL certifica
Just got a question about this on the user list as well.
Worth removing that link to pwendell's directory from the docs?
On Sun, Jan 21, 2018 at 12:13 PM, Jacek Laskowski wrote:
> Hi,
>
> http://spark.apache.org/developer-tools.html#nightly-builds reads:
>
>> Spark nightly packages are available
+1 to Sean's comment
On Fri, Aug 31, 2018 at 2:48 PM, Reynold Xin wrote:
> Yup all good points. One way I've done it in the past is to have an appendix
> section for design sketch, as an expansion to the question "- What is new in
> your approach and why do you think it will be successful?"
>
> O
the group id, as well
> as simply moving to use only a single one of our clusters. Neither of these
> were successful. I am not able to run a test against master now.
>
> Regards,
>
> Bryan Jeffrey
>
>
>
>
> On Thu, Aug 30, 2018 at 2:56 PM Cody Koeninger wro
0 PM, Guillermo Ortiz Fernández
wrote:
> I can't... do you think that it's a possible bug of this version?? from
> Spark or Kafka?
>
> El mié., 29 ago. 2018 a las 22:28, Cody Koeninger ()
> escribió:
>>
>> Are you able to try a recent version of spark?
>>
>
I doubt that fix will get backported to 2.3.x
Are you able to test against master? 2.4 with the fix you linked to
is likely to hit code freeze soon.
From a quick look at your code, I'm not sure why you're mapping over
an array of brokers. It seems like that would result in different
streams wi
[
https://issues.apache.org/jira/browse/SPARK-25233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger resolved SPARK-25233.
Resolution: Fixed
Fix Version/s: 2.4.0
Issue resolved by pull request 3
[https
[
https://issues.apache.org/jira/browse/SPARK-25233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger reassigned SPARK-25233:
--
Assignee: Reza Safi
> Give the user the option of specifying a fixed minimum mess
Are you able to try a recent version of spark?
On Wed, Aug 29, 2018 at 2:10 AM, Guillermo Ortiz Fernández
wrote:
> I'm using Spark Streaming 2.0.1 with Kafka 0.10, sometimes I get this
> exception and Spark dies.
>
> I couldn't see any error or problem among the machines, anybody has the
> reason
[
https://issues.apache.org/jira/browse/SPARK-24987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569491#comment-16569491
]
Cody Koeninger commented on SPARK-24987:
It was merged to branch-2.3
h
Here's one reason I might want to be able to tell whether a given
offset is a transactional marker:
https://issues.apache.org/jira/browse/SPARK-24720
Alternatively, is there any efficient way to tell what the offset of
the last actual record in a topicpartition is (i.e. like endOffsets)
On Thu,
[
https://issues.apache.org/jira/browse/SPARK-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569283#comment-16569283
]
Cody Koeninger commented on SPARK-25026:
I don't think distributions
[
https://issues.apache.org/jira/browse/SPARK-24987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger resolved SPARK-24987.
Resolution: Fixed
Fix Version/s: 2.4.0
> Kafka Cached Consumer Leaking F
[
https://issues.apache.org/jira/browse/SPARK-24987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger reassigned SPARK-24987:
--
Assignee: Yuval Itzchakov
> Kafka Cached Consumer Leaking File Descript
Short answer is it isn't necessary.
Long answer is that you aren't just changing from 08 to 10, you're
changing from the receiver based implementation to the direct stream.
Read these:
https://github.com/koeninger/kafka-exactly-once
http://spark.apache.org/docs/latest/streaming-kafka-0-8-integrat
According to
http://spark.apache.org/improvement-proposals.html
the shepherd should be a PMC member, not necessarily the person who
proposed the SPIP
On Tue, Jul 17, 2018 at 9:13 AM, Wenchen Fan wrote:
> I don't know an official answer, but conventionally people who propose the
> SPIP would cal
[
https://issues.apache.org/jira/browse/SPARK-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger resolved SPARK-24713.
Resolution: Fixed
Fix Version/s: 2.4.0
Issue resolved by pull request 21690
[https
[
https://issues.apache.org/jira/browse/SPARK-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger reassigned SPARK-24713:
--
Assignee: Yuanbo Liu
> AppMatser of spark streaming kafka OOM if there are hundr
[
https://issues.apache.org/jira/browse/SPARK-19680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540138#comment-16540138
]
Cody Koeninger commented on SPARK-19680:
A new consumer group is the eas
[
https://issues.apache.org/jira/browse/SPARK-24720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534797#comment-16534797
]
Cody Koeninger commented on SPARK-24720:
What's your plan to
[
https://issues.apache.org/jira/browse/SPARK-24743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger resolved SPARK-24743.
Resolution: Fixed
Fix Version/s: 2.4.0
Issue resolved by pull request 21717
[https
[
https://issues.apache.org/jira/browse/SPARK-18258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525137#comment-16525137
]
Cody Koeninger commented on SPARK-18258:
[~Yohan123] This ticket is a
[
https://issues.apache.org/jira/browse/SPARK-24507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518287#comment-16518287
]
Cody Koeninger commented on SPARK-24507:
You're welcome to submit a do
On Thu, Jun 14, 2018 at 3:33 PM, Bryan Jeffrey wrote:
> Cody,
>
> Where is that called in the driver? The only call I see from Subscribe is to
> load the offset from checkpoint.
>
> Get Outlook for Android
>
>
> From: Cody Koeninger
> Sent: Thur
s from checkpoint.
>
> Thank you!
>
> Bryan
>
> Get Outlook for Android
>
> ____
> From: Cody Koeninger
> Sent: Thursday, June 14, 2018 4:00:31 PM
> To: Bryan Jeffrey
> Cc: user
> Subject: Re: Kafka Offset Storage: Fetching O
The expectation is that you shouldn't have to manually load offsets
from kafka, because the underlying kafka consumer on the driver will
start at the offsets associated with the given group id.
That's the behavior I see with this example:
https://github.com/koeninger/kafka-exactly-once/blob/maste
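In configuration terms, that behavior hinges on the group id in the consumer params. A hypothetical set of params for a stream that resumes from its committed offsets (the group name and broker address are made up):

```python
# Hypothetical parameter values for a stream that resumes from the
# offsets Kafka has stored for this group id.
kafka_params = {
    "bootstrap.servers": "broker:9092",
    "group.id": "my-stream-group",   # committed offsets for this group are the resume point
    "enable.auto.commit": "false",   # commit explicitly (e.g. commitAsync) after output
    "auto.offset.reset": "latest",   # only consulted when the group has no committed offset
}
```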
[
https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496633#comment-16496633
]
Cody Koeninger commented on SPARK-18057:
I'd just modify KafkaTes
[
https://issues.apache.org/jira/browse/SPARK-24067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474549#comment-16474549
]
Cody Koeninger commented on SPARK-24067:
[~zsxwing] even in situations w
[
https://issues.apache.org/jira/browse/SPARK-24067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger resolved SPARK-24067.
Resolution: Fixed
Fix Version/s: 2.3.1
Issue resolved by pull request 21300
[https
Sounds good, I'd like to add SPARK-24067 today assuming there's no objections
On Thu, May 10, 2018 at 1:22 PM, Henry Robinson wrote:
> +1, I'd like to get a release out with SPARK-23852 fixed. The Parquet
> community are about to release 1.8.3 - the voting period closes tomorrow -
> and I've test
As long as you aren't doing any spark operations that involve a
shuffle, the order you see in spark should be the same as the order in
the partition.
Can you link to a minimal code example that reproduces the issue?
On Wed, May 9, 2018 at 7:05 PM, karthikjay wrote:
> On the producer side, I make
[
https://issues.apache.org/jira/browse/SPARK-24067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452603#comment-16452603
]
Cody Koeninger commented on SPARK-24067:
The original PR [https://github
[
https://issues.apache.org/jira/browse/SPARK-24067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452337#comment-16452337
]
Cody Koeninger commented on SPARK-24067:
Given the response on the dev
https://issues.apache.org/jira/browse/SPARK-24067
is asking to backport a change to the 2.3 branch.
My questions
- In general are there any concerns about what qualifies for backporting?
This adds a configuration variable but shouldn't change default behavior.
- Is a separate jira + pr actuall
[
https://issues.apache.org/jira/browse/SPARK-21168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger resolved SPARK-21168.
Resolution: Fixed
Fix Version/s: 2.4.0
Issue resolved by pull request 19887
[https
[
https://issues.apache.org/jira/browse/SPARK-21168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger reassigned SPARK-21168:
--
Assignee: liuzhaokun
> KafkaRDD should always set kafka clien
[
https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442838#comment-16442838
]
Cody Koeninger commented on SPARK-18057:
[~cricket007] here's a br
[
https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441818#comment-16441818
]
Cody Koeninger commented on SPARK-18057:
Ok, if you can figure out what ver
[
https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441793#comment-16441793
]
Cody Koeninger commented on SPARK-18057:
Just adding the extra dependenc
[
https://issues.apache.org/jira/browse/SPARK-22968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger resolved SPARK-22968.
Resolution: Fixed
Fix Version/s: 2.4.0
Issue resolved by pull request 21038
[https
[
https://issues.apache.org/jira/browse/SPARK-22968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger reassigned SPARK-22968:
--
Assignee: Saisai Shao
> java.lang.IllegalStateException: No current assignment
Is this possibly related to the recent post on
https://issues.apache.org/jira/browse/SPARK-18057 ?
On Mon, Apr 16, 2018 at 11:57 AM, ARAVIND SETHURATHNAM <
asethurath...@homeaway.com.invalid> wrote:
> Hi,
>
> We have several structured streaming jobs (spark version 2.2.0) consuming
> from kafka a
[
https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441606#comment-16441606
]
Cody Koeninger commented on SPARK-18057:
Out of curiosity, was that a compa
[
https://issues.apache.org/jira/browse/SPARK-19680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433039#comment-16433039
]
Cody Koeninger commented on SPARK-19680:
[~nerdynick] If you submit a PR to
Congrats!
On Mon, Apr 2, 2018 at 12:28 AM, Wenchen Fan wrote:
> Hi all,
>
> The Spark PMC recently added Zhenhua Wang as a committer on the project.
> Zhenhua is the major contributor of the CBO project, and has been
> contributing across several areas of Spark for a while, focusing especially
>
[
https://issues.apache.org/jira/browse/SPARK-23739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414326#comment-16414326
]
Cody Koeninger commented on SPARK-23739:
Ok, the OutOfMemoryError is probab
[
https://issues.apache.org/jira/browse/SPARK-23739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411503#comment-16411503
]
Cody Koeninger commented on SPARK-23739:
I meant the version of
[
https://issues.apache.org/jira/browse/SPARK-23739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411396#comment-16411396
]
Cody Koeninger commented on SPARK-23739:
What version of the org.apache.k
[
https://issues.apache.org/jira/browse/SPARK-18580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger resolved SPARK-18580.
Resolution: Fixed
Target Version/s: 2.4.0
>
[
https://issues.apache.org/jira/browse/SPARK-18580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger reassigned SPARK-18580:
--
Assignee: Oleksandr Konopko
> Use spark.streaming.backpressure.initialRate
Should be able to use the 0.8 kafka dstreams with a kafka 0.9 broker
On Fri, Mar 16, 2018 at 7:52 AM, kant kodali wrote:
> Hi All,
>
> is it possible to use Spark 2.3.0 along with Kafka 0.9.0.1?
>
> Thanks,
> kant
[
https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger resolved SPARK-18371.
Resolution: Fixed
Fix Version/s: 2.4.0
> Spark Streaming backpressure
[
https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger reassigned SPARK-18371:
--
Assignee: Sebastian Arzt
> Spark Streaming backpressure bug - generates a batch w
[
https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387245#comment-16387245
]
Cody Koeninger commented on SPARK-18057:
It's probably easiest to kee
[
https://issues.apache.org/jira/browse/SPARK-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387217#comment-16387217
]
Cody Koeninger commented on SPARK-19767:
[~nafshartous] I think at this p
er past work:
>
> - Anirudh Ramanathan (contributor to Kubernetes support)
> - Bryan Cutler (contributor to PySpark and Arrow support)
> - Cody Koeninger (contributor to streaming and Kafka support)
> - Erik Erlandson (contributor to Kubernetes support)
> - Matt Cheah (contributor to Kube
[
https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370924#comment-16370924
]
Cody Koeninger commented on SPARK-18057:
Just doing the upgrade is probab
[
https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370460#comment-16370460
]
Cody Koeninger commented on SPARK-18057:
My guess is that DStream b
I can't speak for committers, but my guess is it's more likely for
DStreams in general to stop being supported before that particular
integration is removed.
On Sun, Feb 18, 2018 at 9:34 PM, naresh Goud wrote:
> Thanks Ted.
>
> I see createDirectStream is experimental as annotated with
> "org.ap
https://issues.apache.org/jira/browse/SPARK-19680
and
https://issues.apache.org/jira/browse/KAFKA-3370
has a good explanation.
Verify that it works correctly with auto offset set to latest, to rule
out other issues.
Then try providing explicit starting offsets reasonably near the
beginning of
Have you tried passing in a Map that happens to have
strings for all the values? I haven't tested this, but the underlying
kafka consumer constructor is documented to take either strings or
objects as values, despite the static type.
On Wed, Jan 24, 2018 at 2:48 PM, Tecno Brain
wrote:
> Basically
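Untested against a real broker, but the suggestion amounts to building the parameter map with strings everywhere and letting the consumer parse them (illustrated as a Python dict; the actual map in question is a Scala Map):

```python
# Every value a string, relying on the Kafka consumer to parse
# numeric / boolean settings from strings.
kafka_params = {
    "bootstrap.servers": "broker1:9092,broker2:9092",
    "max.poll.records": "500",       # string rather than int
    "enable.auto.commit": "false",   # string rather than boolean
    "session.timeout.ms": "30000",
}

assert all(isinstance(v, str) for v in kafka_params.values())
```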
ems this patch is not suitable for our problem.
>
> https://github.com/apache/spark/compare/master...koeninger:SPARK-17147
>
> wood.super
>
> Original message
> From: namesuperwood
> To: Justin Miller
> Cc: user; Cody Koeninger
> Sent: January 24, 2018 (Wednesday) 14:45
> Subject: Re: unconti
Can anyone clarify what (other than the known cases of compaction or
transactions) could be causing non-contiguous offsets?
That sounds like a potential defect, given that I ran billions of
messages a day through kafka 0.8.x series for years without seeing
that.
On Tue, Jan 23, 2018 at 3:35 PM, J
y to give that a shot too.
>
> I’ll try consuming from the failed offset if/when the problem manifests
> itself again.
>
> Thanks!
> Justin
>
>
> On Wednesday, January 17, 2018, Cody Koeninger wrote:
>>
>> That means the consumer on the executor tried to seek
That means the consumer on the executor tried to seek to the specified
offset, but the message that was returned did not have a matching
offset. If the executor can't get the messages the driver told it to
get, something's generally wrong.
What happens when you try to consume the particular faili
Do not add a dependency on kafka-clients, the spark-streaming-kafka
library has appropriate transitive dependencies.
Either version of the spark-streaming-kafka library should work with
1.0 brokers; what problems were you having?
On Mon, Dec 25, 2017 at 7:58 PM, Diogo Munaro Vieira
wrote:
> He
You can't create a network connection to kafka on the driver and then
serialize it to send it the executor. That's likely why you're getting
serialization errors.
Kafka producers are thread safe and designed for use as a singleton.
Use a lazy singleton instance of the producer on the executor, d
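A minimal sketch of the lazy-singleton pattern; a stand-in Producer class is used here, where a real job would use an actual Kafka producer client (which is thread safe):

```python
class Producer:
    """Stand-in for a real Kafka producer client."""
    def __init__(self, config):
        self.config = config

_producer = None

def get_producer(config):
    """Create the producer lazily, once per process -- i.e. on the
    executor, inside the output action -- rather than constructing it
    on the driver and trying to serialize it out with the closure."""
    global _producer
    if _producer is None:
        _producer = Producer(config)
    return _producer
```

Each task then calls get_producer(...) and shares one connection per executor process instead of failing to serialize one from the driver.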
Modern versions of postgres have upsert, ie insert into ... on
conflict ... do update
On Thu, Dec 14, 2017 at 11:26 AM, salemi wrote:
> Thank you for your respond.
> The approach loads just the data into the DB. I am looking for an approach
> that allows me to update existing entries in the DB a
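For illustration, the same INSERT ... ON CONFLICT ... DO UPDATE syntax can be exercised with sqlite3, which adopted Postgres's upsert clause; it stands in here for a real Postgres connection, and the table is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (key TEXT PRIMARY KEY, n INTEGER)")

def upsert(key, n):
    # Insert a new row, or update the existing one on key collision.
    conn.execute(
        """INSERT INTO counts (key, n) VALUES (?, ?)
           ON CONFLICT (key) DO UPDATE SET n = excluded.n""",
        (key, n),
    )

upsert("a", 1)  # inserts
upsert("a", 5)  # updates the row inserted above
print(conn.execute("SELECT n FROM counts WHERE key = 'a'").fetchone()[0])  # 5
```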
use foreachPartition(), get a connection from a jdbc connection pool,
and insert the data the same way you would in a non-spark program.
If you're only doing inserts, postgres COPY will be faster (e.g.
https://discuss.pivotal.io/hc/en-us/articles/204237003), but if you're
doing updates that's not
it useful for our needs”, do you
> mean to say the code like below?
>
> kafkaDStream.asInstanceOf[CanCommitOffsets].commitAsync(ranges)
>
>
>
>
>
> Best Regards
>
> Richard
>
>
>
>
>
> From: venkat
> Date: Thursday, November 30, 2017 at 8:16 PM
&
Are you talking about the broker version, or the kafka-clients artifact version?
On Thu, Nov 30, 2017 at 12:17 AM, Raghavendra Pandey
wrote:
> Just wondering if anyone has tried spark structured streaming kafka
> connector (2.2) with Kafka 0.11 or Kafka 1.0 version
>
> Thanks
> Raghav
--
You mentioned 0.11 version; the latest version of org.apache.kafka
kafka-clients artifact supported by DStreams is 0.10.0.1, for which it
has an appropriate dependency.
Are you manually depending on a different version of the kafka-clients artifact?
On Fri, Nov 24, 2017 at 7:39 PM, venks61176 wr
[
https://issues.apache.org/jira/browse/SPARK-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger resolved SPARK-22561.
Resolution: Not A Problem
> Dynamically update topics list for spark kafka consu
[
https://issues.apache.org/jira/browse/SPARK-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264643#comment-16264643
]
Cody Koeninger commented on SPARK-22561:
See SubscribePattern
t Stream ?
>
> Thanks
> Jagadish
>
> On Wed, Nov 15, 2017 at 11:23 AM, Cody Koeninger wrote:
>>
>> spark.streaming.kafka.consumer.poll.ms is a spark configuration, not
>> a kafka parameter.
>>
>> see http://spark.apache.org/docs/latest/configuration.ht
spark.streaming.kafka.consumer.poll.ms is a spark configuration, not
a kafka parameter.
see http://spark.apache.org/docs/latest/configuration.html
On Tue, Nov 14, 2017 at 8:56 PM, jkagitala wrote:
> Hi,
>
> I'm trying to add spark-streaming to our kafka topic. But, I keep getting
> this error
>
[
https://issues.apache.org/jira/browse/SPARK-22486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248891#comment-16248891
]
Cody Koeninger commented on SPARK-22486:
Can you identify a clear use case
[
https://issues.apache.org/jira/browse/SPARK-19680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16242438#comment-16242438
]
Cody Koeninger commented on SPARK-19680:
If you got an OffsetOutOfRangeExcep
Was there any answer to my question around the effect of changes to
the sink api regarding access to underlying offsets?
On Wed, Nov 1, 2017 at 11:32 AM, Reynold Xin wrote:
> Most of those should be answered by the attached design sketch in the JIRA
> ticket.
>
> On Wed, Nov 1, 2017 at 5:29 PM De
As it says in SPARK-10320 and in the docs at
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#consumerstrategies
, you can use SubscribePattern
On Sun, Oct 29, 2017 at 3:56 PM, Ramanan, Buvana (Nokia - US/Murray
Hill) wrote:
> Hello Cody,
>
>
>
> As the stake holders of J