To be more explicit, the easiest thing to do in the short term is use
your own instance of KafkaConsumer to get the offsets for the
timestamps you're interested in, using offsetsForTimes, and use those
for the start / end offsets. See
That article is pretty old, If you click through the link to the jira
mentioned in it, https://issues.apache.org/jira/browse/SPARK-18580 ,
it's been resolved.
On Wed, Jan 2, 2019 at 12:42 AM JF Chen wrote:
>
> yes, 10 is a very low value for testing initial rate.
> And from this article
>
id, as well
> as simply moving to use only a single one of our clusters. Neither of these
> were successful. I am not able to run a test against master now.
>
> Regards,
>
> Bryan Jeffrey
>
>
>
>
> On Thu, Aug 30, 2018 at 2:56 PM Cody Koeninger wrote:
>&g
Ortiz Fernández
wrote:
> I can't... do you think that it's a possible bug of this version?? from
> Spark or Kafka?
>
> El mié., 29 ago. 2018 a las 22:28, Cody Koeninger ()
> escribió:
>>
>> Are you able to try a recent version of spark?
>>
>> On Wed, Aug 29, 201
I doubt that fix will get backported to 2.3.x
Are you able to test against master? 2.4 with the fix you linked to
is likely to hit code freeze soon.
>From a quick look at your code, I'm not sure why you're mapping over
an array of brokers. It seems like that would result in different
streams
Are you able to try a recent version of spark?
On Wed, Aug 29, 2018 at 2:10 AM, Guillermo Ortiz Fernández
wrote:
> I'm using Spark Streaming 2.0.1 with Kafka 0.10, sometimes I get this
> exception and Spark dies.
>
> I couldn't see any error or problem among the machines, anybody has the
>
at 3:33 PM, Bryan Jeffrey wrote:
> Cody,
>
> Where is that called in the driver? The only call I see from Subscribe is to
> load the offset from checkpoint.
>
> Get Outlook for Android
>
>
> From: Cody Koeninger
> Sent: Thursday, June 14,
m checkpoint.
>
> Thank you!
>
> Bryan
>
> Get Outlook for Android
>
> ____
> From: Cody Koeninger
> Sent: Thursday, June 14, 2018 4:00:31 PM
> To: Bryan Jeffrey
> Cc: user
> Subject: Re: Kafka Offset Storage: Fetching Off
The expectation is that you shouldn't have to manually load offsets
from kafka, because the underlying kafka consumer on the driver will
start at the offsets associated with the given group id.
That's the behavior I see with this example:
As long as you aren't doing any spark operations that involve a
shuffle, the order you see in spark should be the same as the order in
the partition.
Can you link to a minimal code example that reproduces the issue?
On Wed, May 9, 2018 at 7:05 PM, karthikjay wrote:
> On the
Is this possibly related to the recent post on
https://issues.apache.org/jira/browse/SPARK-18057 ?
On Mon, Apr 16, 2018 at 11:57 AM, ARAVIND SETHURATHNAM <
asethurath...@homeaway.com.invalid> wrote:
> Hi,
>
> We have several structured streaming jobs (spark version 2.2.0) consuming
> from kafka
Should be able to use the 0.8 kafka dstreams with a kafka 0.9 broker
On Fri, Mar 16, 2018 at 7:52 AM, kant kodali wrote:
> Hi All,
>
> is it possible to use Spark 2.3.0 along with Kafka 0.9.0.1?
>
> Thanks,
> kant
I can't speak for committers, but my guess is it's more likely for
DStreams in general to stop being supported before that particular
integration is removed.
On Sun, Feb 18, 2018 at 9:34 PM, naresh Goud wrote:
> Thanks Ted.
>
> I see createDirectStream is
Have you tried passing in a Map that happens to have
string for all the values? I haven't tested this, but the underlying
kafka consumer constructor is documented to take either strings or
objects as values, despite the static type.
On Wed, Jan 24, 2018 at 2:48 PM, Tecno Brain
wise.com>
> 抄送: user<user@spark.apache.org>; Cody Koeninger<c...@koeninger.org>
> 发送时间: 2018年1月24日(周三) 14:45
> 主题: Re: uncontinuous offset in kafka will cause the spark streamingfailure
>
> Yes. My spark streaming application works with uncompacted topic. I will
>
nd I can run to
> check compaction I’m happy to give that a shot too.
>
> I’ll try consuming from the failed offset if/when the problem manifests
> itself again.
>
> Thanks!
> Justin
>
>
> On Wednesday, January 17, 2018, Cody Koeninger <c...@koeninger.org> wrote:
>
That means the consumer on the executor tried to seek to the specified
offset, but the message that was returned did not have a matching
offset. If the executor can't get the messages the driver told it to
get, something's generally wrong.
What happens when you try to consume the particular
Do not add a dependency on kafka-clients, the spark-streaming-kafka
library has appropriate transitive dependencies.
Either version of the spark-streaming-kafka library should work with
1.0 brokers; what problems were you having?
On Mon, Dec 25, 2017 at 7:58 PM, Diogo Munaro Vieira
You can't create a network connection to kafka on the driver and then
serialize it to send it the executor. That's likely why you're getting
serialization errors.
Kafka producers are thread safe and designed for use as a singleton.
Use a lazy singleton instance of the producer on the executor,
Modern versions of postgres have upsert, ie insert into ... on
conflict ... do update
On Thu, Dec 14, 2017 at 11:26 AM, salemi wrote:
> Thank you for your respond.
> The approach loads just the data into the DB. I am looking for an approach
> that allows me to update
use foreachPartition(), get a connection from a jdbc connection pool,
and insert the data the same way you would in a non-spark program.
If you're only doing inserts, postgres COPY will be faster (e.g.
https://discuss.pivotal.io/hc/en-us/articles/204237003), but if you're
doing updates that's not
d the spark async commit useful for our needs”, do you
> mean to say the code like below?
>
> kafkaDStream.asInstanceOf[CanCommitOffsets].commitAsync(ranges)
>
>
>
>
>
> Best Regards
>
> Richard
>
>
>
>
>
> From: venkat <meven...@gma
Are you talking about the broker version, or the kafka-clients artifact version?
On Thu, Nov 30, 2017 at 12:17 AM, Raghavendra Pandey
wrote:
> Just wondering if anyone has tried spark structured streaming kafka
> connector (2.2) with Kafka 0.11 or Kafka 1.0 version
You mentioned 0.11 version; the latest version of org.apache.kafka
kafka-clients artifact supported by DStreams is 0.10.0.1, for which it
has an appropriate dependency.
Are you manually depending on a different version of the kafka-clients artifact?
On Fri, Nov 24, 2017 at 7:39 PM, venks61176
eam ?
>
> Thanks
> Jagadish
>
> On Wed, Nov 15, 2017 at 11:23 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> spark.streaming.kafka.consumer.poll.ms is a spark configuration, not
>> a kafka parameter.
>>
>> see http://spark.apache.org/docs/l
spark.streaming.kafka.consumer.poll.ms is a spark configuration, not
a kafka parameter.
see http://spark.apache.org/docs/latest/configuration.html
On Tue, Nov 14, 2017 at 8:56 PM, jkagitala wrote:
> Hi,
>
> I'm trying to add spark-streaming to our kafka topic. But, I keep
As it says in SPARK-10320 and in the docs at
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#consumerstrategies
, you can use SubscribePattern
On Sun, Oct 29, 2017 at 3:56 PM, Ramanan, Buvana (Nokia - US/Murray
Hill) wrote:
> Hello
You should be able to pass a comma separated string of topics to
subscribe. subscribePattern isn't necessary
On Tue, Sep 19, 2017 at 2:54 PM, kant kodali wrote:
> got it! Sorry.
>
> On Tue, Sep 19, 2017 at 12:52 PM, Jacek Laskowski wrote:
>>
>> Hi,
>>
>>
Have you searched in jira, e.g.
https://issues.apache.org/jira/browse/SPARK-19185
On Mon, Sep 18, 2017 at 1:56 AM, HARSH TAKKAR wrote:
> Hi
>
> Changing spark version if my last resort, is there any other workaround for
> this problem.
>
>
> On Mon, Sep 18, 2017 at 11:43
If you want an "easy" but not particularly performant way to do it,
each org.apache.kafka.clients.consumer.ConsumerRecord
has a topic.
The topic is going to be the same for the entire partition as long as you
haven't shuffled, hence the examples on how to deal with it at a partition
level.
On
ms)
> )
>
> val kafkaStreamRdd = kafkaStream.transform { rdd =>
> rdd.map(consumerRecord => (consumerRecord.key(), consumerRecord.value()))
> }
>
> On Mon, Aug 28, 2017 at 11:56 AM, swetha kasireddy <
> swethakasire...@gmail.com> wrote:
>
>> There is no
xCapacity" -> Integer.valueOf(96),
> "spark.streaming.kafka.consumer.cache.maxCapacity" -> Integer.valueOf(96),
>
> "spark.streaming.kafka.consumer.poll.ms" -> Integer.valueOf(1024),
>
>
> http://markmail.org/message/n4cdxwurlhf44q5x
>
> https://issues.
1. No, prefetched message offsets aren't exposed.
2. No, I'm not aware of any plans for sync commit, and I'm not sure
that makes sense. You have to be able to deal with repeat messages in
the event of failure in any case, so the only difference sync commit
would make would be (possibly) slower
Why are you setting consumer.cache.enabled to false?
On Fri, Aug 25, 2017 at 2:19 PM, SRK wrote:
> Hi,
>
> What would be the appropriate settings to run Spark with Kafka 10? My job
> works fine with Spark with Kafka 8 and with Kafka 8 cluster. But its very
> slow with
o.offset.reset" -> "latest",.
>
>
> Thanks!
>
> On Mon, Aug 21, 2017 at 9:06 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> Yes, you can start from specified offsets. See ConsumerStrategy,
>> specifically Assign
>>
>>
>> http:
Yes, you can start from specified offsets. See ConsumerStrategy,
specifically Assign
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#your-own-data-store
On Tue, Aug 15, 2017 at 1:18 PM, SRK wrote:
> Hi,
>
> How to force Spark Kafka Direct to
org.apache.spark.streaming.kafka.KafkaCluster has methods
getLatestLeaderOffsets and getEarliestLeaderOffsets
On Mon, Aug 7, 2017 at 11:37 PM, shyla deshpande
wrote:
> Thanks TD.
>
> On Mon, Aug 7, 2017 at 8:59 PM, Tathagata Das
> wrote:
>>
saved into Cassandra. It is working as expected 99% of the time except that
> when there is an exception, I did not want the offsets to be committed.
>
> By Filtering for unsuccessful attempts, do you mean filtering the bad
> records...
>
>
>
>
>
>
> On Mon, Aug 7, 2017
raised, I was thinking I won't be committing the
>> offsets, but the offsets are committed all the time independent of whether
>> an exception was raised or not.
>>
>> It will be helpful if you can explain this behavior.
>>
>>
>> On Sun, Aug 6, 2017 at 5:19
sould be doing, then what is the right way?
>
> On Sun, Aug 6, 2017 at 9:21 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> If your complaint is about offsets being committed that you didn't
>> expect... auto commit being false on executors shouldn't have anythin
If your complaint is about offsets being committed that you didn't
expect... auto commit being false on executors shouldn't have anything
to do with that. Executors shouldn't be auto-committing, that's why
it's being overridden.
What you've said and the code you posted isn't really enough to
The warnings regarding configuration on the executor are for the
executor kafka consumer, not the driver kafka consumer.
In general, the executor kafka consumers should consume only exactly
the offsets the driver told them to, and not be involved in committing
offsets / part of the same group as
Given the emphasis on structured streaming, I don't personally expect
a lot more work being put into DStreams-based projects, outside of
bugfixes. Stable designation is kind of arbitrary at that point.
That 010 version wasn't developed until spark 2.0 timeframe, but you
can always try
en i provide the
> appropriate packages in LibraryDependencies ?
> which ones would have helped compile this ?
>
>
>
> On Sat, Jun 17, 2017 at 2:53 PM, karan alang <karan.al...@gmail.com> wrote:
>>
>> Thanks, Cody .. yes, was able to fix that.
>>
There are different projects for different versions of kafka,
spark-streaming-kafka-0-8 and spark-streaming-kafka-0-10
See
http://spark.apache.org/docs/latest/streaming-kafka-integration.html
On Fri, Jun 16, 2017 at 6:51 PM, karan alang wrote:
> I'm trying to compile
First thing I noticed, you should be using a singleton kafka producer,
not recreating one every partition. It's threadsafe.
On Tue, May 30, 2017 at 7:59 AM, Vikash Pareek
wrote:
> I am facing an issue related to spark streaming with kafka, my use case is as
>
You don't need write ahead logs for direct stream.
On Tue, May 2, 2017 at 11:32 AM, kant kodali wrote:
> Hi All,
>
> I need some fault tolerance for my stateful computations and I am wondering
> why we need to enable writeAheadLogs for DirectStream like Kafka (for
> Indirect
t the semantics here - i.e., calls to commitAsync are not
> actually guaranteed to succeed? If that's the case, the docs could really
> be a *lot* clearer about that.
>
> Thanks,
>
> DR
>
> On Fri, Apr 28, 2017 at 11:34 AM, Cody Koeninger <c...@koeninger.org> wrote:
>From that doc:
" However, Kafka is not transactional, so your outputs must still be
idempotent. "
On Fri, Apr 28, 2017 at 10:29 AM, David Rosenstrauch wrote:
> I'm doing a POC to test recovery with spark streaming from Kafka. I'm using
> the technique for storing the
il the 10th second of
> executing the last consumed offset of the same partition was 200.000 - and so
> forth. This is the information I seek to get.
>
>> On 27 Apr 2017, at 20:11, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> Are you asking for commits for ev
sets of the intermediate batches - and hence the questions.
>
>> On 26 Apr 2017, at 21:42, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> have you read
>>
>> http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#kafka-itself
>>
r this. Please
>> provide me some links to help me get started.
>>
>> Thanks
>> -Anna
>>
>> On Wed, Apr 26, 2017 at 7:46 PM, Cody Koeninger <c...@koeninger.org>
>> wrote:
>>>
>>> The standalone cluster manager is fine for production. D
The standalone cluster manager is fine for production. Don't use Yarn
or Mesos unless you already have another need for it.
On Wed, Apr 26, 2017 at 4:53 PM, anna stax wrote:
> Hi Sam,
>
> Thank you for the reply.
>
> What do you mean by
> I doubt people run spark in a.
t; enable.auto.commit property to false.
>
> Any idea onto how to use the KafkaConsumer’s auto offset commits? Keep in
> mind that I do not care about exactly-once, hence having messages replayed is
> perfectly fine.
>
>> On 26 Apr 2017, at 19:26, Cody Koeninger <c...@
What is it you're actually trying to accomplish?
You can get topic, partition, and offset bounds from an offset range like
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#obtaining-offsets
Timestamp isn't really a meaningful idea for a range of offsets.
On Tue, Apr
If you're talking about reading the same message multiple times in a
failure situation, see
https://github.com/koeninger/kafka-exactly-once
If you're talking about producing the same message multiple times in a
failure situation, keep an eye on
Glad you got it worked out. That's cool as long as your use case doesn't
actually require e.g. partition 0 to always be scheduled to the same
executor across different batches.
On Tue, Mar 21, 2017 at 7:35 PM, OUASSAIDI, Sami
wrote:
> So it worked quite well with a
You want spark.streaming.kafka.maxRatePerPartition for the direct stream.
On Sat, Mar 18, 2017 at 3:37 PM, Mal Edwin wrote:
>
> Hi,
> You can enable backpressure to handle this.
>
> spark.streaming.backpressure.enabled
> spark.streaming.receiver.maxRate
>
> Thanks,
>
Probably easier if you show some more code, but if you just call
dstream.window(Seconds(60))
you didn't specify a slide duration, so it's going to default to your
batch duration of 1 second.
So yeah, if you're just using e.g. foreachRDD to output every message
in the window, every second it's
Spark just really isn't a good fit for trying to pin particular computation
to a particular executor, especially if you're relying on that for
correctness.
On Thu, Mar 16, 2017 at 7:16 AM, OUASSAIDI, Sami
wrote:
>
> Hi all,
>
> So I need to specify how an executor
The kafka-0-8 and kafka-0-10 integrations have conflicting
dependencies. Last time I checked, Spark's doc publication puts
everything all in one classpath, so publishing them both together
won't work. I thought there was already a Jira ticket related to
this, but a quick look didn't turn it up.
mja...@seecs.edu.pk> wrote:
> I just noticed that Spark version that I am using (2.0.2) is built with
> Scala 2.11. However I am using Kafka 0.8.2 built with Scala 2.10. Could this
> be the reason why we are getting this error?
>
> On Mon, Feb 20, 2017 at 5:50 PM, Cody Koeninger <c
+ inTime + ", " + outTime)
>
> }
>
> }
>
> }))
>
> }
>
> }
>
>
> On Mon, Feb 20, 2017 at 3:10 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> That's an indication that the beginning offset for a given b
That's an indication that the beginning offset for a given batch is
higher than the ending offset, i.e. something is seriously wrong.
Are you doing anything at all odd with topics, i.e. deleting and
recreating them, using compacted topics, etc?
Start off with a very basic stream over the same
Not sure what to tell you at that point - maybe compare what is
present in ZK to a known working group.
On Tue, Feb 14, 2017 at 9:06 PM, Mohammad Kargar <mkar...@phemi.com> wrote:
> Yes offset nodes are in zk and I can get the values.
>
> On Feb 14, 2017 6:54 PM, &quo
spark job.
On Tue, Feb 14, 2017 at 8:40 PM, Mohammad Kargar <mkar...@phemi.com> wrote:
> I'm running 0.10 version and
>
> ./bin/kafka-consumer-groups.sh --zookeeper localhost:2181 --list
>
> lists the group.
>
> On Feb 14, 2017 6:34 PM, "Cody Koeninger" <
It looks like you're creating a kafka producer on the driver, and
attempting to write the string representation of
stringIntegerJavaPairRDD. Instead, you probably want to be calling
stringIntegerJavaPairRDD.foreachPartition, so that producing to kafka
is being done on the executor.
Read
Can you explain what wasn't successful and/or show code?
On Tue, Feb 14, 2017 at 6:03 PM, Mohammad Kargar wrote:
> As explained here, direct approach of integration between spark streaming
> and kafka does not update offsets in Zookeeper, hence Zookeeper-based Kafka
>
Pretty sure there was no 0.10.0.2 release of apache kafka. If that's
a hortonworks modified version you may get better results asking in a
hortonworks specific forum. Scala version of kafka shouldn't be
relevant either way though.
On Wed, Feb 8, 2017 at 5:30 PM, u...@moosheimer.com
You should not need to include jars for Kafka, the spark connectors
have the appropriate transitive dependency on the correct version.
On Sat, Feb 4, 2017 at 3:25 PM, Marco Mistroni wrote:
> Hi
> not sure if this will help at all, and pls take it with a pinch of salt as
> i
spark-streaming-kafka-0-10 has a transitive dependency on the kafka
library, you shouldn't need to include kafka explicitly.
What's your actual list of dependencies?
On Tue, Jan 31, 2017 at 3:49 PM, Marco Mistroni wrote:
> HI all
> i am trying to run a sample spark code
Keep an eye on
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
although it'll likely be a while
On Mon, Jan 30, 2017 at 3:41 PM, Tathagata Das
wrote:
> If you care about the semantics of those writes to
each consumer's coverage and lag status.
On Tue, Jan 24, 2017 at 10:45 PM, Cody Koeninger <c...@koeninger.org> wrote:
> When you said " I check the offset ranges from Kafka Manager and don't
> see any significant deltas.", what were you comparing it against? The
> offset
arch. An
> another legacy app also writes the same results to a database. There are
> huge difference between DB and ES. I know how many records we process daily.
>
> Everything works fine if I run a job instance for each topic.
>
> On Tue, Jan 24, 2017 at 5:26 PM, Cody Koeninger <c
am as one stream for all topics. I check the offset
> ranges from Kafka Manager and don't see any significant deltas.
>
> On Tue, Jan 24, 2017 at 4:42 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> Are you using receiver-based or direct stream?
>>
>> Are y
Can you identify the error case and call System.exit ? It'll get
retried on another executor, but as long as that one fails the same
way...
If you can identify the error case at the time you're doing database
interaction and just prevent data being written then, that's what I
typically do.
On
Are you using receiver-based or direct stream?
Are you doing 1 stream per topic, or 1 stream for all topics?
If you're using the direct stream, the actual topics and offset ranges
should be visible in the logs, so you should be able to see more
detail about what's happening (e.g. all topics are
Spark 2.2 hasn't been released yet, has it?
Python support in kafka dstreams for 0.10 is probably never, there's a
jira ticket about this.
Stable, hard to say. It was quite a few releases before 0.8 was
marked stable, even though it underwent little change.
On Wed, Jan 18, 2017 at 2:21 AM,
Kafka is designed to only allow reads from leaders. You need to fix
this at the kafka level not the spark level.
On Fri, Jan 6, 2017 at 7:33 AM, Raghu Vadapalli wrote:
>
> My spark 2.0 + kafka 0.8 streaming job fails with error partition leaderset
> exception. When I
You can't change the batch time, but you can limit the number of items
in the batch
http://spark.apache.org/docs/latest/configuration.html
spark.streaming.backpressure.enabled
spark.streaming.kafka.maxRatePerPartition
On Tue, Jan 3, 2017 at 4:00 AM, 周家帅 wrote:
> Hi,
>
> I am
This doesn't sound like a question regarding Kafka streaming, it
sounds like confusion about the scope of variables in spark generally.
Is that right? If so, I'd suggest reading the documentation, starting
with a simple rdd (e.g. using sparkContext.parallelize), and
experimenting to confirm your
Please post a minimal complete code example of what you are talking about
On Thu, Dec 15, 2016 at 6:00 PM, Michael Nguyen
wrote:
> I have the following sequence of Spark Java API calls (Spark 2.0.2):
>
> Kafka stream that is processed via a map function, which returns
You certainly can use stable version of Kafka brokers with spark
2.0.2, why would you think otherwise?
On Mon, Dec 12, 2016 at 8:53 AM, Amir Rahnama wrote:
> Hi,
>
> You need to describe more.
>
> For example, in Spark 2.0.2, you can't use stable versions of Apache Kafka.
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#creating-a-direct-stream
Use a separate group id for each stream, like the docs say.
If you're doing multiple output operations, and aren't caching, spark
is going to read from kafka again each time, and if some of those
ot;org.apache.spark" %% "spark-core" % spark %
>> "provided",
>> "org.apache.spark" %% "spark-streaming"% spark %
>> "provided",
>> "org.apache.spark" %%
When you say 0.10.1 do you mean broker version only, or does your
assembly contain classes from the 0.10.1 kafka consumer?
On Fri, Dec 9, 2016 at 10:19 AM, debasishg wrote:
> Hello -
>
> I am facing some issues with the following snippet of code that reads from
> Kafka
r.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
> On Wed, Dec 7, 2016 at 12:16 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> Personally I thin
Personally I think forcing the stream to fail (e.g. check offsets in
downstream store and throw exception if they aren't as expected) is
the safest thing to do.
If you proceed after a failure, you need a place to reliably record
the batches that failed for later processing.
On Wed, Dec 7, 2016
You do not need recent versions of spark, kafka, or structured
streaming in order to do this. Normal DStreams are sufficient.
You can parallelize your static data from the database to an RDD, and
there's a join method available on RDDs. Transforming a single given
timestamp line into multiple
Have you read / watched the materials linked from
https://github.com/koeninger/kafka-exactly-once
On Mon, Dec 5, 2016 at 4:17 AM, Jörn Franke wrote:
> You need to do the book keeping of what has been processed yourself. This
> may mean roughly the following (of course the
If you want finer-grained max rate setting, SPARK-17510 got merged a
while ago. There's also SPARK-18580 which might help address the
issue of starting backpressure rate for the first batch.
On Mon, Dec 5, 2016 at 4:18 PM, Liren Ding wrote:
> Hey all,
>
> Does
ither for a YARN cluster or spark standalone) we get this issue.
>
> Heji
>
>
> On Sat, Nov 19, 2016 at 8:53 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> I ran your example using the versions of kafka and spark you are
>> using, against a standalone clust
So I haven't played around with streaming k means at all, but given
that no one responded to your message a couple of days ago, I'll say
what I can.
1. Can you not sample out some % of the stream for training?
2. Can you run multiple streams at the same time with different values
for k and
There have definitely been issues with UI reporting for the direct
stream in the past, but I'm not able to reproduce this with 2.0.2 and
0.8. See below:
https://i.imgsafe.org/086019ae57.png
On Fri, Nov 18, 2016 at 4:38 AM, Julian Keppel
wrote:
> Hello,
>
> I use
I ran your example using the versions of kafka and spark you are
using, against a standalone cluster. This is what I observed:
(in kafka working directory)
bash-3.2$ ./bin/kafka-run-class.sh kafka.tools.GetOffsetShell
--broker-list 'localhost:9092' --topic simple_logtest --time -2
you mean exactly?
>
> On Fri, Nov 18, 2016 at 1:50 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> Ok, I don't think I'm clear on the issue then. Can you say what the
>> expected behavior is, and what the observed behavior is?
>>
>> On Thu,
limit the size of
> batches, it could be any greater size as it does.
>
> Thien
>
> On Fri, Nov 18, 2016 at 1:17 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> If you want a consistent limit on the size of batches, use
>> spark.streaming.kafka.maxRatePerP
t.reset=largest, but I
> don't know what I can do in this case.
>
> Do you and other ones could suggest me some solutions please as this seems
> the normal situation with Kafka+SpartStreaming.
>
> Thanks.
> Alex
>
>
>
> On Thu, Nov 17, 2016 at 2:32 AM, Cody Koeninger &
ure implementation in Spark Streaming?
>
> On Wed, Nov 16, 2016 at 4:22 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> Moved to user list.
>>
>> I'm not really clear on what you're trying to accomplish (why put the
>> csv file through Kafka instead of read
Moved to user list.
I'm not really clear on what you're trying to accomplish (why put the
csv file through Kafka instead of reading it directly with spark?)
auto.offset.reset=largest just means that when starting the job
without any defined offsets, it will start at the highest (most
recent)
1 - 100 of 652 matches
Mail list logo