Hi,
some time ago I found a problem with backpressure in Spark and prepared a
simple test to check it and compare it with Flink:
https://github.com/rssdev10/spark-kafka-streaming
+
https://mail-archives.apache.org/mod_mbox/spark-user/201607.mbox/%3CCA+AWphp=2VsLrgSTWFFknw_KMbq88fZhKfvugoe4YYByEt7
>
> 2. Why do you assume that this would end up in one partition?
>
> 3. You can also read old messages from a Kafka topic by setting
> "auto.offset.reset" to "smallest" ("earliest" in the newer consumer API)
> and using a new "group.id".
>
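The configuration in point 3 can be sketched as a plain `java.util.Properties` setup. The property names are standard Kafka consumer settings; the group-name prefix is made up for illustration:

```java
import java.util.Properties;

public class ConsumerConfigSketch {
    public static Properties oldMessagesConfig() {
        Properties props = new Properties();
        // Start from the beginning of the topic when the group has no
        // committed offset ("earliest" in the newer consumer API).
        props.put("auto.offset.reset", "smallest");
        // A fresh group.id has no committed offsets, so the setting above
        // takes effect and the consumer re-reads the old messages.
        props.put("group.id", "replay-group-" + System.currentTimeMillis());
        return props;
    }

    public static void main(String[] args) {
        System.out.println(oldMessagesConfig().getProperty("auto.offset.reset"));
    }
}
```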
Hello,
why does Flink implement different serialization schemes for keyed and
non-keyed messages for Kafka?
I'm using two ways of loading messages into Kafka. The first way is
on-the-fly loading without Flink, by Kafka's means only. In this case I'm
using something like:
props.put("partitioner.class", K
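The class name in the `partitioner.class` property is cut off above, but such a partitioner typically reduces to hashing the message key into a partition index. Below is a minimal sketch of that logic in plain Java, independent of the actual Kafka `Partitioner` interface (the class and method names here are assumptions for illustration):

```java
public class KeyHashPartitioner {
    // Map a message key to a partition index in [0, numPartitions).
    // Masking with Integer.MAX_VALUE avoids a negative index when
    // hashCode() returns a negative value.
    public static int partition(Object key, int numPartitions) {
        if (key == null) {
            return 0; // simple fallback; real partitioners often round-robin
        }
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always lands in the same partition.
        System.out.println(partition("user-42", 8) == partition("user-42", 8));
    }
}
```

With this kind of scheme, all messages sharing a key are routed to one partition, which is what makes keyed serialization matter on the consumer side.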
Hello,
I want to implement something like the processing schema presented in the
following diagram: the calculation of the number of unique users per
specified time window, with the assumption that we have > 100k events per
second and > 100M unique users:
I have one Kafka topic of events with a
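A tumbling-window unique-user count can be sketched in plain Java with an exact per-window set. At the stated scale (>100M users) one would normally replace the `HashSet` with an approximate sketch such as HyperLogLog; the event fields and class names here are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class UniqueUsersPerWindow {
    private final long windowMillis;
    // window start timestamp -> set of user ids seen in that window
    private final Map<Long, Set<String>> windows = new HashMap<>();

    public UniqueUsersPerWindow(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    public void onEvent(String userId, long eventTimeMillis) {
        // Align the event time to the start of its tumbling window.
        long windowStart = eventTimeMillis - (eventTimeMillis % windowMillis);
        windows.computeIfAbsent(windowStart, k -> new HashSet<>()).add(userId);
    }

    public int uniqueUsers(long windowStart) {
        Set<String> s = windows.get(windowStart);
        return s == null ? 0 : s.size();
    }

    public static void main(String[] args) {
        UniqueUsersPerWindow counter = new UniqueUsersPerWindow(60_000);
        counter.onEvent("u1", 0);
        counter.onEvent("u2", 10_000);
        counter.onEvent("u1", 20_000);  // duplicate within the same window
        counter.onEvent("u3", 61_000);  // falls into the next window
        System.out.println(counter.uniqueUsers(0));       // prints 2
        System.out.println(counter.uniqueUsers(60_000));  // prints 1
    }
}
```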
> a sliding window (your RT-buffer) that you can
> query using a secondary stream (your user's queries).
> > He'll post the code soon to this email thread.
> >
> > Regards,
> > Robert
> >
> >
> > On Wed, Nov 11, 2015 at 2:51 PM, rss rss wrote:
>
Hello,
regarding the Lambda architecture, there is the following book:
https://www.manning.com/books/big-data ("Big Data: Principles and Best
Practices of Scalable Realtime Data Systems",
Nathan Marz and James Warren).
Regards,
Roman
2015-11-12 4:47 GMT+03:00 Welly Tambunan :
> Hi Stephan,
>
>
>
you can work more in a batch-style, but
> that is quite an extensive change to the core.
>
> Greetings,
> Stephan
>
>
> On Sun, Nov 8, 2015 at 5:15 PM, rss rss wrote:
>
Hello,
I need to extract a finite subset, like a data buffer, from an infinite data
stream. The best way for me would be to obtain a finite stream with the data
accumulated over the previous minute (as an example). But I have not found
any existing technique to do it.
As a possible way to do something near to a
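The one-minute buffer described above can be sketched in plain Java as a time-bounded deque that evicts entries older than the window; querying it yields a finite snapshot of the infinite stream (the class and method names are assumptions for illustration):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TimeBoundedBuffer<T> {
    private static final class Entry<T> {
        final long timestampMillis;
        final T value;
        Entry(long t, T v) { timestampMillis = t; value = v; }
    }

    private final long windowMillis;
    private final Deque<Entry<T>> buffer = new ArrayDeque<>();

    public TimeBoundedBuffer(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Append an event and drop everything older than the window.
    public void add(T value, long nowMillis) {
        buffer.addLast(new Entry<>(nowMillis, value));
        while (!buffer.isEmpty()
                && buffer.peekFirst().timestampMillis <= nowMillis - windowMillis) {
            buffer.removeFirst();
        }
    }

    // Finite snapshot of the last `windowMillis` of data.
    public List<T> snapshot() {
        List<T> out = new ArrayList<>();
        for (Entry<T> e : buffer) out.add(e.value);
        return out;
    }

    public static void main(String[] args) {
        TimeBoundedBuffer<String> buf = new TimeBoundedBuffer<>(60_000);
        buf.add("e1", 0);
        buf.add("e2", 30_000);
        buf.add("e3", 70_000); // evicts "e1", which is older than one minute
        System.out.println(buf.snapshot()); // prints [e2, e3]
    }
}
```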