>>>> ...ode expensive? Where are you persisting the
>>>> data after updates? The way I see it, by moving to Flink you get to use
>>>> RocksDB (a key-value store), which makes your lookups faster – probably
>>>> right now you are using a non-indexed store like S3, maybe?
>>>>
>>>> So the gain comes from moving to a persistence store better suited to
>>>> your use case, rather than from batch->streaming. Maybe consider just
>>>> switching to a different data store.
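The lookup argument above can be sketched in a few lines of plain Python (purely illustrative; the record shapes are invented): a key-value store answers point lookups directly, whereas a flat, non-indexed store forces a scan over all records.

```python
# Illustrative sketch: point lookup (RocksDB-style) vs. full scan (flat files).
records = [("user-1", 10), ("user-2", 20), ("user-3", 30)]

# Flat, non-indexed storage: every lookup scans all records.
def scan_lookup(key):
    for k, v in records:
        if k == key:
            return v
    return None

# Key-value storage: one hash lookup per key.
kv_store = dict(records)

assert scan_lookup("user-2") == kv_store["user-2"] == 20
```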
>>> ...should only be used if you really want to act on the new
>>> events in real-time. It is generally harder to get a streaming job
>>> correct than a batch one.
>>>
>>>
>>>
>>> 2) If the current setup is expensive due to
>>> serialization/deserialization, then that should be fixed by moving to a
>>> faster format (maybe Avro?)
>> ...At every incoming event, check
>> the previous state and update/output to Kafka or whatever data store you
>> are using.
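The "check previous state, update, emit" pattern described above can be simulated with a plain dict standing in for Flink's keyed state (all names here are invented for illustration; this is not Flink's API):

```python
# Minimal simulation of keyed state: for each event, read the previous
# value for its key, fold the new value in, and emit the updated record.
state = {}      # stands in for Flink's per-key state (e.g. RocksDB-backed)
outputs = []    # stands in for the Kafka / data-store sink

def on_event(key, value):
    previous = state.get(key, 0)
    state[key] = previous + value
    outputs.append((key, state[key]))   # emit the updated state downstream

for key, value in [("a", 1), ("b", 2), ("a", 3)]:
    on_event(key, value)

# "a" was seen twice, so its second emission carries the running total 4.
assert outputs == [("a", 1), ("b", 2), ("a", 4)]
```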
>>
>> Thanks
>>
>> Ankit
>>
>>
>>
> *From: *Flavio Pompermaier <pomperma...@okkam.it>
> *Date: *Tuesday, May 16, 2017 at 9:31 AM
> *To: *Kostas Kloudas <k.klou...@data-artisans.com>
> *Cc: *user <user@flink.apache.org>
> *Subject: *Re: Stateful streaming question
>
Hi Kostas,
thanks for your quick response.
I also thought about using Async I/O; I just need to figure out how to
correctly handle parallelism and the number of async requests.
However, that's probably the way to go. Is it possible also to set a number
of retry attempts/backoff when the async request fails?
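For what it's worth, the retry/backoff-with-bounded-parallelism idea can be sketched with stdlib asyncio. This is a generic illustration of the pattern, not Flink's AsyncFunction API; every name below (`with_retries`, `enrich_all`, `flaky_lookup`) is invented.

```python
import asyncio

async def with_retries(request, attempts=3, backoff=0.01):
    """Run an async request, retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return await request()
        except Exception:
            if attempt == attempts - 1:
                raise                       # out of retries: give up
            await asyncio.sleep(backoff * (2 ** attempt))

async def enrich_all(keys, lookup, max_in_flight=2):
    """Issue lookups with bounded parallelism, like a capped request pool."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def one(key):
        async with semaphore:               # cap concurrent requests
            return await with_retries(lambda: lookup(key))

    return await asyncio.gather(*(one(k) for k in keys))

# Fake external service that fails on the first call for each key.
calls = {}
async def flaky_lookup(key):
    calls[key] = calls.get(key, 0) + 1
    if calls[key] == 1:
        raise RuntimeError("transient failure")
    return key.upper()

results = asyncio.run(enrich_all(["a", "b", "c"], flaky_lookup))
assert results == ["A", "B", "C"]
assert all(n == 2 for n in calls.values())  # each key needed one retry
```

The semaphore plays the role of the async-request capacity limit, and the backoff loop is the retry policy being asked about.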
Hi Flavio,
From what I understand, for the first part you are correct. You can use Flink’s
internal state to keep your enriched data.
In fact, if you are also querying an external system to enrich your data, it is
worth looking at the AsyncIO feature:
Hi to all,
we're still playing with the Flink streaming part in order to see whether it
can improve our current batch pipeline.
At the moment, we have a job that translates incoming data (as Row) into
Tuple4, groups the tuples by the first field, and persists the result to
disk (using a Thrift
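The batch shape described above (rows becoming 4-tuples, then grouped by the first field) can be mimicked in a few lines of plain Python; the field values below are invented for illustration, and only the grouping step of the job is shown.

```python
from collections import defaultdict

# Incoming rows have already been turned into 4-tuples.
tuples = [
    ("k1", "x", 1, "2017-05-16"),
    ("k2", "y", 2, "2017-05-16"),
    ("k1", "z", 3, "2017-05-17"),
]

# Group by the first field, as in a groupBy(0) on a Tuple4 dataset.
groups = defaultdict(list)
for t in tuples:
    groups[t[0]].append(t)

# In the real job the grouped result is persisted to disk; here we just
# check the grouping.
assert sorted(groups) == ["k1", "k2"]
assert len(groups["k1"]) == 2
```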