Hello!

For long-lived, low-intensity streaming, the data streamer cannot make use of its client-side per-partition batching; it degenerates into a thin wrapper over the ordinary cache update operations that are already available through the Cache API.
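Ilya's recommendation further down the thread, acquiring a fresh streamer per burst rather than keeping one open, can be sketched as below. This is a minimal sketch, not the poster's actual code: the cache name "records" and the key/value types are illustrative, and it assumes a running Ignite node.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

import java.util.Map;

public class BatchLoader {
    /** Loads one burst of records, acquiring a fresh streamer per batch. */
    public static void loadBatch(Ignite ignite, Map<Long, String> batch) {
        // try-with-resources closes the streamer; close() flushes any
        // remaining buffered entries before returning.
        try (IgniteDataStreamer<Long, String> streamer =
                 ignite.dataStreamer("records")) {
            streamer.allowOverwrite(true);     // also update existing keys
            streamer.perNodeBufferSize(1024);  // tune one streamer instead of
                                               // running several of them
            batch.forEach(streamer::addData);
        }
    }
}
```

If per-record acknowledgment matters (see Denis's point about resending unacknowledged records), note that `addData` returns an `IgniteFuture` that can be inspected to detect failed writes.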
Regards,
--
Ilya Kasnacheev

On Tue, Feb 4, 2020 at 03:41, Denis Magda <dma...@apache.org> wrote:

> Ilya,
>
> I don't quite understand why the data streamer is not suitable as a
> long-running solution. Please don't mislead people; list out the specific
> limitations instead. I don't see anything wrong with keeping an open data
> streamer that transfers data to Ignite in real time.
>
> Narges, if the streamer crashes, your service/app needs to resend the
> records that were not acknowledged. You might be able to use Kafka
> Connect here, since it keeps track of committed/pending records.
>
> -
> Denis
>
>
> On Mon, Feb 3, 2020 at 6:13 AM Ilya Kasnacheev <ilya.kasnach...@gmail.com>
> wrote:
>
>> Hello!
>>
>> I think these benefits are imaginary. You will have to worry more about
>> the service than about a data streamer, which can be recreated at any
>> time.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> On Mon, Feb 3, 2020 at 16:58, narges saleh <snarges...@gmail.com> wrote:
>>
>>> Thanks Ilya.
>>> I have to listen to these bursts of data, which arrive every few
>>> seconds, meaning an almost constant stream of bursts from different
>>> data sources.
>>> The main reason the service grid is appealing to me is its resiliency;
>>> I don't have to worry about it. With a client-side streamer, I will
>>> have to deploy it myself, keep it up and running, and load-balance it.
>>>
>>> On Mon, Feb 3, 2020 at 7:17 AM Ilya Kasnacheev <
>>> ilya.kasnach...@gmail.com> wrote:
>>>
>>>> Hello!
>>>>
>>>> I don't see why you would deploy it as a service; it sounds like you
>>>> will have to send more data over the network. If you have to pull
>>>> batches in, then a service should work. I recommend re-acquiring the
>>>> data streamer for each batch.
>>>>
>>>> Please note that the data streamer is very scalable, so it is
>>>> preferable to tune one streamer rather than try to use more than one.
>>>>
>>>> Regards,
>>>> --
>>>> Ilya Kasnacheev
>>>>
>>>>
>>>> On Mon, Feb 3, 2020
at 16:11, narges saleh <snarges...@gmail.com> wrote:
>>>>
>>>>> Hi Ilya,
>>>>> The data comes in huge batches of records (each burst can be up to
>>>>> 50-100 MB, which I plan to spread across multiple streamers), so the
>>>>> streamer seems to be the way to go. Also, I don't want to establish
>>>>> a JDBC connection each time.
>>>>> So, if the streamer is the way to go, is it feasible to deploy it as
>>>>> a service?
>>>>> Thanks.
>>>>>
>>>>> On Mon, Feb 3, 2020 at 6:51 AM Ilya Kasnacheev <
>>>>> ilya.kasnach...@gmail.com> wrote:
>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> Contrary to its name, the data streamer is not actually suitable
>>>>>> for long-lived, low-intensity streaming. What it is good for is
>>>>>> burst loading of a large amount of data in a short period of time.
>>>>>>
>>>>>> If your data arrives in large batches, you can use a data streamer
>>>>>> for each batch. If not, you are better off using the Cache API.
>>>>>>
>>>>>> If you are worried that the plain Cache API is slow, but you also
>>>>>> want failure resilience, there is a catch-22: the only way to make
>>>>>> something resilient is to put it into a cache :)
>>>>>>
>>>>>> Regards,
>>>>>> --
>>>>>> Ilya Kasnacheev
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 3, 2020 at 14:34, narges saleh <snarges...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> But services are by definition long-lived, right? Here is my
>>>>>>> layout: the data is continuously generated and sent to the
>>>>>>> streamer services (via a JDBC connection with the SET STREAMING ON
>>>>>>> option), deployed, say, as a node singleton (actually also
>>>>>>> deployed as microservices), to load the data into the caches. The
>>>>>>> streamers flush data based on timers.
>>>>>>> If a streamer crashes before its buffer is flushed, the client
>>>>>>> catches the exception and resends the batch. Any issue with this
>>>>>>> layout?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> On Mon, Feb 3, 2020 at 5:02 AM Ilya Kasnacheev <
>>>>>>> ilya.kasnach...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> It is not recommended to keep long-lived data streamers; it is
>>>>>>>> best to acquire one when it is needed.
>>>>>>>>
>>>>>>>> If you have to keep a data streamer around, don't forget to
>>>>>>>> flush() it. That way you don't have to worry about its queue.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> --
>>>>>>>> Ilya Kasnacheev
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 3, 2020 at 13:24, narges saleh <snarges...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> My specific question/concern is about the state of the streamer
>>>>>>>>> when it runs as a service, i.e. when it crashes and gets
>>>>>>>>> redeployed. Specifically, what happens to the data?
>>>>>>>>> I have a similar question about the state of a continuous query
>>>>>>>>> when it is deployed as a service: what happens to the data in
>>>>>>>>> the listener's queue?
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> On Sun, Feb 2, 2020 at 4:18 PM Mikael <mikael-arons...@telia.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi!
>>>>>>>>>>
>>>>>>>>>> Not as far as I know; I have a number of services using
>>>>>>>>>> streamers without any problems. Do you have any specific
>>>>>>>>>> problem with it?
>>>>>>>>>>
>>>>>>>>>> Mikael
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2020-02-02 at 22:33, narges saleh wrote:
>>>>>>>>>> > Hi All,
>>>>>>>>>> >
>>>>>>>>>> > Is there a problem with running the data streamer as a
>>>>>>>>>> > service, instantiated in the init method? Or with loading the
>>>>>>>>>> > data via a JDBC connection with streaming mode enabled?
>>>>>>>>>> > In either case, the deployment is affinity-based.
>>>>>>>>>> >
>>>>>>>>>> > thanks.
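The service-based layout discussed in the thread (a streamer held open inside a node-singleton service, flushed on a timer) could look roughly like the sketch below. This is an assumption-laden illustration, not anyone's actual code: the cache name "records", the key/value types, and the in-process ingest queue are all hypothetical. The key point from the thread survives in the design: only a bounded window of unflushed entries can be lost on a crash, and the producer must resend anything not yet acknowledged.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.resources.IgniteInstanceResource;
import org.apache.ignite.services.Service;
import org.apache.ignite.services.ServiceContext;

import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class StreamerService implements Service {
    @IgniteInstanceResource
    private transient Ignite ignite;

    // Hypothetical ingest queue; in practice it would be fed by whatever
    // transport delivers the bursts to this node.
    private transient BlockingQueue<Map.Entry<Long, String>> queue;

    private transient IgniteDataStreamer<Long, String> streamer;

    @Override public void init(ServiceContext ctx) {
        queue = new LinkedBlockingQueue<>();
        streamer = ignite.dataStreamer("records");
        // Flush buffered entries at least every 500 ms, so a crash loses
        // at most ~500 ms of unacknowledged data (which the producer
        // must be prepared to resend anyway).
        streamer.autoFlushFrequency(500);
    }

    @Override public void execute(ServiceContext ctx) throws Exception {
        while (!ctx.isCancelled()) {
            // Poll with a timeout so cancellation is noticed promptly.
            Map.Entry<Long, String> e = queue.poll(200, TimeUnit.MILLISECONDS);
            if (e != null)
                streamer.addData(e.getKey(), e.getValue());
        }
    }

    @Override public void cancel(ServiceContext ctx) {
        if (streamer != null)
            streamer.close();   // flushes remaining buffered entries
    }
}
```

Note the trade-off both Ilyas' and Denis's messages circle around: the service grid restarts the service for you, but it does not preserve the streamer's in-memory buffer across a crash, so resilience for the data itself still has to come from the sender's resend logic (or a tracking layer such as Kafka Connect, as suggested above).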