Ok… so what’s the tricky part?
Spark Streaming isn’t real time, so if you don’t mind a slight delay in
processing… it would work.
The drawback is that you now have a long-running Spark job (assuming it runs
under YARN), and that could become a problem in terms of security and resources.
(How well does Y
of sync, leading to lost /
> duplicate data.
>
> Regarding long-running Spark jobs, I have streaming jobs in the
> standalone manager that have been running for 6 months or more.
>
> On Thu, Sep 29, 2016 at 11:01 AM, Michael Segel wrote:
>> Ok… so what’s the tricky part?
pointless.
>
> On Thu, Sep 29, 2016 at 1:27 PM, Michael Segel wrote:
>> Spark standalone is not YARN… or secure, for that matter… ;-)
>>
>>> On Sep 29, 2016, at 11:18 AM, Cody Koeninger wrote:
>>>
>>> Spark streaming helps with aggregation beca