Re: Architecture recommendations for a tricky use case

2016-09-29 Thread Michael Segel
Ok… so what’s the tricky part? Spark Streaming isn’t real time, so if you don’t mind a slight delay in processing… it would work. The drawback is that you now have a long-running Spark job (assuming under YARN), and that could become a problem in terms of security and resources. (How well does Y

Re: Architecture recommendations for a tricky use case

2016-09-29 Thread Michael Segel
of sync, leading to lost /
> duplicate data.
>
> Regarding long running spark jobs, I have streaming jobs in the
> standalone manager that have been running for 6 months or more.
>
> On Thu, Sep 29, 2016 at 11:01 AM, Michael Segel
> wrote:
>> Ok… so what’s the tricky part?

Re: Architecture recommendations for a tricky use case

2016-09-29 Thread Michael Segel
pointless.
>
> On Thu, Sep 29, 2016 at 1:27 PM, Michael Segel
> wrote:
>> Spark standalone is not Yarn… or secure for that matter… ;-)
>>
>>> On Sep 29, 2016, at 11:18 AM, Cody Koeninger wrote:
>>>
>>> Spark streaming helps with aggregation beca