On 5 July 2014 23:08, Mayur Rustagi [via Apache Spark User List]
<[email protected]> wrote:
> The key idea is to simulate your app time as you enter data. So you can connect
> Spark Streaming to a queue and insert data into it spaced by time. Easier said
> than done :).

I see.
I'll also try to implement this solution so that I can compare it with
my current Spark implementation.
I'm interested in seeing whether it is faster... as I assume it should be :)
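
To make the queue idea quoted above concrete, here is a minimal sketch in plain
Python (no Spark involved). It assumes each input file is pushed into the queue
as exactly one batch per logical tick, so that a "window of N files" becomes a
window of N ticks. The function name, the sample records, and the tick counter
are all hypothetical, chosen only for illustration. In Spark Streaming the
analogous setup would presumably be a queue-backed DStream that consumes one
RDD per batch interval, with the window duration set to N times the batch
interval, but that mapping is an assumption and is not tested here.

```python
from collections import deque


def windowed_batches(batches, window_len, slide=1):
    """Yield (tick, records) for a sliding window over the last
    `window_len` batches, emitting every `slide` ticks.

    Each batch stands for one file entering the queue at one logical tick,
    so the "application time" is simulated by the loop counter.
    """
    window = deque(maxlen=window_len)  # oldest batch drops out automatically
    for tick, batch in enumerate(batches, start=1):
        window.append(batch)
        if tick % slide == 0:
            # Flatten the batches currently inside the window.
            yield tick, [record for b in window for record in b]


# Hypothetical input: four "files", each a list of records.
files = [["a1", "a2"], ["b1"], ["c1", "c2"], ["d1"]]
windows = list(windowed_batches(files, window_len=3))
```

With `window_len=3`, the window emitted at tick 4 no longer contains the
records of the first file, which is the behaviour a window of N files should
have regardless of how long each file takes to process.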

> What are the parallelism issues you are hitting with your
> static approach?

In my current Spark implementation, whenever I need to get the
aggregated stats over the window, I re-map all the current bins
to the same key so that they can all be reduced together.
This means that the data needs to be shipped to a single reducer.
As a result, adding nodes/cores to the application does not really
affect the total time :(
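
One common way around a single-reducer bottleneck like the one described above
is to aggregate in two stages: reduce the bins inside each partition first,
then merge the few partial results. The sketch below shows the idea in plain
Python, not Spark. It assumes the per-bin stats are a count and a total, which
is an illustrative guess and not the actual stats of the application, and it
assumes the merge function is associative and commutative. The names `merge`,
`single_key`, and `two_stage` are hypothetical.

```python
from functools import reduce


def merge(a, b):
    """Combine two partial stats; must be associative and commutative."""
    return {"count": a["count"] + b["count"], "total": a["total"] + b["total"]}


def single_key(bins):
    """Everything under one key: a single reducer sees every bin."""
    return reduce(merge, bins)


def two_stage(partitions):
    """Reduce each partition locally, then merge one partial per partition."""
    partials = [reduce(merge, part) for part in partitions if part]
    return reduce(merge, partials)


# Hypothetical bins, split across two partitions.
bins = [{"count": c, "total": t} for c, t in [(1, 10), (2, 5), (3, 7), (4, 8)]]
partitions = [bins[:2], bins[2:]]

result_single = single_key(bins)
result_two_stage = two_stage(partitions)
```

Both functions return the same stats, but in the two-stage version only one
partial result per partition has to reach the final merge, so the local
reductions can run in parallel on separate nodes/cores.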

>
>
> On Friday, July 4, 2014, alessandro finamore <[hidden email]> wrote:
>>
>> Thanks for the replies
>>
>> What is not completely clear to me is how time is managed.
>> I can create a DStream from a file.
>> But if I set the window property, that will be bound to the application
>> time, right?
>>
>> If I got it right, with a receiver I can control the way DStreams are
>> created.
>> But how can I then apply the windowing already shipped with the framework
>> if it is bound to the "application time"?
>> I would like to define a window of N files, but the window() function
>> requires a duration as input...
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/window-analysis-with-Spark-and-Spark-streaming-tp8806p8824.html
>>
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.



-- 
--------------------------------------------------
Alessandro Finamore, PhD
Politecnico di Torino
--
Office:    +39 0115644127
Mobile:   +39 3280251485
SkypeId: alessandro.finamore
---------------------------------------------------




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/window-analysis-with-Spark-and-Spark-streaming-tp8806p8867.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
