The coming data on Spark Streaming

2016-09-14 Thread pcandido
Hi everyone, I'm starting out with Spark Streaming and would like to ask a few things about how data arrives. I know that SS uses micro-batches, which are received by workers and stored as RDDs. The master, at defined intervals, receives a pointer to the micro-batch RDD and can use it to process the data using ma…
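For context on the model the question describes, here is a minimal, Spark-free Python sketch of how records arriving over time are grouped into fixed-interval micro-batches. All names here are illustrative, not part of the Spark API:

```python
# A library-free sketch of the micro-batch model: records carry arrival
# timestamps; the "receiver" groups them into one batch per fixed interval,
# and the driver then processes each batch as a single unit.

def micro_batches(records, interval):
    """Group (timestamp, value) records into batches of `interval` seconds."""
    batches = {}
    for ts, value in records:
        batch_id = int(ts // interval)       # which interval the record falls in
        batches.setdefault(batch_id, []).append(value)
    return [batches[k] for k in sorted(batches)]

# Records arriving over ~5 seconds, batched at a 2-second interval.
stream = [(0.1, "a"), (0.9, "b"), (2.3, "c"), (3.8, "d"), (4.1, "e")]
print(micro_batches(stream, 2))  # [['a', 'b'], ['c', 'd'], ['e']]
```

Note how batch boundaries are purely time-based: a batch may hold any number of records, which is also what the later "Microbatches length" question in this thread list runs into.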

Re: The coming data on Spark Streaming

2016-09-21 Thread pcandido
Anybody? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/The-coming-data-on-Spark-Streaming-tp27720p27771.html

Re: Prototype Implementation of Hierarchical Clustering on Spark

2016-10-03 Thread pcandido
Hello, could you tell me how you applied MapReduce to Bisecting K-Means? I know how the classical BKM works, but how did you parallelize the processing? Are all leaf nodes divided at the same time? If not, how? If yes, how do you handle the last nodes? Dividing every leaf node per iteration, you always…
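One possible answer to the "last nodes" question above, as a hedged, pure-Python sketch (this is not the actual Spark MLlib implementation): split all current leaves in the same iteration, but in the final round split only as many leaves as needed so the leaf count lands exactly on the target k. The 1-D points and mean-threshold split keep the example short; a real implementation would run 2-means on each leaf instead:

```python
# Illustrative bisecting k-means where every splittable leaf is divided
# per iteration ("MapReduce-style"), except the last round, which splits
# only enough leaves to reach exactly k clusters.

def bisect(points):
    """Split one leaf into two around its mean (stand-in for a 2-means step)."""
    mean = sum(points) / len(points)
    left = [p for p in points if p <= mean]
    right = [p for p in points if p > mean]
    return [c for c in (left, right) if c]

def bisecting_kmeans(points, k):
    leaves = [sorted(points)]
    while len(leaves) < k:
        # Largest leaves first, as a cheap stand-in for choosing by SSE.
        splittable = sorted((c for c in leaves if len(c) > 1),
                            key=len, reverse=True)
        if not splittable:
            break
        # Split in parallel, but never overshoot the target leaf count.
        n_to_split = min(len(splittable), k - len(leaves))
        chosen = splittable[:n_to_split]
        keep = [c for c in leaves if c not in chosen]
        new = []
        for c in chosen:
            new.extend(bisect(c))
        leaves = keep + new
    return leaves

print(bisecting_kmeans([1, 2, 3, 10, 11, 12, 30, 31, 32], 3))
```

The design choice this sketch illustrates: dividing every leaf per iteration can overshoot k, so either cap the number of splits in the final iteration (as here) or split leaves one at a time, which is the classical sequential BKM.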

Microbatches length

2016-10-20 Thread pcandido
Hello folks, I'm using Spark Streaming. My question is simple: the documentation says that micro-batches arrive at intervals, and the intervals are defined in real time (minutes, seconds). I want to get micro-batches of the same length, so can I configure SS to return a micro-batch when it reaches a determined leng…
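Spark Streaming itself triggers only on time intervals, not on record counts, so time-based micro-batches can have any size. A hedged, Spark-free Python sketch of one common workaround: carry a buffer across the time-based batches and emit fixed-size batches whenever enough records have accumulated:

```python
# Turn variable-size time-based batches into fixed-size count-based ones
# by buffering the remainder between batches. Illustrative only; inside
# Spark this buffering would have to live in stateful streaming logic.

def rebatch_by_count(time_batches, size):
    """Re-group a sequence of time batches into batches of `size` records."""
    buffer, out = [], []
    for batch in time_batches:
        buffer.extend(batch)
        while len(buffer) >= size:
            out.append(buffer[:size])
            buffer = buffer[size:]
    return out, buffer  # emitted fixed-size batches, plus the leftover tail

emitted, leftover = rebatch_by_count([[1, 2], [3], [4, 5, 6, 7]], 3)
print(emitted)   # [[1, 2, 3], [4, 5, 6]]
print(leftover)  # [7]
```

The trade-off: a count threshold adds unbounded latency, since a quiet stream may take arbitrarily long to fill a batch, which is presumably why the interval in SS is specified in wall-clock time.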