[Spark Streaming] Help with updateStateByKey()

allonsy Thu, 23 Apr 2015 10:15:30 -0700

Hi everybody,

I think I could use some help with the /updateStateByKey()/ Java method in
Spark Streaming.


*Context:*

I have a /JavaReceiverInputDStream<DataUpdate> du/ DStream, where each
/DataUpdate/ object has two fields of interest (in my case), namely
du.personId (an Integer) and du.cell.hashCode() (also an Integer). I am
processing many /DataUpdate/ objects (coming from a log file read in
microbatches), and every /personId/ will be associated with several
/du.cell.hashCode()/ values.

What I need to do is, for every /personId/, statefully count how many
times it appears with a particular /du.cell.hashCode()/, possibly
partitioning by the /personId/ key.

(Long story short: an area is split into cells, and I want to know how many
times every person appears in every cell.)

In a very naive way, I guess everything should look like a
/HashMap<personId, HashMap<cell.hashCode(), count>>/, but I am not quite
sure how to partition by /personId/ and increase the count.

It looks like the /updateStateByKey()/ method should do the trick (I am new
to Spark Streaming), but I can't figure out how to use it.
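
To make the question concrete, here is a minimal sketch of the per-key update
logic this would seem to need, written in plain Java so it runs without Spark.
The class name CellCounts and the method update are hypothetical, and
java.util.Optional is only a stand-in: Spark's Java API passes its own
Optional type to the update function (Guava's Optional in Spark 1.x). The
assumption is that the stream is first mapped to (personId, cell.hashCode())
pairs, so that each call sees the new cell hash codes for one personId plus
that person's previous per-cell counts.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration; not part of Spark.
class CellCounts {

    // Merges one microbatch of cell hash codes for a single personId
    // into that person's running per-cell counts.
    static Optional<Map<Integer, Long>> update(List<Integer> newCells,
                                               Optional<Map<Integer, Long>> state) {
        // Copy the previous state so the old map is never mutated.
        Map<Integer, Long> counts = new HashMap<>();
        if (state.isPresent()) {
            counts.putAll(state.get());
        }
        // Each occurrence of a cell in this batch adds one to its count.
        for (Integer cell : newCells) {
            counts.merge(cell, 1L, Long::sum);
        }
        return Optional.of(counts);
    }

    public static void main(String[] args) {
        // First batch: the person is seen twice in cell 7 and once in cell 9.
        Optional<Map<Integer, Long>> s =
                update(Arrays.asList(7, 7, 9), Optional.<Map<Integer, Long>>empty());
        // Second batch: seen once more in cell 9.
        s = update(Arrays.asList(9), s);
        System.out.println(s.get());
    }
}
```

If this is the right shape, the same body would presumably go into the
Function2 passed to /updateStateByKey()/ on the paired DStream, with
checkpointing enabled on the streaming context, since stateful operations
require it.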

Any suggestions?

Feel free to ask anything in case I was unclear or more information is
needed. :)


Thank you!



