Hi everybody, I could use some help with the Java /updateStateByKey()/ method in Spark Streaming.
*Context:* I have a /JavaReceiverInputDStream<DataUpdate> du/ DStream, where each /DataUpdate/ object has two fields of interest in my case: /du.personId/ (an Integer) and /du.cell.hashCode()/ (also an Integer). I am processing many /DataUpdate/ objects (coming from a log file read in microbatches), and every /personId/ will be associated with several /du.cell.hashCode()/ values.

What I need to do is, for every /personId/, statefully count how many times it appears with each particular /du.cell.hashCode()/, possibly partitioning by the /personId/ key. (Long story short: an area is split into cells, and I want to know how many times every person appears in every cell.)

In a very naive way, I guess the state should look like a /HashMap<personId, HashMap<cell.hashCode(), count>>/, but I am not quite sure how to partition by /personId/ and increment the count. It looks like the /updateStateByKey()/ method should do the trick (I am new to Spark Streaming), yet I can't figure out how to use it for this.

Any suggestions? Feel free to ask if anything is unclear or more information is needed. :)

Thank you!
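To make the question more concrete, here is a rough sketch of the per-key update logic I have in mind, written in plain Java with no Spark dependency so it can be run on its own. As far as I understand, /updateStateByKey()/ on a /JavaPairDStream/ takes a function that receives the new values for one key in the current batch plus that key's previous state (wrapped in an Optional) and returns the new state. The class name /CellCountState/ and the method name /update/ are hypothetical, and /java.util.Optional/ is only a stand-in for whichever Optional type the Spark version in use expects.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Plain-Java sketch of a per-key update function (hypothetical names, no Spark dependency).
// java.util.Optional is a stand-in for the Optional type used by Spark's Java API.
final class CellCountState {

    // newCellHashes: cell hash codes seen for ONE personId in the current batch.
    // state: that person's running cell -> count map from earlier batches, if any.
    static Optional<Map<Integer, Long>> update(List<Integer> newCellHashes,
                                               Optional<Map<Integer, Long>> state) {
        // Copy the previous counts so the old state map is never mutated in place.
        Map<Integer, Long> counts = new HashMap<>();
        state.ifPresent(counts::putAll);

        // Add one to the count of every cell observed in this batch.
        for (Integer cellHash : newCellHashes) {
            counts.merge(cellHash, 1L, Long::sum);
        }
        return Optional.of(counts);
    }

    public static void main(String[] args) {
        // Two simulated microbatches for a single person.
        Optional<Map<Integer, Long>> afterBatch1 =
                update(Arrays.asList(7, 7, 9), Optional.empty());
        Optional<Map<Integer, Long>> afterBatch2 =
                update(Arrays.asList(9), afterBatch1);
        System.out.println(afterBatch2.get());
    }
}
```

If this is the right idea, I assume the stream would first be turned into (personId, cell hash) pairs with /mapToPair()/, so that Spark groups the values by /personId/ and keeps one cell-to-count map per person as the state, rather than one nested map for everybody. I also gather that stateful operations require a checkpoint directory to be set on the streaming context.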