Just a guess: updateStateByKey has overloaded variants that take a partitioner as a parameter. Could that help?
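For illustration, here is a minimal sketch of the kind of fixed-range partitioner the question describes (sensor IDs 0-999 to partition 0, 1000-1999 to partition 1, and so on). The class and parameter names are my own assumptions, not from the original thread; in a real job the class would extend org.apache.spark.Partitioner so it can be passed to the updateStateByKey overload, but it is written here as a plain class so the routing logic can be checked on its own:

```scala
// Sketch of a fixed-range partitioner: sensor IDs 0..999 -> partition 0,
// 1000..1999 -> partition 1, etc. In Spark this would extend
// org.apache.spark.Partitioner and override numPartitions/getPartition.
// All names here are illustrative assumptions.
class SensorRangePartitioner(val numPartitions: Int, sensorsPerPartition: Int) {
  def getPartition(key: Any): Int = key match {
    // Route each sensorId to the partition covering its range,
    // clamping to the last partition for out-of-range IDs.
    case sensorId: Int => (sensorId / sensorsPerPartition) min (numPartitions - 1)
    case _             => 0
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val p = new SensorRangePartitioner(numPartitions = 4, sensorsPerPartition = 1000)
    println(p.getPartition(42))    // sensor 42 -> partition 0
    println(p.getPartition(1500))  // sensor 1500 -> partition 1
  }
}
```

With a Partitioner like this, the overload would be invoked along the lines of updateStateByKey(updateFunction, new SensorRangePartitioner(...)), which should keep the state RDD partitioned the same way as the input stream.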
-----Original Message-----
From: qihong [mailto:qc...@pivotal.io]
Sent: Tuesday, September 09, 2014 9:13 PM
To: u...@spark.incubator.apache.org
Subject: Re: how to setup steady state stream partitions

Thanks for your response. I do have something like:

    val inputDStream = ...
    val keyedDStream = inputDStream.map(...) // use sensorId as key
    val partitionedDStream = keyedDStream.transform(rdd => rdd.partitionBy(new MyPartitioner(...)))
    val stateDStream = partitionedDStream.updateStateByKey[...](updateFunction)

The partitionedDStream does have steady partitions, but the stateDStream does not. For example, partition 0 of partitionedDStream holds only data for sensors 0 to 999, whereas partition 0 of stateDStream contains data for only some of the sensors in the 0-999 range, plus many sensors from other partitions of partitionedDStream. I would like partition 0 of stateDStream to contain only the data from partition 0 of partitionedDStream, partition 1 of stateDStream only the data from partition 1 of partitionedDStream, and so on. Does anyone know how to implement that?

Thanks!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-setup-steady-state-stream-partitions-tp13850p13853.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org