Using your own partitioner didn't work?

e.g.
YourRDD.partitionBy(new HashPartitioner(your number))

xj @ Tokyo

On Wed, Sep 10, 2014 at 12:03 PM, qihong <qc...@pivotal.io> wrote:

> I'm working on a DStream application.  The input are sensors' measurements,
> the data format is <sensor id><timestamp><measure>
>
> There are 10 thousands sensors, and updateStateByKey is used to maintain
> the states of sensors, the code looks like following:
>
> val inputDStream = ...
> val keyedDStream = inputDStream.map(...)  // use sensorId as key
> val stateDStream = keyedDStream.updateStateByKey[...](udpateFunction)
>
> Here's the question:
> In a cluster with 10 worker nodes, is it possible to partition the input
> dstream, so that node 1 handles sendor 0-999, node 2 handles 1000-1999,
> and so on?
>
> Also, is it possible to keep state stream for sensor 0 - 999 on node 1,
> 1000
> to 1999 on node 2, and etc. Right now, I see sensor state stream is
> shuffled
> for every batch, which used lot of network bandwidth and it's unnecessary.
>
> Any suggestions?
>
> Thanks!
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/how-to-setup-steady-state-stream-partitions-tp13850.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>

Reply via email to