It's much simpler: rdd.partitions.size
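(Note: rdd.partitions.size is the Scala API. For the PySpark code quoted
below, a minimal sketch of the equivalent, assuming a live SparkContext
`sc` and a PySpark version recent enough to expose getNumPartitions():

    rdd = sc.parallelize([1, 2, 3, 4], 4)
    # Ask the RDD directly for its partition count -- no job is launched.
    print(rdd.getNumPartitions())  # -> 4

In older PySpark releases without getNumPartitions(), the
mapPartitionsWithIndex trick quoted below is a workable fallback.)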

On Sun, Mar 23, 2014 at 9:24 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> Hey there fellow Dukes of Data,
>
> How can I tell how many partitions my RDD is split into?
>
> I'm interested in knowing because, from what I gather, having an
> appropriate number of partitions is important for performance. If I'm
> looking to understand how my pipeline is performing, say for a
> parallelized write out to HDFS, checking how many partitions an RDD has
> would be a useful thing to do.
>
> Is that correct?
>
> I could not find an obvious method or property to see how my RDD is
> partitioned. Instead, I devised the following thingy:
>
> def f(idx, itr): yield idx  # yields each partition's index exactly once
>
> rdd = sc.parallelize([1, 2, 3, 4], 4)
> rdd.mapPartitionsWithIndex(f).count()  # one element per partition
>
> Frankly, I'm not sure what I'm doing here, but this seems to give me the
> answer I'm looking for. Derp. :)
>
> So in summary, should I care about how finely my RDDs are partitioned? And
> how would I check on that?
>
> Nick
>
>
