Re: reduceByKey vs countByKey

Jey Kottalam Tue, 24 Feb 2015 16:30:06 -0800

Hi Sathish,

The current implementation of countByKey uses reduceByKey:
https://github.com/apache/spark/blob/v1.2.1/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L332
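
To make the relationship concrete, here is a rough sketch in plain Python (no Spark needed) of how a count per key can be built on top of a reduce per key. The helper names reduce_by_key and count_by_key are made up for illustration and are not Spark APIs; the sketch assumes the general shape "map every value to 1, then reduce by key with addition", which may not match the linked source line for line. One practical difference it does not model: in Spark, reduceByKey is a transformation that returns another distributed RDD, while countByKey is an action that brings the resulting map back to the driver.

```python
def reduce_by_key(pairs, func):
    """Local stand-in for reduceByKey: merge values sharing a key with func."""
    merged = {}
    for key, value in pairs:
        # First value for a key is kept as-is; later ones are folded in.
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


def count_by_key(pairs):
    """Count occurrences of each key by reducing (key, 1) pairs with addition."""
    ones = ((key, 1) for key, _ in pairs)  # the original values are ignored
    return reduce_by_key(ones, lambda a, b: a + b)


pairs = [("a", 10), ("b", 20), ("a", 30)]
print(count_by_key(pairs))                         # {'a': 2, 'b': 1}
print(reduce_by_key(pairs, lambda a, b: a + b))    # {'a': 40, 'b': 20}
```

So counting is just one particular reduction, and reduceByKey is the more general tool when you want something other than a count or want the result to stay distributed.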


It seems that countByKey is mostly deprecated:
https://issues.apache.org/jira/browse/SPARK-3994

-Jey

On Tue, Feb 24, 2015 at 3:53 PM, Sathish Kumaran Vairavelu
<vsathishkuma...@gmail.com> wrote:
> Hello,
>
> Quick question: I am trying to understand the difference between reduceByKey
> and countByKey. Which one gives better performance? And if we can perform
> the same count operation using reduceByKey, why do we need
> countByKey/countByValue?
>
> Thanks
>
> Sathish

