Hi everyone,

I have run into a problem recently, and I don't know whether it is a bug in
Spark. When I use groupByKey on our SequenceFile data, I find that different
partition numbers lead to different results, and the same happens with
reduceByKey. I think the problem occurs in the shuffle stage. I have read the
source code, but I still can't find the answer.


This is the main code:

val rdd = sc.sequenceFile[UserWritable, TagsWritable](
  input, classOf[UserWritable], classOf[TagsWritable])
val combinedRdd = rdd
  .map(s => (s._1.getuserid(), s._2))  // key each record by user id
  .groupByKey(num)
  .filter(_._1 == uid)                 // keep a single user for comparison

Here num is the number of partitions and uid is the id of a single user, used
to compare results across runs.
TagsWritable implements WritableComparable<TagsWritable> and Serializable; a
rough sketch of its shape is below.
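
This is only a minimal sketch of the class, with a placeholder tags field
standing in for the real payload:

import java.io.{DataInput, DataOutput}
import org.apache.hadoop.io.WritableComparable

class TagsWritable(var tags: String)
    extends WritableComparable[TagsWritable] with Serializable {
  def this() = this("")  // Writables need a no-arg constructor
  override def write(out: DataOutput): Unit = out.writeUTF(tags)
  override def readFields(in: DataInput): Unit = { tags = in.readUTF() }
  override def compareTo(other: TagsWritable): Int = tags.compareTo(other.tags)
  override def hashCode(): Int = tags.hashCode  // consistent with equals
  override def equals(o: Any): Boolean = o match {
    case t: TagsWritable => tags == t.tags
    case _               => false
  }
}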

When I ran groupByKey on a plain text file, the result was correct; a sketch
of that control test follows.
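
For reference, the text-file control looks roughly like this (textInput, the
tab-separated line format, and the partition counts are placeholders):

val textRdd = sc.textFile(textInput)
  .map(_.split("\t"))
  .map(a => (a(0), a(1)))      // (userid, tags) parsed from each line
for (num <- Seq(10, 100)) {    // example partition counts
  val grouped = textRdd.groupByKey(num).filter(_._1 == uid)
  println(s"partitions=$num -> ${grouped.collect().map(_._2.size).mkString(",")}")
}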

Thanks,
Devin Huang



