Any chance your input RDD is being read from HDFS, and you are running into this issue (noted in the docs on SparkContext#hadoopFile)?
    '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    record, directly caching the returned RDD or directly passing it to an aggregation or shuffle
    operation will create many references to the same object. If you plan to directly cache, sort,
    or aggregate Hadoop writable objects, you should first copy them using a `map` function.

If that's the case, copying each record out of the Writables in a `map` before the cache/cartesian should fix it; there is a small sketch at the end of this message.

On Thu, Feb 26, 2015 at 10:38 AM, mrk91 <marcogaid...@gmail.com> wrote:

> Hello,
>
> I have an issue with the cartesian method. When I use it with the Java
> types everything is OK, but when I use it with RDDs made of objects I
> defined myself, it has very strange behaviors which depend on whether the
> RDD is cached or not (you can see here
> <http://stackoverflow.com/questions/28727823/creating-a-matrix-of-neighbors-with-spark-cartesian-issue>
> what happens).
>
> Is this due to a bug in its implementation, or are there any requirements
> on the objects passed to it?
> Thanks.
> Best regards.
> Marco
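For reference, a minimal sketch of the copy-before-cache pattern. The path and the text-file input format here are hypothetical (I don't know your actual input), but the same applies to sequenceFile and the other Hadoop input methods:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat

    // sc is an existing SparkContext; the path is a placeholder.
    val raw = sc.hadoopFile[LongWritable, Text, TextInputFormat](
      "hdfs:///path/to/input")

    // The RecordReader re-uses the same Writable instances, so copy
    // the values out of them before caching, sorting, shuffling, or
    // calling cartesian.
    val copied = raw.map { case (k, v) => (k.get, v.toString) }

    copied.cache()
    val pairs = copied.cartesian(copied)

If your records are custom objects wrapping Writables, the same idea holds: make the `map` produce a fresh, fully-owned copy of each record rather than a reference into the reused buffer.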