How big is each entry, and how much memory do you have on each executor? You generated all of the data on the driver, and sc.parallelize(bytesList) will send the entire dataset to a single executor, so you may run into I/O or memory issues. If the entries are generated programmatically, you should instead create a small RDD such as sc.parallelize(0 until 20, 20) and call mapPartitions on it to generate the entries in parallel.

-Xiangrui
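The suggestion above works because each partition needs only its index to build its own share of the data. Nothing has to be collected on the driver and shipped out. The sketch below illustrates that idea in plain Java, with no Spark dependency, so it can run anywhere. A parallel IntStream over partition indices stands in for the executors. The class GenerateByPartition, its methods, the 2000-entries-per-partition split and the 1 KB entry size are all hypothetical. Seeding a Random with the entry id is an assumed stand-in for whatever logic currently fills bytesList.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.stream.IntStream;

public class GenerateByPartition {

    // Hypothetical generator standing in for however bytesList is filled today.
    // Seeding by the entry id lets any worker rebuild entry `id` on its own.
    static byte[] generateEntry(long id, int entryBytes) {
        byte[] entry = new byte[entryBytes];
        new Random(id).nextBytes(entry);
        return entry;
    }

    // Partition `p` derives its own id range from its index and builds only
    // those entries, so no data has to come from the driver.
    static List<byte[]> generatePartition(int p, int perPartition, int entryBytes) {
        List<byte[]> out = new ArrayList<>(perPartition);
        long firstId = (long) p * perPartition;
        for (long id = firstId; id < firstId + perPartition; id++) {
            out.add(generateEntry(id, entryBytes));
        }
        return out;
    }

    // Local stand-in for sc.parallelize(0 until 20, 20) followed by mapPartitions:
    // only the partition indices are distributed; each worker fills its own slice.
    // With Spark's Java API this would roughly become (not compiled here):
    //   sc.parallelize(indices, numPartitions).mapPartitionsWithIndex(
    //       (p, it) -> generatePartition(p, perPartition, entryBytes).iterator(), false)
    static long countGenerated(int numPartitions, int perPartition, int entryBytes) {
        return IntStream.range(0, numPartitions)
                .parallel()
                .mapToLong(p -> generatePartition(p, perPartition, entryBytes).size())
                .sum();
    }

    public static void main(String[] args) {
        // 20 partitions x 2000 entries = the 40k entries from the question;
        // the 1 KB entry size is an assumption.
        System.out.println("Count=" + countGenerated(20, 2_000, 1_024));
    }
}
```

The commented Spark line uses mapPartitionsWithIndex rather than mapPartitions because it passes the partition index directly. Treat it as a starting point and check it against the Java API of your Spark version.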
On Wed, Apr 23, 2014 at 9:23 AM, amit karmakar <amit.codenam...@gmail.com> wrote:
> Spark hangs after I perform the following operations:
>
> ArrayList<byte[]> bytesList = new ArrayList<byte[]>();
> /*
> add 40k entries to bytesList
> */
>
> JavaRDD<byte[]> rdd = sparkContext.parallelize(bytesList);
> System.out.println("Count=" + rdd.count());
>
> If I add just one entry, it works.
>
> It also works if I change
> JavaRDD<byte[]> rdd = sparkContext.parallelize(bytesList);
> to
> JavaRDD<byte[]> rdd = sparkContext.parallelize(bytesList, 20);
>
> There is nothing in the logs that helps explain why.
>
> What could be the reason for this?
>
> Regards,
> Amit Kumar Karmakar
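Passing a slice count changes how much data each task carries from the driver. When numSlices is omitted, parallelize falls back to the context's default parallelism, which depends on the deployment. The plain-Java sketch below splits a list's indices into numSlices contiguous, near-equal [start, end) ranges. It is an illustration modelled on our reading of the even split in Spark's ParallelCollectionRDD.slice, not Spark's actual code. It then compares one slice with 20. The class SliceSketch is hypothetical. The 10 KB per-entry size is also an assumption, since the thread never states how large the entries are.

```java
import java.util.ArrayList;
import java.util.List;

public class SliceSketch {

    // Returns [start, end) index pairs that split `length` items into
    // `numSlices` contiguous slices whose sizes differ by at most one.
    static List<long[]> positions(long length, int numSlices) {
        if (numSlices < 1) {
            throw new IllegalArgumentException("numSlices must be >= 1");
        }
        List<long[]> out = new ArrayList<>();
        for (int i = 0; i < numSlices; i++) {
            long start = (i * length) / numSlices;
            long end = ((i + 1) * length) / numSlices;
            out.add(new long[] {start, end});
        }
        return out;
    }

    public static void main(String[] args) {
        long entries = 40_000;    // entry count from the question
        long entryBytes = 10_000; // hypothetical per-entry size
        for (int numSlices : new int[] {1, 20}) {
            long[] first = positions(entries, numSlices).get(0);
            long count = first[1] - first[0];
            System.out.println("numSlices=" + numSlices + " -> first slice holds "
                    + count + " entries (~" + (count * entryBytes) / 1_000_000 + " MB)");
        }
    }
}
```

Under these assumptions, a single slice would serialize all 40,000 entries into one task. With 20 slices, each task carries about 2,000 entries.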