Re: Spark hangs when I call parallelize + count on an ArrayList with 40k elements

Xiangrui Meng Wed, 23 Apr 2014 09:58:18 -0700

How big is each entry, and how much memory do you have on each
executor? You generated all the data on the driver, and
sc.parallelize(bytesList) will send the entire dataset to a single
executor, so you may run into I/O or memory issues. If the entries are
generated programmatically, you should instead create a small RDD with
sc.parallelize(0 until 20, 20) and call mapPartitions on it to generate
the entries in parallel on the executors. -Xiangrui
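The idea behind the mapPartitions suggestion can be sketched in plain Java, without a Spark dependency. Only the 20 partition indices would leave the driver; each partition then builds its own share of the 40k entries. The names PartitionedGeneration, makeEntry and generatePartition, the 1 KB entry size, and the byte pattern inside each entry are all hypothetical, and a parallel stream stands in for the executors. In a real job, the body of generatePartition would run inside the function passed to mapPartitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Plain-Java sketch (no Spark) of generating data per partition instead of
// building one large ArrayList on the driver. Names are hypothetical.
public class PartitionedGeneration {

    // Hypothetical entry generator: fills a byte[] deterministically from its
    // global index. Replace with however the real entries are produced.
    public static byte[] makeEntry(int globalIndex, int entrySize) {
        byte[] entry = new byte[entrySize];
        for (int i = 0; i < entrySize; i++) {
            entry[i] = (byte) (globalIndex + i);
        }
        return entry;
    }

    // Builds only the entries owned by one partition. Partition p of n covers
    // global indices [p * total / n, (p + 1) * total / n), so the partitions
    // are disjoint and together cover 0 .. total - 1.
    public static List<byte[]> generatePartition(int partition, int numPartitions,
                                                 int totalEntries, int entrySize) {
        long start = (long) partition * totalEntries / numPartitions;
        long end = (long) (partition + 1) * totalEntries / numPartitions;
        List<byte[]> out = new ArrayList<>();
        for (long idx = start; idx < end; idx++) {
            out.add(makeEntry((int) idx, entrySize));
        }
        return out;
    }

    public static void main(String[] args) {
        int numPartitions = 20;    // mirrors sc.parallelize(0 until 20, 20)
        int totalEntries = 40_000; // list size from the question
        int entrySize = 1024;      // assumed; the thread never states it

        // A parallel stream stands in for executors working concurrently.
        List<Integer> sizes = IntStream.range(0, numPartitions)
                .parallel()
                .mapToObj(p -> generatePartition(p, numPartitions, totalEntries, entrySize).size())
                .collect(Collectors.toList());

        int count = sizes.stream().mapToInt(Integer::intValue).sum();
        System.out.println("Count=" + count);
    }
}
```

Running main prints Count=40000. Each of the 20 partitions generates 2,000 entries, and no single process ever holds the full list.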


On Wed, Apr 23, 2014 at 9:23 AM, amit karmakar
<amit.codenam...@gmail.com> wrote:
> Spark hangs after I perform the following operations:
>
>
> ArrayList<byte[]> bytesList = new ArrayList<byte[]>();
> /*
>    add 40k entries to bytesList
> */
>
> JavaRDD<byte[]> rdd = sparkContext.parallelize(bytesList);
> System.out.println("Count=" + rdd.count());
>
>
> If I add just one entry, it works.
>
> It also works if I change
> JavaRDD<byte[]> rdd = sparkContext.parallelize(bytesList);
> to
> JavaRDD<byte[]> rdd = sparkContext.parallelize(bytesList, 20);
>
> There is nothing in the logs that helps explain why.
>
> What could be the reason for this?
>
>
> Regards,
> Amit Kumar Karmakar
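The workaround in the question passes 20 as the numSlices argument of JavaSparkContext.parallelize(List, int), which asks Spark to split the driver-side list into 20 partitions instead of the default. The plain-Java sketch below shows what that does to a 40k-entry list, assuming an even, contiguous split. The class name SliceDemo and the one-byte stand-in entries are hypothetical, and this is an illustration of the idea, not Spark's own slicing code.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of splitting a driver-side list into numSlices contiguous
// slices, as the second argument of parallelize(list, numSlices) requests.
// Illustrative only; not Spark's implementation.
public class SliceDemo {

    // Returns numSlices contiguous, non-overlapping views over list.
    // Slice i covers indices [i * n / numSlices, (i + 1) * n / numSlices).
    public static <T> List<List<T>> slice(List<T> list, int numSlices) {
        if (numSlices < 1) {
            throw new IllegalArgumentException("numSlices must be positive");
        }
        int n = list.size();
        List<List<T>> slices = new ArrayList<>(numSlices);
        for (int i = 0; i < numSlices; i++) {
            int start = (int) ((long) i * n / numSlices);
            int end = (int) ((long) (i + 1) * n / numSlices);
            slices.add(list.subList(start, end));
        }
        return slices;
    }

    public static void main(String[] args) {
        // Stand-in for bytesList from the question: 40k tiny entries.
        List<byte[]> bytesList = new ArrayList<>();
        for (int i = 0; i < 40_000; i++) {
            bytesList.add(new byte[] {(byte) i});
        }

        // With 20 slices, each task carries 2,000 entries instead of 40,000.
        List<List<byte[]>> slices = slice(bytesList, 20);
        System.out.println("slices=" + slices.size() + ", first slice=" + slices.get(0).size());
    }
}
```

Running main prints slices=20, first slice=2000. The whole list is still built on the driver, which is why Xiangrui suggests generating the entries on the executors when possible.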
