Re: sc.parallelize with defaultParallelism=1

Andy Dang Wed, 30 Sep 2015 10:18:06 -0700

Can't you just load the data from HBase first, and then call sc.parallelize
on your dataset?
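
A minimal sketch of that suggestion, assuming the rows have already been read from HBase into a plain list on the driver (the HBase read itself is not shown). The function names and sample rows are hypothetical, and `sc` is assumed to be an existing SparkContext. The first function is included because, for a small dataset, aggregating in plain Python may be enough to keep everything on the driver.

```python
from collections import defaultdict


def aggregate_on_driver(rows):
    """Sum values per key in plain Python; the data stays in the driver process."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def aggregate_with_spark(sc, rows):
    """Same aggregation through Spark, using a single partition.

    `sc` is an existing SparkContext (assumed to be created elsewhere);
    `rows` is a local list of (key, value) pairs.
    """
    rdd = sc.parallelize(rows, 1)  # second argument (numSlices) = 1 partition
    return dict(rdd.reduceByKey(lambda a, b: a + b).collect())


# Hypothetical rows standing in for data already loaded from HBase.
rows = [("a", 1), ("b", 2), ("a", 3)]
local_totals = aggregate_on_driver(rows)
```

With a live SparkContext, aggregate_with_spark(sc, rows) should give the same totals as aggregate_on_driver(rows).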


-Andy

-------
Regards,
Andy (Nam) Dang

On Wed, Sep 30, 2015 at 12:52 PM, Nicolae Marasoiu <
[email protected]> wrote:

> Hi,
>
>
> When calling sc.parallelize(data, 1), does Spark have a preference for
> where the data is placed? I see two possibilities: sending it to a worker
> node, or keeping it on the driver program.
>
>
> I would prefer to keep the data local to the driver. The use case is
> loading a small amount of data from HBase and then computing over it
> (e.g. an aggregation) using Spark.
>
>
> Thanks,
>
> Nicu
>
