Does it mean that `reduce` does not do aggregation? If that is the
problem, then how to force it to do aggregation after receiving each
portion of data from Workers?

Best regards, Alexander

-----Original Message-----
From: DB Tsai [mailto:dbt...@dbtsai.com]
Sent: Friday, January 23, 2015 11:53 AM
To: Ulanov, Alexander
Cc: dev@spark.apache.org
Subject: Re: Maximum size of vector that reduce can handle

Hi Alexander,

When you use `reduce` to aggregate the vectors, those will actually be pulled
into the driver and merged there. Obviously, it's not scalable given you are
doing ...
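
For illustration only (the reply above is cut off, so this is not necessarily
what it went on to recommend): one way to force partial aggregation on the
executors, instead of shipping every partition's result straight to the
driver, is `treeAggregate` (available on the core RDD API in later Spark
releases, earlier via MLlib's RDDFunctions). The vector length `dim`, the
function names, and `depth = 2` below are assumptions for the sketch, not
code from the thread.

    // Sketch: summing large dense vectors without merging everything on the
    // driver. A plain `reduce` sends every partition's partial sum to the
    // driver to be merged there; `treeAggregate` combines partial sums on
    // the executors across `depth` levels first, so only a handful of
    // vectors ever reach the driver.
    import org.apache.spark.rdd.RDD

    def addInPlace(a: Array[Double], b: Array[Double]): Array[Double] = {
      var i = 0
      while (i < a.length) { a(i) += b(i); i += 1 }
      a
    }

    def sumVectors(data: RDD[Array[Double]], dim: Int): Array[Double] =
      data.treeAggregate(new Array[Double](dim))(
        seqOp = addInPlace,   // fold each record into the partition's accumulator
        combOp = addInPlace,  // merge partial sums, mostly on the executors
        depth = 2)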
> I run the shell with --executor-memory 8G --driver-memory 8G, so handling a
> 60M vector of Double should not be a problem. Are there any big overheads
> for this? What is the maximum size of vector that reduce can handle?
>
> Best regards, Alexander
>
> P.S.
>
> "spark.driver.maxResultSize 0" needs to be set in order to run this