Oh, I was misled by the following log line, which made me think the broadcast
variable was sent back to the driver. So the result sent to the driver has no
relationship with the broadcast variable; but what is it, then, since it seems
no data should be sent back?
org.apache.spark.executor.Executor - Serialized s
Size calculation is correct, but broadcast happens from the driver to the
workers.
Btw, your code is broadcasting 400MB 30 times, and those copies are not being
evicted from the cache fast enough, which, I think, is causing the
BlockManagers to run out of memory.
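As a hedged sketch of what releasing those copies can look like (the loop shape and sizes are assumptions based on this thread, and Broadcast.unpersist() only exists in Spark 1.0 and later):

  // Illustrative only: each sc.broadcast() stores a fresh copy of the data
  // in the BlockManager, so re-broadcasting in a loop accumulates blocks
  // unless each copy is explicitly released once it is no longer needed.
  val data = new Array[Byte](400 * 1024 * 1024)   // ~400MB, as in the report
  for (iter <- 1 to 30) {
    val bc = sc.broadcast(data)
    sc.parallelize(1 to 100).map(_ => bc.value.length).reduce(_ + _)
    bc.unpersist(blocking = true)                 // free the cached copies
  }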
On Sun, Jan 12, 2014 at 9:34 PM, lihu wrote:
>
Yes, I just used the code snippet from the broadcast example, and ran this
code in the spark-shell.
I thought the broadcast was sent from the driver to the executors, and the
executors would send it back; is there something wrong with how I calculate
the broadcast size?
val MAX_ITER = 30
val num = 1
var ar
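To check the size calculation independently, here is a hedged sketch using plain JDK serialization (which may differ slightly from the serializer Spark actually uses; the helper name is mine):

  import java.io.{ByteArrayOutputStream, ObjectOutputStream}

  // Serializes an object in memory and returns the byte count, giving a
  // rough estimate of what the driver ships when broadcasting the object.
  def serializedSize(obj: AnyRef): Int = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(obj)
    out.close()
    buffer.size()
  }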
Hi Aureliano,
Look for the Google Compute Engine scripts in the Typesafe repo. They recently
tested Akka Cluster on 2400 nodes on Google Compute Engine. You should be
able to reuse the scripts.
Thanks.
Deb
On Sun, Jan 12, 2014 at 8:00 PM, Aureliano Buendia wrote:
> Hi,
>
> Has anyone worked on a s
Broadcast is supposed to send data from the driver to the executors, not
the other direction. Can you share the code snippet you are using to
broadcast?
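To make the direction concrete, a minimal hedged sketch (the names and data are illustrative): the driver serializes the value once, executors fetch a read-only copy, and only the small task results travel back:

  val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))   // driver -> executors
  val total = sc.parallelize(Seq("a", "b", "a"))
    .map(word => lookup.value.getOrElse(word, 0))      // executors only read it
    .reduce(_ + _)                                     // only this Int returns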
--
Mosharaf Chowdhury
http://www.mosharaf.com/
On Sun, Jan 12, 2014 at 8:52 PM, lihu wrote:
> In my opinion, the spark system is for big d
In my opinion, the Spark system is for big data, so 400M does not seem big.
I read slides about broadcast; my understanding was that the executors would
send the broadcast variable back to the driver, and that each executor owns a
complete copy of the broadcast variable.
In my experiment, I have 20 machines, eac
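To put rough numbers on that (my arithmetic from the figures in this thread): one 400M broadcast with a full copy on each of 20 machines occupies about 8G cluster-wide, and re-broadcasting it 30 times without eviction approaches 30 x 400M = 12G of broadcast blocks per executor.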
I'm reliably hitting a bug in PySpark where jobs with many iterative
calculations on cached data stall out.
The data is a folder of ~40 text files, each with 2 million rows and 360
entries per row; the total size is ~250GB.
I'm testing with the KMeans analyses included as examples (though I see the
same er
400MB isn't really that big. Broadcast is expected to work with several GB
of data and on even larger clusters (hundreds of machines).
If you are using the default HttpBroadcast, then Akka isn't used to move
the broadcast data. But the block manager can run out of memory if you
repetitively broadcast la
On Mon, Jan 13, 2014 at 4:17 AM, lihu wrote:
> I have encountered the same problem as you.
> I have a cluster of 20 machines, and I just ran the broadcast example; all I
> did was change the data size in the example to 400M, which is really a
> small data size.
>
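For what it's worth, the broadcast implementation mentioned above is pluggable; a hedged sketch of switching it (property name per the Spark documentation of this era, set before the SparkContext is created):

  // Swap the default HttpBroadcast for the BitTorrent-style implementation.
  System.setProperty("spark.broadcast.factory",
    "org.apache.spark.broadcast.TorrentBroadcastFactory")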
Is 400 MB a really small size f
I have encountered the same problem as you.
I have a cluster of 20 machines, and I just ran the broadcast example; all I
did was change the data size in the example to 400M, which is really a
small data size. But I encountered the same problem as you.
So I wonder whether the broadcast capacity is w
Hi,
Has anyone worked on a script similar to spark-ec2 for Google Compute
Engine?
Google Compute Engine claims faster instance start-up times, and that,
together with by-the-minute billing, makes it a desirable choice for
Spark.
Ah, okay - glad you got it working... it must be due to corruption
somewhere in sbt's state.
On Sun, Jan 12, 2014 at 2:18 AM, Shing Hing Man wrote:
> There is no error if I do sbt/sbt clean between "sbt compile publish-local"
> and "sbt/sbt assembly". Namely
>
> 1) sbt/sbt clean
> 2) sbt/sbt co
You should launch with "java" and not with "scala". The "scala"
command in newer versions manually adds a specific version of Akka to
the classpath, which conflicts with the version Spark is using. This
causes the error you are seeing. It's discussed in this thread on the
dev list:
http://apac
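For instance (a hedged sketch; the jar paths and main class are placeholders), launching with plain java means putting scala-library and the Spark assembly on the classpath yourself, instead of letting the scala runner prepend its own conflicting Akka jars:

  java -cp myapp.jar:$SCALA_HOME/lib/scala-library.jar:spark-assembly.jar com.example.MyApp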
Hi,
I am using the development version of Spark from
git://github.com/apache/incubator-spark.git
with Scala 2.10.3.
The example GroupByTest runs successfully using:
matmsh@gauss:~/Downloads/spark/github/incubator-spark> bin/run-example
org.apache.spark.examples.GroupByTest local
The script
There is no error if I do sbt/sbt clean between "sbt compile publish-local" and
"sbt/sbt assembly". Namely
1) sbt/sbt clean
2) sbt/sbt compile publish-local
3) sbt/sbt clean
4) SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly
Now I have the Spark jars in my local Ivy repository and I can run spark
Hi,
Thanks for your reply!
sbt/sbt clean does not help.
I did the following in the incubator-spark directory and still got the same
error as before.
1) sbt/sbt clean
2) SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly
3) sbt/sbt compile publish-local
Shing
On Sunday, January 12, 2014 12:32 AM