Hi,
I'm using Apache Spark 1.1.0 and am currently having an issue with the
broadcast method. When I call the broadcast function on a small dataset on a
5-node cluster, I get "Error sending message as driverActor is null" after
broadcasting the variables several times (the app is running under JBoss).
Any help would be appreciated.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-broadcast-error-Error-sending-message-as-driverActor-is-null-message-UpdateBlockInfo-Bld-tp21320.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
spark, can anyone help me?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/spark-broadcast-error-tp19643.html
--
001560.n3.nabble.com/Multiple-Applications-Spark-Contexts-Concurrently-Fail-With-Broadcast-Error-tp18374.html
Cool, let me try that. Any other suggestions on things I can try?
On Mon, Sep 15, 2014 at 9:59 AM, Davies Liu wrote:
> I think 1.1 will be really helpful for you; it's all compatible
> with 1.0, so it's not hard to upgrade to 1.1.
>
> On Mon, Sep 15, 2014 at 2:35 AM, Chengi Liu
I think 1.1 will be really helpful for you; it's all compatible
with 1.0, so it's not hard to upgrade to 1.1.
On Mon, Sep 15, 2014 at 2:35 AM, Chengi Liu wrote:
> So, same result with parallelize(matrix, 1000),
> and with broadcast; seems like I got a JVM core dump :-/
> 4/09/15 02:31:22 INFO Blo
So, same result with parallelize(matrix, 1000),
and with broadcast; seems like I got a JVM core dump :-/
14/09/15 02:31:22 INFO BlockManagerInfo: Registering block manager
host:47978 with 19.2 GB RAM
14/09/15 02:31:22 INFO BlockManagerInfo: Registering block manager
host:43360 with 19.2 GB RAM
Unhandled
Try:
bc = sc.broadcast(matrix)  # note: broadcast returns a Broadcast handle, not an RDD
Or:
rdd = sc.parallelize(matrix, 100)  # just increase the number of slices
and give it a try.
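To see why more slices can help: Spark serializes each partition's data into the task it ships, so the per-task payload shrinks roughly in proportion to the number of slices. A standalone sketch (plain Python, no Spark needed; the row counts are made up for illustration):

```python
import math

def rows_per_slice(total_rows, num_slices):
    """Rows in the largest partition when a matrix is split into num_slices."""
    return math.ceil(total_rows / num_slices)

total_rows = 1_000_000  # hypothetical matrix height
print(rows_per_slice(total_rows, 5))     # 200000 rows serialized per task
print(rows_per_slice(total_rows, 1000))  # 1000 rows per task, a far smaller payload
```

With only 5 slices, each task carries a fifth of the whole matrix, which is what trips the frame-size limit.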
Thanks
Best Regards
On Mon, Sep 15, 2014 at 2:18 PM, Chengi Liu wrote:
> Hi Akhil,
> So with your config (specifically with set("spark.akka.frameSize ",
> "1000")
Hi Akhil,
So with your config (specifically with set("spark.akka.frameSize ",
"1000")), I see the error:
org.apache.spark.SparkException: Job aborted due to stage failure:
Serialized task 0:0 was 401970046 bytes which exceeds spark.akka.frameSize
(10485760 bytes). Consider using broadcast variables for large values.
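To put those numbers in perspective: spark.akka.frameSize is specified in MB and defaults to 10, which is exactly the 10485760-byte limit in the error, while the serialized task here is roughly 383 MB. Note also that the key in the config quoted above contains a trailing space ("spark.akka.frameSize "), which would likely keep the setting from taking effect at all. A quick sanity check in plain Python (no Spark needed):

```python
MB = 1024 * 1024

default_limit = 10 * MB   # default spark.akka.frameSize of 10 MB
task_size = 401970046     # serialized task size from the error above

print(default_limit)             # 10485760, the limit quoted in the error
print(round(task_size / MB, 1))  # 383.3, i.e. ~383 MB, far over the default
print(task_size < 1000 * MB)     # True: frameSize "1000" (MB) would cover it
```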
Can you give this a try:
conf = SparkConf().set("spark.executor.memory", "32G") \
    .set("spark.akka.frameSize", "1000") \
    .set("spark.broadcast.factory", "org.apache.spark.broadcast.TorrentBroadcastFactory")
sc = SparkContext(conf=conf)
rdd = sc.parallelize(matrix, 5)
from pyspark.mllib.
And the thing is, the code runs just fine if I reduce the number of rows in
my data.
On Sun, Sep 14, 2014 at 8:45 PM, Chengi Liu wrote:
> I am using Spark 1.0.2.
> This is my work cluster, so I can't set up a new version readily...
> But right now, I am not using broadcast.
>
>
> conf = SparkConf().
I am using Spark 1.0.2.
This is my work cluster, so I can't set up a new version readily...
But right now, I am not using broadcast:
conf = SparkConf().set("spark.executor.memory",
"32G").set("spark.akka.frameSize", "1000")
sc = SparkContext(conf = conf)
rdd = sc.parallelize(matrix,5)
from pysp
Hey Chengi,
What version of Spark are you using? There are big improvements to
broadcast in 1.1; could you try it?
On Sun, Sep 14, 2014 at 8:29 PM, Chengi Liu wrote:
> Any suggestions? I am really blocked on this one.
>
> On Sun, Sep 14, 2014 at 2:43 PM, Chengi Liu wrote:
>>
>> And when
Any suggestions? I am really blocked on this one.
On Sun, Sep 14, 2014 at 2:43 PM, Chengi Liu wrote:
> And when I use the spark-submit script, I get the following error:
>
> py4j.protocol.Py4JJavaError: An error occurred while calling
> o26.trainKMeansModel.
> : org.apache.spark.SparkException: Job a
And when I use the spark-submit script, I get the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling
o26.trainKMeansModel.
: org.apache.spark.SparkException: Job aborted due to stage failure: All
masters are unresponsive! Giving up.
at org.apache.spark.scheduler.DAGScheduler.
How? An example, please.
Also, if I am running this in the pyspark shell, how do I configure
spark.akka.frameSize?
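One way to set it for the pyspark shell (a sketch, worth verifying against your Spark version's configuration docs): since the shell creates its SparkContext at startup, put the property in conf/spark-defaults.conf before launching the shell. Make sure the key has no trailing space:

```
# conf/spark-defaults.conf (read when the pyspark shell starts)
spark.akka.frameSize   1000
spark.executor.memory  32g
```

Newer Spark releases also accept the setting on the command line, e.g. `pyspark --conf spark.akka.frameSize=1000`; whether your 1.0.x build supports that flag is worth checking.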
On Sun, Sep 14, 2014 at 7:43 AM, Akhil Das
wrote:
> When the data size is huge, you are better off using the TorrentBroadcastFactory.
>
> Thanks
> Best Regards
>
> On Sun, Sep 14, 2014 at 2:5
When the data size is huge, you are better off using the TorrentBroadcastFactory.
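For intuition: the older HttpBroadcast serves the entire value from the driver to every executor, while TorrentBroadcast splits the serialized value into fixed-size blocks that executors fetch and then re-serve to one another, so the driver stops being the single bottleneck. A rough pure-Python sketch of just the chunking step (4 MB matches the spark.broadcast.blockSize default; everything else is illustrative):

```python
BLOCK_SIZE = 4 * 1024 * 1024  # default spark.broadcast.blockSize (4 MB)

def chunk(payload: bytes, block_size: int = BLOCK_SIZE):
    """Split a serialized broadcast value into torrent-style blocks."""
    return [payload[i:i + block_size] for i in range(0, len(payload), block_size)]

payload = b"x" * (10 * 1024 * 1024)  # pretend 10 MB serialized value
blocks = chunk(payload)
print(len(blocks))                   # 3 blocks: 4 MB + 4 MB + 2 MB
assert b"".join(blocks) == payload   # blocks reassemble to the original
```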
Thanks
Best Regards
On Sun, Sep 14, 2014 at 2:54 PM, Chengi Liu wrote:
> Specifically, the error I see when I try to operate on an RDD created by
> the sc.parallelize method:
> : org.apache.spark.SparkException: Job aborted due t
Specifically, the error I see when I try to operate on an RDD created by
the sc.parallelize method:
org.apache.spark.SparkException: Job aborted due to stage failure:
Serialized task 12:12 was 12062263 bytes which exceeds spark.akka.frameSize
(10485760 bytes). Consider using broadcast variables for large values.
Hi,
I am trying to create an RDD out of a large matrix; the sc.parallelize
error suggests using broadcast.
But when I do
sc.broadcast(data)
I get this error:
Traceback (most recent call last):
File "", line 1, in
File "/usr/common/usg/spark/1.0.2/python/pyspark/context.py", line 370,
in broadcast
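For what it's worth, sc.broadcast is not a drop-in replacement for sc.parallelize: parallelize distributes the matrix as an RDD to compute on, while broadcast ships one read-only copy of a value to each executor for tasks to read via .value. A minimal mock of that access pattern (plain Python, no Spark; this Broadcast class is a stand-in for illustration, not the pyspark one):

```python
class Broadcast:
    """Stand-in for pyspark's Broadcast: a read-only value shared with tasks."""
    def __init__(self, value):
        self.value = value

lookup = Broadcast({"a": 1, "b": 2})  # driver side: broadcast small shared data

def task(record, bc):
    # executor side: tasks read bc.value instead of capturing the large object
    return bc.value.get(record, 0)

print([task(r, lookup) for r in ["a", "b", "c"]])  # [1, 2, 0]
```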
Hi All,
When I run the program shown below, I receive the error shown below.
I am running the current version of branch-0.9 from github. Note that
I do not receive the error when I replace "2 ** 29" with "2 ** X",
where X < 29. More interestingly, I do not receive the error when X =
30, and when