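You need to set a larger `spark.akka.frameSize`, e.g., 128. The value is
in MB and the default is 10, which is the 10485760-byte limit in your
error. The serialized task includes the weight vector, which is why it
is so large. There is a JIRA about switching automatically between
sending through Akka and broadcast:
https://issues.apache.org/jira/browse/SPARK-2361 . -Xiangrui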
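
As a rough check on the numbers in the error message, here is a small
sketch in plain Python (no Spark needed). The helper name
`min_frame_size_mb` and the 1 MB of headroom are my own assumptions for
illustration, not something Spark defines; the two byte counts are
copied from the log below.

```python
import math

MIB = 1024 * 1024  # spark.akka.frameSize is given in MB


def min_frame_size_mb(serialized_task_bytes, headroom_bytes=MIB):
    """Smallest whole-MB setting that fits the task plus some headroom.

    The headroom is an assumed safety margin, not a value taken from Spark.
    """
    return math.ceil((serialized_task_bytes + headroom_bytes) / MIB)


task_bytes = 94453098     # "Serialized task 20:0 was 94453098 bytes"
current_limit = 10485760  # limit reported in the same error message

print(current_limit // MIB)                  # 10, the default setting
print(min_frame_size_mb(task_bytes))         # 92
print(min_frame_size_mb(task_bytes) <= 128)  # True, so 128 leaves room
```

So a task of this size needs a setting in the low 90s at minimum, and
128 gives some margin if the weight vector grows.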

On Mon, Jul 14, 2014 at 12:15 AM, crater <cq...@ucmerced.edu> wrote:
> Hi,
>
> I encountered an error when testing SVM (example one) on very large sparse
> data. The dataset I ran on was a toy dataset with only ten examples, but each
> one is a 13-million-dimensional sparse vector with a few thousand non-zero
> entries.
>
> The error is shown below. I am wondering whether this is a bug or whether I
> am missing something.
>
> 14/07/13 23:59:44 INFO SecurityManager: Using Spark's default log4j profile:
> org/apache/spark/log4j-defaults.properties
> 14/07/13 23:59:44 INFO SecurityManager: Changing view acls to: chengjie
> 14/07/13 23:59:44 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(chengjie)
> 14/07/13 23:59:45 INFO Slf4jLogger: Slf4jLogger started
> 14/07/13 23:59:45 INFO Remoting: Starting remoting
> 14/07/13 23:59:45 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://spark@master:53173]
> 14/07/13 23:59:45 INFO Remoting: Remoting now listens on addresses:
> [akka.tcp://spark@master:53173]
> 14/07/13 23:59:45 INFO SparkEnv: Registering MapOutputTracker
> 14/07/13 23:59:45 INFO SparkEnv: Registering BlockManagerMaster
> 14/07/13 23:59:45 INFO DiskBlockManager: Created local directory at
> /tmp/spark-local-20140713235945-c78f
> 14/07/13 23:59:45 INFO MemoryStore: MemoryStore started with capacity 14.4
> GB.
> 14/07/13 23:59:45 INFO ConnectionManager: Bound socket to port 37674 with id
> = ConnectionManagerId(master,37674)
> 14/07/13 23:59:45 INFO BlockManagerMaster: Trying to register BlockManager
> 14/07/13 23:59:45 INFO BlockManagerInfo: Registering block manager
> master:37674 with 14.4 GB RAM
> 14/07/13 23:59:45 INFO BlockManagerMaster: Registered BlockManager
> 14/07/13 23:59:45 INFO HttpServer: Starting HTTP Server
> 14/07/13 23:59:45 INFO HttpBroadcast: Broadcast server started at
> http://10.10.255.128:41838
> 14/07/13 23:59:45 INFO HttpFileServer: HTTP File server directory is
> /tmp/spark-ac459d4b-a3c4-4577-bad4-576ac427d0bf
> 14/07/13 23:59:45 INFO HttpServer: Starting HTTP Server
> 14/07/13 23:59:51 INFO SparkUI: Started SparkUI at http://master:4040
> 14/07/13 23:59:51 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/07/13 23:59:52 INFO EventLoggingListener: Logging events to
> /tmp/spark-events/binaryclassification-with-params(hdfs---master-9001-splice.small,1,1.0,svm,l1,0.1)-1405317591776
> 14/07/13 23:59:52 INFO SparkContext: Added JAR
> file:/home/chengjie/spark-1.0.1/examples/target/scala-2.10/spark-examples-1.0.1-hadoop2.3.0.jar
> at http://10.10.255.128:54689/jars/spark-examples-1.0.1-hadoop2.3.0.jar with
> timestamp 1405317592653
> 14/07/13 23:59:52 INFO AppClient$ClientActor: Connecting to master
> spark://master:7077...
> 14/07/14 00:00:08 WARN TaskSchedulerImpl: Initial job has not accepted any
> resources; check your cluster UI to ensure that workers are registered and
> have sufficient memory
> 14/07/14 00:00:23 WARN TaskSchedulerImpl: Initial job has not accepted any
> resources; check your cluster UI to ensure that workers are registered and
> have sufficient memory
> 14/07/14 00:00:38 WARN TaskSchedulerImpl: Initial job has not accepted any
> resources; check your cluster UI to ensure that workers are registered and
> have sufficient memory
> 14/07/14 00:00:53 WARN TaskSchedulerImpl: Initial job has not accepted any
> resources; check your cluster UI to ensure that workers are registered and
> have sufficient memory
> Training: 10
> 14/07/14 00:01:09 WARN BLAS: Failed to load implementation from:
> com.github.fommil.netlib.NativeSystemBLAS
> 14/07/14 00:01:09 WARN BLAS: Failed to load implementation from:
> com.github.fommil.netlib.NativeRefBLAS
> *Exception in thread "main" org.apache.spark.SparkException: Job aborted due
> to stage failure: Serialized task 20:0 was 94453098 bytes which exceeds
> spark.akka.frameSize (10485760 bytes). Consider using broadcast variables
> for large values.*
>         at
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>         at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>         at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>         at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>         at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>         at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>         at scala.Option.foreach(Option.scala:236)
>         at
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>         at
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Error-when-testing-with-large-sparse-svm-tp9592.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
