Did you stop the 1.6g job or did it fail? I see task failures but no stage failures.
On Oct 10, 2014, at 8:49 AM, pol <swallow_p...@163.com> wrote:

Hi Pat,
Yes, spark-itemsimilarity works OK; it finished the calculation on the 150m dataset. The problem above is that the 1.6g dataset cannot finish the calculation. I have three machines (16 cores and 16g memory each) for this test. Is this environment not enough to finish the calculation? The dataset was archived into one file with the hadoop archive tool, which is why only one machine is in the processing state. I did that because without archiving some errors occur; please refer to the attachments for more information.
<spark1.png>
<spark2.png>
<spark3.png>
If you can, I will provide the test dataset to you. Thank you again.

On Oct 10, 2014, at 22:07, Pat Ferrel <p...@occamsmachete.com> wrote:
> So it is completing some of the spark-itemsimilarity jobs now? That is better at least.
>
> Yes. More data means you may need more memory or more nodes in your cluster. This is how to scale Spark and Hadoop. Spark in particular needs core memory since it tries to avoid disk read/write.
>
> Try increasing -sem as far as you can first, then you may need to add machines to your cluster to speed it up. Do you need results faster than 15 hours?
>
> Remember that the way the Solr recommender works allows you to make recommendations to new users and train less often. The new user data does not have to be in the training/indicator data. You retrain partly based on how many new users there are, but also based on how many new items are added to the catalog.
>
> On Oct 10, 2014, at 1:47 AM, pol <swallow_p...@163.com> wrote:
>
> Hi Pat,
> Because of a holiday, I am only replying now.
>
> I changed 1.0.2 to 1.0.1 in mahout-1.0-SNAPSHOT and used Spark 1.0.1 and Hadoop 2.4.0; spark-itemsimilarity now works OK. But I have a new question:
> mahout spark-itemsimilarity -i /view_input,/purchase_input -o /output -os -ma spark://recommend1:7077 -sem 15g -f1 purchase -f2 view -ic 2 -fc 1 -m 36
>
> With "view" data of 1.6g and "purchase" data of 60m, this shell has not finished after 15 hours (the "indicator-matrix" has been computed and the "cross-indicator-matrix" is still computing), but with "view" data of 100m it finishes in 2 minutes. Is the data the reason for this?
>
>
> On Oct 1, 2014, at 01:10, Pat Ferrel <p...@occamsmachete.com> wrote:
>
>> This will not be fixed in Mahout 1.0 unless we can find a problem in Mahout now. I am the one who would fix it. At present it looks to me like a Spark version or setup problem.
>>
>> These errors seem to indicate that the build or setup has a problem. It seems that you cannot use Spark 1.1.0. Set up your cluster to use mahout-1.0-SNAPSHOT with the pom set back to spark-1.0.1, a Spark 1.0.1 build for Hadoop 2.4, and Hadoop 2.4. This is the only combination that is supposed to work together. (A setup sketch along these lines is included at the end of this thread.)
>>
>> If this still fails it may be a setup problem, since I can run on a cluster just fine with my setup. When you get an error from this config, send it to me and the Spark user list to see if they can give us a clue.
>>
>> Question: Do you have mahout-1.0-SNAPSHOT and Spark installed on all your cluster machines, with the correct environment variables and path?
>>
>>
>> On Sep 30, 2014, at 12:47 AM, pol <swallow_p...@163.com> wrote:
>>
>> Hi Pat,
>> It was a Spark version problem, but spark-itemsimilarity still can't complete normally.
>>
>> 1. Changed 1.0.1 to 1.1.0 in mahout-1.0-SNAPSHOT/pom.xml: the Spark version compatibility is no longer a problem, but the program has a problem:
>> --------------------------------------------------------------
>> 14/09/30 11:26:04 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 10.1 (TID 31, Hadoop.Slave1): java.lang.NoClassDefFoundError: org/apache/commons/math3/random/RandomGenerator
>>         org.apache.mahout.common.RandomUtils.getRandom(RandomUtils.java:65)
>>         org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:228)
>>         org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:223)
>>         org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$1.apply(MapBlock.scala:33)
>>         org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$1.apply(MapBlock.scala:32)
>>         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>         scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>         org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:235)
>>         org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
>>         org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
>>         org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>>         org.apache.spark.scheduler.Task.run(Task.scala:54)
>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>         java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         java.lang.Thread.run(Thread.java:662)
>> --------------------------------------------------------------
>> I tried adding commons-math3-3.2.jar to mahout-1.0-SNAPSHOT/lib, but the result is still the same. (RandomUtils.java:65 does not use RandomGenerator directly.)
>>
>> 2. Changed 1.0.1 to 1.0.2 in mahout-1.0-SNAPSHOT/pom.xml: there are still other errors:
>> --------------------------------------------------------------
>> 14/09/30 14:36:57 WARN scheduler.TaskSetManager: Lost TID 427 (task 7.0:51)
>> 14/09/30 14:36:57 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassCastException
>> java.lang.ClassCastException: scala.Tuple1 cannot be cast to scala.Tuple2
>>         at org.apache.mahout.drivers.TDIndexedDatasetReader$$anonfun$4.apply(TextDelimitedReaderWriter.scala:75)
>>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>         at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59)
>>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
>>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
>>         at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594)
>>         at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594)
>>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> --------------------------------------------------------------
>> Please refer to the attachment for the full log.
>> <screenlog_bash.log>
>>
>>
>> In addition, I used 66 files on HDFS, each 20 to 30 M; if necessary I will provide the data.
>> The shell is: mahout spark-itemsimilarity -i /rec/input/ss/others,/rec/input/ss/weblog -o /rec/output/ss -os -ma spark://recommend1:7077 -sem 4g -f1 purchase -f2 view -ic 2 -fc 1
>> Spark cluster: 8 workers, 32 cores total, 32G memory total, across two machines.
>>
>> It feels like this won't be solved in a few days; it may be better to wait for the Mahout 1.0 release or to use the existing mahout itemsimilarity job instead.
>>
>>
>> Thank you again, Pat.
>>
>>
>> On Sep 29, 2014, at 00:02, Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>>> It looks like the cluster version of spark-itemsimilarity is never accepted by the Spark master. It fails in TextDelimitedReaderWriter.scala because all of the work uses "lazy" evaluation, and until the write no actual work is done on the Spark cluster.
>>>
>>> However your cluster seems to be working with the Pi example. Therefore there must be something wrong with the Mahout build or config. Some ideas:
>>>
>>> 1) Mahout 1.0-SNAPSHOT is targeted for Spark 1.0.1. However I use 1.0.2 and it seems to work. You might try changing the version in the pom.xml and doing a clean build of Mahout. Change the version number in mahout/pom.xml:
>>>
>>> mahout/pom.xml
>>> - <spark.version>1.0.1</spark.version>
>>> + <spark.version>1.1.0</spark.version>
>>>
>>> This may not be needed but it is easier than installing Spark 1.0.1.
>>>
>>> 2) Try installing and building Mahout on all cluster machines. I do this so I can run the Mahout spark-shell on any machine, but it may be needed.
>>> The Mahout jars, path setup, and directory structure should be the same on all cluster machines.
>>>
>>> 3) Try making -sem larger. I usually make it as large as I can on the cluster and then try smaller values until it affects performance. The epinions dataset that I use for testing on my cluster requires -sem 6g.
>>>
>>> My cluster has 3 machines with Hadoop 1.2.1 and Spark 1.0.2. I can try running your data through spark-itemsimilarity on my cluster if you can share it. I will sign an NDA and destroy it after the test.
>>>
>>>
>>>
>>> On Sep 27, 2014, at 5:28 AM, pol <swallow_p...@163.com> wrote:
>>>
>>> Hi Pat,
>>> Thanks for your reply. It still doesn't work normally. I tested it on a Spark standalone cluster; I did not test it on a YARN cluster.
>>>
>>> First, I checked that the cluster configuration is correct. http://Hadoop.Master:8080 shows:
>>> -----------------------------------
>>> URL: spark://Hadoop.Master:7077
>>> Workers: 2
>>> Cores: 4 Total, 0 Used
>>> Memory: 2.0 GB Total, 0.0 B Used
>>> Applications: 0 Running, 1 Completed
>>> Drivers: 0 Running, 0 Completed
>>> Status: ALIVE
>>> ----------------------------------
>>>
>>> Environment
>>> ----------------------------------
>>> OS: CentOS release 6.5 (Final)
>>> JDK: 1.6.0_45
>>> Mahout: mahout-1.0-SNAPSHOT (mvn -Dhadoop2.version=2.4.1 -DskipTests clean package)
>>> Hadoop: 2.4.1
>>> Spark: spark-1.1.0-bin-2.4.1 (mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.1 -Phive -DskipTests clean package)
>>> ----------------------------------
>>>
>>> Shell:
>>> spark-submit --class org.apache.spark.examples.SparkPi --master spark://Hadoop.Master:7077 --executor-memory 1g --total-executor-cores 2 /root/spark-examples_2.10-1.1.0.jar 1000
>>>
>>> It works OK; here is part of the log for that shell:
>>> ----------------------------------
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 995.0 in stage 0.0 (TID 995) in 17 ms on Hadoop.Slave1 (996/1000)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Starting task 998.0 in stage 0.0 (TID 998, Hadoop.Slave2, PROCESS_LOCAL, 1225 bytes)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 996.0 in stage 0.0 (TID 996) in 20 ms on Hadoop.Slave2 (997/1000)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Starting task 999.0 in stage 0.0 (TID 999, Hadoop.Slave1, PROCESS_LOCAL, 1225 bytes)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 997.0 in stage 0.0 (TID 997) in 27 ms on Hadoop.Slave1 (998/1000)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 998.0 in stage 0.0 (TID 998) in 31 ms on Hadoop.Slave2 (999/1000)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 999.0 in stage 0.0 (TID 999) in 20 ms on Hadoop.Slave1 (1000/1000)
>>> 14/09/19 19:48:00 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in 25.109 s
>>> 14/09/19 19:48:00 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
>>> 14/09/19 19:48:00 INFO spark.SparkContext: Job finished: reduce at SparkPi.scala:35, took 26.156022565 s
>>> Pi is roughly 3.14156112
>>> ----------------------------------
>>>
>>> Second, I tested spark-itemsimilarity with "local"; it works OK. Shell:
>>> mahout spark-itemsimilarity -i /test/ss/input/data.txt -o /test/ss/output -os -ma local[2] -sem 512m -f1 purchase -f2 view -ic 2 -fc 1
>>>
>>> Third, I tested spark-itemsimilarity on the "cluster". Shell:
>>> mahout spark-itemsimilarity -i
/test/ss/input/data.txt -o >>> /test/ss/output -os -ma spark://Hadoop.Master:7077 -sem 512m -f1 purchase >>> -f2 view -ic 2 -fc 1 >>> >>> It’s can’t work, full logs: >>> ---------------------------------- >>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath. >>> SLF4J: Class path contains multiple SLF4J bindings. >>> SLF4J: Found binding in >>> [jar:file:/usr/mahout-1.0-SNAPSHOT/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>> SLF4J: Found binding in >>> [jar:file:/usr/mahout-1.0-SNAPSHOT/spark/target/mahout-spark_2.10-1.0-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>> SLF4J: Found binding in >>> [jar:file:/usr/spark-1.1.0-bin-2.4.1/lib/spark-assembly-1.1.0-hadoop2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an >>> explanation. >>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] >>> 14/09/19 20:31:07 INFO spark.SecurityManager: Changing view acls to: root >>> 14/09/19 20:31:07 INFO spark.SecurityManager: SecurityManager: >>> authentication disabled; ui acls disabled; users with view permissions: >>> Set(root) >>> 14/09/19 20:31:08 INFO slf4j.Slf4jLogger: Slf4jLogger started >>> 14/09/19 20:31:08 INFO Remoting: Starting remoting >>> 14/09/19 20:31:08 INFO Remoting: Remoting started; listening on addresses >>> :[akka.tcp://spark@Hadoop.Master:47597] >>> 14/09/19 20:31:08 INFO Remoting: Remoting now listens on addresses: >>> [akka.tcp://spark@Hadoop.Master:47597] >>> 14/09/19 20:31:08 INFO spark.SparkEnv: Registering MapOutputTracker >>> 14/09/19 20:31:08 INFO spark.SparkEnv: Registering BlockManagerMaster >>> 14/09/19 20:31:08 INFO storage.DiskBlockManager: Created local directory at >>> /tmp/spark-local-20140919203108-e4e3 >>> 14/09/19 20:31:08 INFO storage.MemoryStore: MemoryStore started with >>> capacity 2.3 GB. >>> 14/09/19 20:31:08 INFO network.ConnectionManager: Bound socket to port >>> 47186 with id = ConnectionManagerId(Hadoop.Master,47186) >>> 14/09/19 20:31:08 INFO storage.BlockManagerMaster: Trying to register >>> BlockManager >>> 14/09/19 20:31:08 INFO storage.BlockManagerInfo: Registering block manager >>> Hadoop.Master:47186 with 2.3 GB RAM >>> 14/09/19 20:31:08 INFO storage.BlockManagerMaster: Registered BlockManager >>> 14/09/19 20:31:08 INFO spark.HttpServer: Starting HTTP Server >>> 14/09/19 20:31:08 INFO server.Server: jetty-8.y.z-SNAPSHOT >>> 14/09/19 20:31:08 INFO server.AbstractConnector: Started >>> SocketConnector@0.0.0.0:41116 >>> 14/09/19 20:31:08 INFO broadcast.HttpBroadcast: Broadcast server started at >>> http://192.168.204.128:41116 >>> 14/09/19 20:31:08 INFO spark.HttpFileServer: HTTP File server directory is >>> /tmp/spark-10744709-bbeb-4d79-8bfe-d64d77799fb3 >>> 14/09/19 20:31:08 INFO spark.HttpServer: Starting HTTP Server >>> 14/09/19 20:31:08 INFO server.Server: jetty-8.y.z-SNAPSHOT >>> 14/09/19 20:31:08 INFO server.AbstractConnector: Started >>> SocketConnector@0.0.0.0:59137 >>> 14/09/19 20:31:09 INFO server.Server: jetty-8.y.z-SNAPSHOT >>> 14/09/19 20:31:09 INFO server.AbstractConnector: Started >>> SelectChannelConnector@0.0.0.0:4040 >>> 14/09/19 20:31:09 INFO ui.SparkUI: Started SparkUI at >>> http://Hadoop.Master:4040 >>> 14/09/19 20:31:10 WARN util.NativeCodeLoader: Unable to load native-hadoop >>> library for your platform... 
using builtin-java classes where applicable >>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR >>> /usr/mahout-1.0-SNAPSHOT/math-scala/target/mahout-math-scala_2.10-1.0-SNAPSHOT.jar >>> at >>> http://192.168.204.128:59137/jars/mahout-math-scala_2.10-1.0-SNAPSHOT.jar >>> with timestamp 1411129870562 >>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR >>> /usr/mahout-1.0-SNAPSHOT/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT.jar >>> at http://192.168.204.128:59137/jars/mahout-mrlegacy-1.0-SNAPSHOT.jar with >>> timestamp 1411129870588 >>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR >>> /usr/mahout-1.0-SNAPSHOT/math/target/mahout-math-1.0-SNAPSHOT.jar at >>> http://192.168.204.128:59137/jars/mahout-math-1.0-SNAPSHOT.jar with >>> timestamp 1411129870612 >>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR >>> /usr/mahout-1.0-SNAPSHOT/spark/target/mahout-spark_2.10-1.0-SNAPSHOT.jar at >>> http://192.168.204.128:59137/jars/mahout-spark_2.10-1.0-SNAPSHOT.jar with >>> timestamp 1411129870618 >>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR >>> /usr/mahout-1.0-SNAPSHOT/math-scala/target/mahout-math-scala_2.10-1.0-SNAPSHOT.jar >>> at >>> http://192.168.204.128:59137/jars/mahout-math-scala_2.10-1.0-SNAPSHOT.jar >>> with timestamp 1411129870620 >>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR >>> /usr/mahout-1.0-SNAPSHOT/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT.jar >>> at http://192.168.204.128:59137/jars/mahout-mrlegacy-1.0-SNAPSHOT.jar with >>> timestamp 1411129870631 >>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR >>> /usr/mahout-1.0-SNAPSHOT/math/target/mahout-math-1.0-SNAPSHOT.jar at >>> http://192.168.204.128:59137/jars/mahout-math-1.0-SNAPSHOT.jar with >>> timestamp 1411129870644 >>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR >>> /usr/mahout-1.0-SNAPSHOT/spark/target/mahout-spark_2.10-1.0-SNAPSHOT.jar at >>> http://192.168.204.128:59137/jars/mahout-spark_2.10-1.0-SNAPSHOT.jar with >>> timestamp 1411129870647 >>> 14/09/19 20:31:10 INFO client.AppClient$ClientActor: Connecting to master >>> spark://Hadoop.Master:7077... 
>>> 14/09/19 20:31:13 INFO storage.MemoryStore: ensureFreeSpace(86126) called >>> with curMem=0, maxMem=2491102003 >>> 14/09/19 20:31:13 INFO storage.MemoryStore: Block broadcast_0 stored as >>> values to memory (estimated size 84.1 KB, free 2.3 GB) >>> 14/09/19 20:31:13 INFO mapred.FileInputFormat: Total input paths to process >>> : 1 >>> 14/09/19 20:31:13 INFO spark.SparkContext: Starting job: collect at >>> TextDelimitedReaderWriter.scala:74 >>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Registering RDD 7 (distinct >>> at TextDelimitedReaderWriter.scala:74) >>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Got job 0 (collect at >>> TextDelimitedReaderWriter.scala:74) with 2 output partitions >>> (allowLocal=false) >>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Final stage: Stage 0(collect >>> at TextDelimitedReaderWriter.scala:74) >>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Parents of final stage: >>> List(Stage 1) >>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Missing parents: List(Stage >>> 1) >>> 14/09/19 20:31:14 INFO scheduler.DAGScheduler: Submitting Stage 1 >>> (MapPartitionsRDD[7] at distinct at TextDelimitedReaderWriter.scala:74), >>> which has no missing parents >>> 14/09/19 20:31:14 INFO scheduler.DAGScheduler: Submitting 2 missing tasks >>> from Stage 1 (MapPartitionsRDD[7] at distinct at >>> TextDelimitedReaderWriter.scala:74) >>> 14/09/19 20:31:14 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 >>> with 2 tasks >>> 14/09/19 20:31:29 WARN scheduler.TaskSchedulerImpl: Initial job has not >>> accepted any resources; check your cluster UI to ensure that workers are >>> registered and have sufficient memory >>> 14/09/19 20:31:30 INFO client.AppClient$ClientActor: Connecting to master >>> spark://Hadoop.Master:7077... >>> 14/09/19 20:31:44 WARN scheduler.TaskSchedulerImpl: Initial job has not >>> accepted any resources; check your cluster UI to ensure that workers are >>> registered and have sufficient memory >>> 14/09/19 20:31:50 INFO client.AppClient$ClientActor: Connecting to master >>> spark://Hadoop.Master:7077... >>> 14/09/19 20:31:59 WARN scheduler.TaskSchedulerImpl: Initial job has not >>> accepted any resources; check your cluster UI to ensure that workers are >>> registered and have sufficient memory >>> 14/09/19 20:32:10 ERROR cluster.SparkDeploySchedulerBackend: Application >>> has been killed. Reason: All masters are unresponsive! Giving up. >>> 14/09/19 20:32:10 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, >>> whose tasks have all completed, from pool >>> 14/09/19 20:32:10 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1 >>> 14/09/19 20:32:10 INFO scheduler.DAGScheduler: Failed to run collect at >>> TextDelimitedReaderWriter.scala:74 >>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due >>> to stage failure: All masters are unresponsive! Giving up. 
>>> at >>> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044) >>> at >>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028) >>> at >>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026) >>> at >>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) >>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) >>> at >>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026) >>> at >>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634) >>> at >>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634) >>> at scala.Option.foreach(Option.scala:236) >>> at >>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634) >>> at >>> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229) >>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) >>> at akka.actor.ActorCell.invoke(ActorCell.scala:456) >>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) >>> at akka.dispatch.Mailbox.run(Mailbox.scala:219) >>> at >>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) >>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >>> at >>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) >>> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) >>> at >>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) >>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped >>> o.e.j.s.ServletContextHandler{/metrics/json,null} >>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped >>> o.e.j.s.ServletContextHandler{/stages/stage/kill,null} >>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped >>> o.e.j.s.ServletContextHandler{/,null} >>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped >>> o.e.j.s.ServletContextHandler{/static,null} >>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped >>> o.e.j.s.ServletContextHandler{/executors/json,null} >>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped >>> o.e.j.s.ServletContextHandler{/executors,null} >>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped >>> o.e.j.s.ServletContextHandler{/environment/json,null} >>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped >>> o.e.j.s.ServletContextHandler{/environment,null} >>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped >>> o.e.j.s.ServletContextHandler{/storage/rdd/json,null} >>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped >>> o.e.j.s.ServletContextHandler{/storage/rdd,null} >>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped >>> o.e.j.s.ServletContextHandler{/storage/json,null} >>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped >>> o.e.j.s.ServletContextHandler{/storage,null} >>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped >>> o.e.j.s.ServletContextHandler{/stages/pool/json,null} >>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped >>> o.e.j.s.ServletContextHandler{/stages/pool,null} >>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped >>> o.e.j.s.ServletContextHandler{/stages/stage/json,null} >>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped >>> 
o.e.j.s.ServletContextHandler{/stages/stage,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
>>> ----------------------------------
>>>
>>> Thanks.
>>>
>>>
>>>
>>> On Sep 27, 2014, at 01:05, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>
>>>> Any luck with this?
>>>>
>>>> If not, could you send a full stack trace and check the cluster machines for other logs that might help?
>>>>
>>>>
>>>> On Sep 25, 2014, at 6:34 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>>
>>>> Looks like a Spark error as far as I can tell. This error is very generic and indicates that the job was not accepted for execution, so Spark may be configured wrong. This looks like a question for the Spark people.
>>>>
>>>> My Spark sanity check:
>>>>
>>>> 1) In the Spark UI at http://Hadoop.Master:8080 does everything look correct?
>>>> 2) Have you tested your Spark *cluster* with one of their examples? Have you run *any non-Mahout* code on the cluster to check that it is configured properly?
>>>> 3) Are you using exactly the same Spark and Hadoop locally as on the cluster?
>>>> 4) Did you launch both local and cluster jobs from the same cluster machine, the only difference being the master URL (local[2] vs. spark://Hadoop.Master:7077)?
>>>>
>>>> 14/09/22 04:12:47 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>>> 14/09/22 04:12:49 INFO client.AppClient$ClientActor: Connecting to master spark://Hadoop.Master:7077...
>>>>
>>>>
>>>> On Sep 24, 2014, at 8:18 PM, pol <swallow_p...@163.com> wrote:
>>>>
>>>> Hi Pat,
>>>> The dataset is the same, and the data is very small since it is only for the test. Is this a bug?
>>>>
>>>>
>>>> On Sep 25, 2014, at 02:57, Pat Ferrel <pat.fer...@gmail.com> wrote:
>>>>
>>>>> Are you using different data sets on the local and cluster runs?
>>>>>
>>>>> Try increasing Spark memory with -sem; I use -sem 6g for the epinions data set.
>>>>>
>>>>> The ID dictionaries are kept in memory on each cluster machine, so a large number of user or item IDs will need more memory.
>>>>>
>>>>>
>>>>> On Sep 24, 2014, at 9:31 AM, pol <swallow_p...@163.com> wrote:
>>>>>
>>>>> Hi, All
>>>>>
>>>>> I'm sure the Spark standalone cluster launches OK, but it can't be used for spark-itemsimilarity.
>>>>>
>>>>> Launching on 'local' is OK:
>>>>> mahout spark-itemsimilarity -i /user/root/test/input/data.txt -o /user/root/test/output -os -ma local[2] -f1 purchase -f2 view -ic 2 -fc 1 -sem 1g
>>>>>
>>>>> but launching on a standalone cluster gives an error:
>>>>> mahout spark-itemsimilarity -i /user/root/test/input/data.txt -o /user/root/test/output -os -ma spark://Hadoop.Master:7077 -f1 purchase -f2 view -ic 2 -fc 1 -sem 1g
>>>>> ------------
>>>>> 14/09/22 04:12:47 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>>>> 14/09/22 04:12:49 INFO client.AppClient$ClientActor: Connecting to master spark://Hadoop.Master:7077...
>>>>> 14/09/22 04:13:02 WARN scheduler.TaskSchedulerImpl: Initial job has not >>>>> accepted any resources; check your cluster UI to ensure that workers are >>>>> registered and have sufficient memory >>>>> 14/09/22 04:13:09 INFO client.AppClient$ClientActor: Connecting to master >>>>> spark://Hadoop.Master:7077... >>>>> 14/09/22 04:13:17 WARN scheduler.TaskSchedulerImpl: Initial job has not >>>>> accepted any resources; check your cluster UI to ensure that workers are >>>>> registered and have sufficient memory >>>>> 14/09/22 04:13:29 ERROR cluster.SparkDeploySchedulerBackend: Application >>>>> has been killed. Reason: All masters are unresponsive! Giving up. >>>>> 14/09/22 04:13:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, >>>>> whose tasks have all completed, from pool >>>>> 14/09/22 04:13:29 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1 >>>>> 14/09/22 04:13:29 INFO scheduler.DAGScheduler: Failed to run collect at >>>>> TextDelimitedReaderWriter.scala:74 >>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted >>>>> due to stage failure: All masters are unresponsive! Giving up. >>>>> at >>>>> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044) >>>>> at >>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028) >>>>> at >>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026) >>>>> at >>>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) >>>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) >>>>> at >>>>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026) >>>>> at >>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634) >>>>> at >>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634) >>>>> at scala.Option.foreach(Option.scala:236) >>>>> at >>>>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634) >>>>> at >>>>> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229) >>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) >>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:456) >>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) >>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:219) >>>>> at >>>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) >>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >>>>> at >>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) >>>>> at >>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) >>>>> at >>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) >>>>> ------------ >>>>> >>>>> Thanks. >>>>> >>>>> >>>> >>>> >>>> >>>> >>> >>> >> >> > >
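For anyone who hits the same wall, here is a minimal sketch of the combination this thread converges on: mahout-1.0-SNAPSHOT with its pom pointed back at Spark 1.0.1, a Spark 1.0.1 build for Hadoop 2.4, and an identical install on every cluster machine. The install paths, the Hadoop 2.4.0 version, and the environment variable names below are assumptions for illustration only and are not spelled out above; adjust them to your own layout. The final command is simply pol's own invocation repeated.
----------------------------------
# 1) Point the Mahout snapshot at Spark 1.0.1 and rebuild (path is assumed).
cd /usr/mahout-1.0-SNAPSHOT
sed -i 's|<spark.version>.*</spark.version>|<spark.version>1.0.1</spark.version>|' pom.xml
mvn -Dhadoop2.version=2.4.0 -DskipTests clean package

# 2) Install a Spark 1.0.1 build made for Hadoop 2.4 (download one, or build it
#    with the same profiles used for 1.1.0 earlier in the thread), e.g.:
#    mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

# 3) Copy the same Mahout and Spark trees to the same paths on every cluster
#    machine and export the same environment on each of them (names assumed).
export MAHOUT_HOME=/usr/mahout-1.0-SNAPSHOT
export SPARK_HOME=/usr/spark-1.0.1-bin-2.4.0
export HADOOP_CONF_DIR=/usr/hadoop-2.4.0/etc/hadoop
export PATH=$PATH:$MAHOUT_HOME/bin:$SPARK_HOME/bin

# 4) Re-run against the standalone master, raising -sem as far as worker memory allows.
mahout spark-itemsimilarity -i /view_input,/purchase_input -o /output -os \
  -ma spark://recommend1:7077 -sem 15g -f1 purchase -f2 view -ic 2 -fc 1 -m 36
----------------------------------
The point, per Pat's advice above, is that the pom's spark.version, the installed Spark build, and Hadoop all have to agree, and every worker needs the same jars, paths, and environment.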