Thanks for finding this. It appears that the jar passed to Spark with the classes to be serialized was not updated when some code was refactored. We have a fix under test that will be in the next RC. If you could test the next RC (maybe ready tomorrow) we’d be very grateful.
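In case it helps to see the mechanism: below is a minimal Scala sketch of how a driver typically wires up the Kryo registrator and the jars that get shipped to the executors. This is not the actual Mahout driver code, and the jar path is a placeholder; it is only meant to show why a stale jar list leads to the ClassNotFoundException in the trace quoted below.

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only -- not the actual Mahout driver code; the jar path is a placeholder.
// Spark ships the listed jars to the executors, and each executor then instantiates
// the Kryo registrator by name when it builds its serializer.
val conf = new SparkConf()
  .setAppName("spark-itemsimilarity")
  .setMaster("spark://ubuntu:7077")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator",
       "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")
  // If the jar listed here no longer contains MahoutKryoRegistrator after a refactor,
  // every executor fails in KryoSerializer.newKryo with java.lang.ClassNotFoundException.
  .setJars(Seq("/placeholder/path/to/mahout-spark-deps.jar"))
val sc = new SparkContext(conf)

The key point is the last comment: the registrator class has to be inside one of the jars Spark actually ships to the executors.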
On Mar 3, 2017, at 12:58 PM, Michael Müller <michael.muel...@condat.de> wrote:

> So you are downloading the binary and running the Mahout spark-itemsimilarity driver from that binary?

Yes.

> You say “using the same Spark cluster” How is this set up, an env var like MASTER=?
> Can you supply how you point to the cluster and your CLI for the job?

These are my environment settings for Spark and Mahout:

export MAHOUT_HOME=/home/aml/mahout/apache-mahout-distribution-0.13.0
#export MAHOUT_LOCAL=true
export SPARK_HOME=/home/aml/spark/spark-1.6.3-bin-hadoop2.6
export MASTER=spark://ubuntu:7077
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/jre

I'm starting the job like this:

/home/aml/mahout/apache-mahout-distribution-0.13.0/bin/mahout spark-itemsimilarity --master spark://ubuntu:7077 --input ~/data/rating_200k.csv --output ~/data/rating_200k_output --itemIDColumn 1 --rowIDColumn 0 --sparkExecutorMem 6g

And when I change MAHOUT_HOME to point to my Mahout 0.12.2 installation (-> /home/aml/mahout/apache-mahout-distribution-0.12.2) and then start the job like that, it succeeds:

/home/aml/mahout/apache-mahout-distribution-0.12.2/bin/mahout spark-itemsimilarity --master spark://ubuntu:7077 --input ~/data/rating_200k.csv --output ~/data/rating_200k_output --itemIDColumn 1 --rowIDColumn 0 --sparkExecutorMem 6g

-----Original Message-----
From: Pat Ferrel [mailto:p...@occamsmachete.com]
Sent: Friday, March 3, 2017 20:49
To: Michael Müller
Cc: user@mahout.apache.org
Subject: Re: 0.13.0-RC not fully compatible with Spark 1.6.3?

Thanks, I’ll see if I can reproduce.

So you are downloading the binary and running the Mahout spark-itemsimilarity driver from that binary?

You say “using the same Spark cluster” How is this set up, an env var like MASTER=?

Can you supply how you point to the cluster and your CLI for the job?

On Mar 3, 2017, at 1:26 AM, Michael Müller <michael.muel...@condat.de> wrote:

Hi all,

is Mahout 0.13.0 supposed to work with Spark 1.6.3? I would think so, as the master-pom.xml explicitly references Spark 1.6.3. But when I run a spark-itemsimilarity command (on the 0.13.0-RC) against my Spark 1.6.3 standalone cluster, the command fails with:

17/03/03 10:08:40 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, reco-master): java.io.IOException: org.apache.spark.SparkException: Failed to register classes with Kryo
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1212)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
        ...
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$5.apply(KryoSerializer.scala:123)
        at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$5.apply(KryoSerializer.scala:123)
        at scala.Option.map(Option.scala:145)
        at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:123)

When I run exactly the same command on the 0.12.2 release distribution against the same Spark cluster, it completes successfully.
My environment is:
* Ubuntu 14.04
* Oracle JDK 1.8.0_121
* Spark standalone cluster using this distribution: http://d3kbcqa49mib13.cloudfront.net/spark-1.6.3-bin-hadoop2.6.tgz
* Mahout 0.13.0-RC: https://repository.apache.org/content/repositories/orgapachemahout-1034/org/apache/mahout/apache-mahout-distribution/0.13.0/apache-mahout-distribution-0.13.0.tar.gz

TIA

--
Michael Müller
Condat AG, Berlin
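For anyone who wants to check a distribution directly: a quick way to see which jars actually contain the registrator class from the trace above is to scan them. Here is a minimal Scala sketch; the lib/ directory under MAHOUT_HOME is an assumption, so adjust the path to wherever your distribution keeps its jars.

import java.io.File
import java.util.jar.JarFile

// Diagnostic sketch: report which jars contain MahoutKryoRegistrator.
// The lib/ directory under MAHOUT_HOME is an assumption -- point this at
// wherever your distribution keeps its jars.
val libDir = new File("/home/aml/mahout/apache-mahout-distribution-0.13.0/lib")
val entry = "org/apache/mahout/sparkbindings/io/MahoutKryoRegistrator.class"

Option(libDir.listFiles()).getOrElse(Array.empty[File])
  .filter(_.getName.endsWith(".jar"))
  .foreach { jarFile =>
    val jar = new JarFile(jarFile)
    try {
      if (jar.getEntry(entry) != null)
        println(s"${jarFile.getName} contains MahoutKryoRegistrator")
    } finally jar.close()
  }

Running it against both the 0.12.2 and the 0.13.0 directories should show whether the class is present at all and, if so, in which jar.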