OK, fixed the snappy issue (which happens on Mac / JRE 1.7) by downloading https://wso2.org/jira/secure/attachment/32013/libsnappyjava.jnilib and placing the file in /usr/lib/java/. Now, when I run

    ./bin/mahout spark-itemsimilarity -i demoItems.csv -o output4 -fc 1 -ic 2 --filter1 purchase --filter2 view

I get the desired output, just like the example.
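If you would rather not copy native libraries into a system directory, a minimal alternative sketch. This assumes your bin/mahout launcher passes MAHOUT_OPTS through to the JVM, and the ~/native-libs path is made up for illustration; snappy-java falls back to System.loadLibrary, which searches java.library.path, when its bundled loader fails (that is the UnsatisfiedLinkError in the trace below):

    # Hypothetical: keep the native lib in a user directory instead of /usr/lib/java/
    mkdir -p ~/native-libs
    cp ~/Downloads/libsnappyjava.jnilib ~/native-libs/

    # Point the JVM at it via java.library.path (assumes MAHOUT_OPTS reaches the JVM)
    export MAHOUT_OPTS="-Djava.library.path=$HOME/native-libs"
    ./bin/mahout spark-itemsimilarity -i demoItems.csv -o output4 -fc 1 -ic 2 \
        --filter1 purchase --filter2 view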
On Wednesday, March 18, 2015 4:59 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

Looks like you don’t have the native snappy code installed correctly. That’s a Hadoop thing, I think, for fast compressed serialization.

On Mar 18, 2015, at 4:08 PM, Jeff Isenhart <jeffi...@yahoo.com.INVALID> wrote:

Thanks for the input, Pat. I ran the following command

    ./bin/mahout spark-itemsimilarity -i demoItems.csv -o output4 -fc 1 -ic 2 --filter1 purchase --filter2 view

on data

    u1,purchase,iphone
    u1,purchase,ipad
    u2,purchase,nexus

and am now seeing this error:

    java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317)
        at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
        at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
        at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
        at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
        at org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1029)
        at org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
        at org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
        at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:126)
        at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
        at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
        at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at org.apache.spark.util.collection.AppendOnlyMap$$anon$1.foreach(AppendOnlyMap.scala:159)
        at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:54)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
        at java.lang.Runtime.loadLibrary0(Runtime.java:849)
        at java.lang.System.loadLibrary(System.java:1088)
        at org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
        ... 26 more

On Thursday, March 12, 2015 10:35 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

There are many ways to structure the input. The spark-itemsimilarity driver can take only two actions, though the internal code, if you want to use it as a library, will take any number. The CLI driver can optionally take input of the form you mention, but will extract a primary and a single secondary action per execution. If you have more than two actions, you can run the driver once for every secondary action or use the lib interface.
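A minimal sketch of that per-secondary-action approach, using only flags that appear elsewhere in this thread. The file name, action names, and output dirs are illustrative, and -fc 1 -ic 2 assumes rows of the form user,action,item as in demoItems.csv:

    # One driver run per (primary, secondary) action pair over the same
    # mixed-action file. Each run writes its own indicator output; the
    # primary-action similarity matrix is recomputed (identically) each time.
    ./bin/mahout spark-itemsimilarity -i mixed-actions.csv -o out-action2 \
        -fc 1 -ic 2 --filter1 action1 --filter2 action2
    ./bin/mahout spark-itemsimilarity -i mixed-actions.csv -o out-action3 \
        -fc 1 -ic 2 --filter1 action1 --filter2 action3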
You can have your interactions in separate dirs of the form I mentioned in the original answer, in which case you pass in -i and -i2 params (sketched at the end of this thread). If you want to mix actions in the same files, use the format you describe:

    u1,item1,action1
    u1,item10,action2
    u1,item500,action3
    u2,item2,action1
    u2,item500,action3
    ...

The columns can be moved around and specified on the CLI. To use the above with the CLI you would have to process action1 and action2 with one execution, and action1 and action3 with another execution. This will create four outputs; the two “similarity-matrix” dirs will be identical. This would give you indicators for action1 (actually two identical indicators), action2, and action3.

On Mar 12, 2015, at 9:52 AM, Jeff Isenhart <jeffi...@yahoo.com.INVALID> wrote:

Hmmm, then what about the "How to Use Multiple Actions" section that states: For a mixed action log of the form:

    u1,purchase,iphone
    u1,purchase,ipad
    u2,purchase,nexus

On Thursday, March 12, 2015 9:39 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

spark-itemsimilarity takes tuples of user-id,item-id. You are looking at the collected input as a matrix; it would be collected from something of the form:

    u1,item1
    u1,item10
    u1,item500
    u2,item2
    u2,item500
    ...

On Mar 11, 2015, at 8:24 PM, Jeff Isenhart <jeffi...@yahoo.com.INVALID> wrote:

I am trying to run the example found here: http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html

The data (demoItems.csv, added to HDFS) is just copied from the example:

    u1,purchase,iphone
    u1,purchase,ipad
    u2,purchase,nexus
    ...

But when I run

    mahout spark-itemsimilarity -i demoItems.csv -o output2 -fc 1 -ic 2

I get empty _SUCCESS and part-00000 files in output2/indicator-matrix. Any ideas?
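For completeness, a hypothetical sketch of the separate-directories form Pat mentions above. The paths and file contents are made up, and it assumes plain two-column user-id,item-id tuples that the driver can parse with its default column settings (so no -fc/-ic needed):

    # Layout: one dir per action, each holding user-id,item-id tuples, e.g.
    #   actions/purchase/part-00000 -> u1,iphone / u1,ipad / u2,nexus ...
    #   actions/view/part-00000     -> u1,iphone / u1,nexus / u2,iphone ...

    # Primary action via -i, secondary action via -i2:
    ./bin/mahout spark-itemsimilarity -i actions/purchase -i2 actions/view -o output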