Re: Row Similarity

2015-05-13 Thread Suneel Marthi
Hi Jonathan, Here's what you need to do to run RowSimilarity on your CSV-formatted data. You would have to use the MapReduce version, since the Spark version only supports LLR.
1. Convert CSV to Vectors: use CSVIterator and store the vectors as SequenceFiles.
2. Run RowIDJob on the SequenceFile output

Re: Row Similarity

2015-05-13 Thread Jonathan Seale
Thanks, Charlie. The data has been through lots of processing, but in an attempt to make it more Mahout-friendly, I've converted it into a single CSV table with columns: star_id, wavelength, intensity. My motivation was to make it like a user_id, item_id, rating table you might see in other Mahout
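For readers following the thread, here is a minimal sketch of how a long-format table like the one described (star_id, wavelength, intensity) could be grouped into one intensity vector per star. It is not from the thread: the function name `load_spectra`, the sample rows, the absence of a header row, and the choice to fill missing wavelengths with 0.0 are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def load_spectra(csv_text):
    """Group (star_id, wavelength, intensity) rows into one vector per star.

    Hypothetical helper: assumes no header row and exactly three columns.
    """
    by_star = defaultdict(dict)
    wavelengths = set()
    for star_id, wavelength, intensity in csv.reader(io.StringIO(csv_text)):
        w = float(wavelength)
        by_star[star_id][w] = float(intensity)
        wavelengths.add(w)
    # Shared, sorted wavelength axis so every vector has the same layout.
    axis = sorted(wavelengths)
    # Assumption: a wavelength missing for a star is treated as intensity 0.0.
    vectors = {
        star: [values.get(w, 0.0) for w in axis]
        for star, values in by_star.items()
    }
    return axis, vectors


# Made-up sample rows, only to show the shape of the input.
sample = "s1,400.0,1.0\ns1,500.0,2.0\ns2,400.0,2.0\ns2,500.0,4.0\ns3,400.0,3.0\n"
axis, spectra = load_spectra(sample)
```

Each star then corresponds to one row of a star-by-wavelength matrix, which is the shape a row-similarity computation expects.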

Re: Row Similarity

2015-05-13 Thread Charlie Hack
Hi Jonathan, how do you have the data stored? The more info about your setup, the better. Charlie On Wednesday, May 13, 2015 at 23:16, Jonathan Seale wrote: Scientists, I have an astrophysical application for Mahout that I need help with. I have 1-dimensional

Row Similarity

2015-05-13 Thread Jonathan Seale
Scientists, I have an astrophysical application for Mahout that I need help with. I have 1-dimensional stellar spectra for many, many stars. Each spectrum consists of a series of intensity values, one per wavelength of light. I need to be able to find the cosine similarity between ALL pairs of st
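As a point of reference for the question, the quantity being asked for can be sketched on a single machine in plain Python. This is only an illustration of all-pairs cosine similarity, not Mahout's implementation; the function names and the toy vectors are hypothetical, and returning 0.0 for an all-zero vector is an assumption.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length intensity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: similarity with an all-zero spectrum is reported as 0.0.
        return 0.0
    return dot / (norm_u * norm_v)


def all_pairs_cosine(spectra):
    """Return {(star_a, star_b): similarity} for every unordered pair of stars."""
    return {
        (a, b): cosine(spectra[a], spectra[b])
        for a, b in combinations(sorted(spectra), 2)
    }


# Toy spectra, made up for illustration only.
toy = {"s1": [1.0, 2.0], "s2": [2.0, 4.0], "s3": [2.0, -1.0]}
sims = all_pairs_cosine(toy)
```

The number of pairs grows quadratically with the number of stars, which is why a distributed row-similarity job is the natural fit once there are many spectra.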

Re: word2vec in mahout.

2015-05-13 Thread Dmitriy Lyubimov
Spark's word2vec is pretty agile. On Wed, May 13, 2015 at 12:13 PM, David Starina wrote: > You can also check out the implementation in MLlib: > https://spark.apache.org/docs/latest/mllib-feature-extraction.html#word2vec > > > > On Wed, May 13, 2015 at 9:11 PM, Dan Dong wrote: > > > Thanks Andr

Re: word2vec in mahout.

2015-05-13 Thread David Starina
You can also check out the implementation in MLlib: https://spark.apache.org/docs/latest/mllib-feature-extraction.html#word2vec On Wed, May 13, 2015 at 9:11 PM, Dan Dong wrote: > Thanks Andrew, I will turn to DL4J. > > Cheers, > Dan > > > 2015-05-13 10:34 GMT-05:00 Andrew Musselman : > > > Mah

Re: word2vec in mahout.

2015-05-13 Thread Dan Dong
Thanks Andrew, I will turn to DL4J. Cheers, Dan 2015-05-13 10:34 GMT-05:00 Andrew Musselman : > Mahout doesn't have a word2vec impl that I know of, but DL4J does: > http://deeplearning4j.org/word2vec.html > > On Wednesday, May 13, 2015, Dan Dong wrote: > > > Hi, > > Does anyone know how to r

Re: spark-rowsimilarity java.lang.OutOfMemoryError: Java heap space

2015-05-13 Thread Pat Ferrel
There is a bug in Mahout 0.10.0 that you can fix if you are able to build from source. Get the source tar for 0.10.0, not the current master. Go to https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157 remove the

Re: word2vec in mahout.

2015-05-13 Thread Andrew Musselman
Mahout doesn't have a word2vec impl that I know of, but DL4J does: http://deeplearning4j.org/word2vec.html On Wednesday, May 13, 2015, Dan Dong wrote: > Hi, > Does anyone know how to run word2vec in mahout? I could not find docs > about it on Mahout web site. Thanks. > > Cheers, > Dan >

word2vec in mahout.

2015-05-13 Thread Dan Dong
Hi, Does anyone know how to run word2vec in Mahout? I could not find docs about it on the Mahout web site. Thanks. Cheers, Dan

spark-rowsimilarity java.lang.OutOfMemoryError: Java heap space

2015-05-13 Thread Xavier Rampino
Hello, I've tried spark-rowsimilarity with out-of-the-box setup (downloaded mahout distribution and spark, and set up the PATH), and I stumble upon a Java Heap space error. My input file is ~100MB. It seems the various parameters I tried to give won't change this. I do : ~/mahout-distribution-0.1