Hi Jonathan,
Here's what you need to do to run RowSimilarity on your CSV-formatted data. You
would have to use the MapReduce version since the Spark version only
supports LLR.
1. Convert CSV to Vectors - use CSVIterator and store the vectors as
SequenceFiles
2. Run RowIDJob on the SequenceFile output
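To make the target of step 1 concrete, here is a minimal plain-Python sketch of the data shape involved. It is not Mahout's API and does not write SequenceFiles. It assumes a CSV of (row_id, column_id, value) triples, such as the star_id, wavelength, intensity layout mentioned elsewhere in this thread, and the function names are made up for illustration. It groups the triples into one sparse vector per row and computes cosine similarity for every pair of rows, which is what a row-similarity job produces at scale.

```python
import csv
import io
import math
from collections import defaultdict


def rows_to_vectors(csv_text):
    """Group (row_id, column_id, value) triples into one sparse vector per row."""
    vectors = defaultdict(dict)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        row_id, col_id, value = row
        vectors[row_id][col_id] = float(value)
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def all_pairs(vectors):
    """Cosine similarity for every unordered pair of row ids."""
    ids = sorted(vectors)
    return {
        (p, q): cosine(vectors[p], vectors[q])
        for i, p in enumerate(ids)
        for q in ids[i + 1:]
    }


# Made-up sample values, for illustration only.
sample = "s1,400,1.0\ns1,500,2.0\ns2,400,2.0\ns2,500,4.0\ns3,400,1.0\n"
similarities = all_pairs(rows_to_vectors(sample))
```

The all-pairs loop is quadratic in the number of rows, so it only suits small inputs; for many rows a distributed job is the practical route.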
Thanks, Charlie,
The data has been through lots of processing, but in an attempt to make it
more Mahout-friendly, I've converted it into a single csv table with
columns: star_id, wavelength, intensity. My motivation was to make it like
a user_id, item_id, rating table you might see in other Mahout
Hi Jonathan, how do you have the data stored? The more info about your setup,
the better.
Charlie
On Wednesday, May 13, 2015 at 23:16, Jonathan Seale wrote:
Scientists,
I have an astrophysical application for Mahout that I need help with.
I have 1-dimensional
Scientists,
I have an astrophysical application for Mahout that I need help with.
I have 1-dimensional stellar spectra for many, many stars. Each spectrum
consists of a series of intensity values, one per wavelength of light. I
need to be able to find the cosine similarity between ALL pairs of st
Spark's word2vec is pretty agile.
On Wed, May 13, 2015 at 12:13 PM, David Starina
wrote:
> You can also check out the implementation in MLlib:
> https://spark.apache.org/docs/latest/mllib-feature-extraction.html#word2vec
>
>
>
> On Wed, May 13, 2015 at 9:11 PM, Dan Dong wrote:
>
> > Thanks Andr
You can also check out the implementation in MLlib:
https://spark.apache.org/docs/latest/mllib-feature-extraction.html#word2vec
On Wed, May 13, 2015 at 9:11 PM, Dan Dong wrote:
> Thanks Andrew, I will turn to DL4J.
>
> Cheers,
> Dan
>
>
> 2015-05-13 10:34 GMT-05:00 Andrew Musselman:
>
> > Mah
Thanks Andrew, I will turn to DL4J.
Cheers,
Dan
2015-05-13 10:34 GMT-05:00 Andrew Musselman:
> Mahout doesn't have a word2vec impl that I know of, but DL4J does:
> http://deeplearning4j.org/word2vec.html
>
> On Wednesday, May 13, 2015, Dan Dong wrote:
>
> > Hi,
> > Does anyone know how to r
There is a bug in Mahout 0.10.0 that you can fix if you are able to build from
source. Get the source tar for 0.10.0, not the current master.
Go to
https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157
remove the
Mahout doesn't have a word2vec impl that I know of, but DL4J does:
http://deeplearning4j.org/word2vec.html
On Wednesday, May 13, 2015, Dan Dong wrote:
> Hi,
> Does anyone know how to run word2vec in Mahout? I could not find docs
> about it on the Mahout website. Thanks.
>
> Cheers,
> Dan
>
Hi,
Does anyone know how to run word2vec in Mahout? I could not find docs
about it on the Mahout website. Thanks.
Cheers,
Dan
Hello,
I've tried spark-rowsimilarity with an out-of-the-box setup (downloaded the Mahout
distribution and Spark, and set up the PATH), and I ran into a Java
heap space error. My input file is ~100MB. None of the various parameters
I tried seem to change this. I run:
~/mahout-distribution-0.1