Thanks, Charlie.

The data has been through a lot of processing, but in an attempt to make it more Mahout-friendly, I've converted it into a single CSV table with three columns: star_id, wavelength, intensity. My motivation was to make it look like the user_id, item_id, rating table you might see in other Mahout uses.
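
For concreteness, here is a minimal Python sketch of what that long-format table encodes and what all-pairs cosine similarity over it means. It is plain standard-library Python, not Mahout, and it is only practical for small inputs. The sample rows, the star names, and the helper names (`load_spectra`, `cosine`, `all_pairs`) are hypothetical and chosen for illustration only.

```python
import csv
import io
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical sample rows in the star_id, wavelength, intensity layout.
SAMPLE_CSV = """star_id,wavelength,intensity
s1,4000,1.0
s1,5000,2.0
s2,4000,2.0
s2,5000,4.0
s3,4000,1.0
s3,5000,0.0
"""


def load_spectra(handle):
    """Group long-format rows into {star_id: {wavelength: intensity}}."""
    spectra = defaultdict(dict)
    for row in csv.DictReader(handle):
        wavelength = float(row["wavelength"])
        spectra[row["star_id"]][wavelength] = float(row["intensity"])
    return dict(spectra)


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero spectra
    return dot / (norm_a * norm_b)


def all_pairs(spectra):
    """Cosine similarity for every unordered pair of stars (O(n^2) pairs)."""
    return {
        (i, j): cosine(spectra[i], spectra[j])
        for i, j in combinations(sorted(spectra), 2)
    }


spectra = load_spectra(io.StringIO(SAMPLE_CSV))
similarities = all_pairs(spectra)
for pair, value in similarities.items():
    print(pair, round(value, 4))
```

In the sample, s2 is s1 scaled by a factor of two, so that pair has cosine similarity 1: the measure depends only on the shape of the spectrum, not its overall brightness. The pairwise loop grows quadratically with the number of stars, which is the reason to hand the real data set to a distributed row-similarity job.
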
Rather than using my local machine, I've set up an instance on Amazon in the hope of turning this into a remote service. So the install is whatever comes with Amazon's default Mahout installation.

Jonathan

On Wed, May 13, 2015 at 11:29 PM, Charlie Hack <charles.t.h...@gmail.com> wrote:

> Hi Jonathan, how do you have the data stored? More info about your setup
> the better.
>
> Charlie
>
> On Wednesday, May 13, 2015 at 23:16, Jonathan Seale
> <jonathanpse...@gmail.com>, wrote:
>
> > Scientists,
> >
> > I have an astrophysical application for Mahout that I need help with.
> >
> > I have 1-dimensional stellar spectra for many, many stars. Each spectrum
> > consists of a series of intensity values, one per wavelength of light. I
> > need to be able to find the cosine similarity between ALL pairs of stars.
> > Seems to me this is simply a user-user similarity problem where I have
> > stars instead of users, wavelengths instead of items, and intensities
> > instead of ratings/clicks.
> >
> > But I'm having difficulty using Mahout's row similarity package (I'm new to
> > this, and these days astronomers code pretty exclusively in Python). I know
> > that I must 1) create a sparse matrix where each row is a star,
> > columns are wavelengths, and the values are intensities, and 2) implement row
> > similarity. But I'm just not sure how to do it. Does anyone have a good resource
> > or would anyone be willing to help? I could probably offer some compensation to anyone
> > who would be willing to provide a little focussed, personalized
> > assistance.
> >
> > Thanks,
> >
> > Jonathan