Thanks, Charlie,

The data has been through a lot of processing, but in an attempt to make it
more Mahout-friendly, I've converted it into a single CSV table with
columns: star_id, wavelength, intensity. My motivation was to make it like
the user_id, item_id, rating tables you might see in other Mahout uses.
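
For illustration, the target computation on a table of this shape can be
sketched in plain Python (not Mahout). The sample rows and the names
load_spectra and cosine_similarities are hypothetical, and the sketch assumes
the CSV has a header row with exactly the three column names above.

```python
import csv
import io
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical sample rows in the star_id, wavelength, intensity layout.
SAMPLE_CSV = """star_id,wavelength,intensity
s1,4000,1.0
s1,4500,2.0
s2,4000,2.0
s2,4500,4.0
s3,4000,2.0
s3,4500,-1.0
"""


def load_spectra(handle):
    """Group (star_id, wavelength, intensity) rows into one sparse vector per star."""
    spectra = defaultdict(dict)
    for row in csv.DictReader(handle):
        spectra[row["star_id"]][row["wavelength"]] = float(row["intensity"])
    return spectra


def cosine_similarities(spectra):
    """Return {(star_a, star_b): cosine similarity} for every unordered pair."""
    # Euclidean norm of each star's intensity vector.
    norms = {
        star: math.sqrt(sum(v * v for v in vec.values()))
        for star, vec in spectra.items()
    }
    sims = {}
    for a, b in combinations(sorted(spectra), 2):
        # Only wavelengths present in both spectra contribute to the dot product.
        shared = spectra[a].keys() & spectra[b].keys()
        dot = sum(spectra[a][w] * spectra[b][w] for w in shared)
        denom = norms[a] * norms[b]
        sims[(a, b)] = dot / denom if denom else 0.0
    return sims


spectra = load_spectra(io.StringIO(SAMPLE_CSV))
similarities = cosine_similarities(spectra)
for pair, value in sorted(similarities.items()):
    print(pair, round(value, 6))
```

This loops over every pair, so it is quadratic in the number of stars and is
only meant to pin down the definition on a small sample, not to replace a
distributed job.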

Rather than using my local machine, I've set up an instance on Amazon in the
hope of turning this into a remote service. So the install is whatever
comes with Amazon's default Mahout installation.

Jonathan



On Wed, May 13, 2015 at 11:29 PM, Charlie Hack <charles.t.h...@gmail.com>
wrote:

> Hi Jonathan, how do you have the data stored? The more info about your
> setup, the better.
>
>
> Charlie
>
> On Wednesday, May 13, 2015 at 23:16, Jonathan Seale <
> jonathanpse...@gmail.com>, wrote:
> Scientists,
>
> I have an astrophysical application for Mahout that I need help with.
>
> I have 1-dimensional stellar spectra for many, many stars. Each spectrum
> consists of a series of intensity values, one per wavelength of light. I
> need to be able to find the cosine similarity between ALL pairs of stars.
> Seems to me this is simply a user-user similarity problem where I have
> stars instead of users, wavelengths instead of items, and intensities
> instead of ratings/clicks.
>
> But I'm having difficulty using Mahout's row similarity package (I'm new
> to this, and these days astronomers code pretty exclusively in Python). I
> know that I have to 1) create a sparse matrix where each row is a star,
> columns are wavelengths, and the values are intensities, and 2) implement
> row similarity. But I'm just not sure how to do it. Does anyone have a good
> resource, or would anyone be willing to help? I could probably offer some
> compensation to anyone who would be willing to provide a little focussed,
> personalized assistance.
>
> Thanks,
>
> Jonathan
>
