[jira] Created: (MAHOUT-268) Vector.getDistanceSquared() is incorrect for both SparseVector varieties

2010-01-26 Thread Jake Mannix (JIRA)
Vector.getDistanceSquared() is incorrect for both SparseVector varieties Key: MAHOUT-268 URL: https://issues.apache.org/jira/browse/MAHOUT-268 Project: Mahout Issue Typ
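
The issue body is cut off above, so the actual defect is not shown here. As a point of reference only, this is a minimal sketch of what a squared Euclidean distance has to compute for sparse vectors. It assumes a plain `{index: value}` dict as the sparse representation; that and the name `distance_squared` are illustrative, not Mahout's SparseVector API.

```python
def distance_squared(a, b):
    """Squared Euclidean distance between two sparse vectors.

    Each vector is a dict mapping index -> value (an assumed, illustrative
    representation); indices absent from a dict are treated as 0.0.
    """
    total = 0.0
    # Walk the union of indices so that entries present in only one
    # of the two vectors still contribute to the sum.
    for i in a.keys() | b.keys():
        diff = a.get(i, 0.0) - b.get(i, 0.0)
        total += diff * diff
    return total


u = {0: 1.0, 3: 2.0}
v = {3: 5.0, 7: 4.0}
print(distance_squared(u, v))
```

Iterating over only one vector's non-zero entries would miss indices that exist solely in the other vector, which is why the sketch walks the union.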

[jira] Created: (MAHOUT-267) Vector.norm(x) uses incorrect formula for both x == POSITIVE_INFINITY and x == 1

2010-01-26 Thread Jake Mannix (JIRA)
Vector.norm(x) uses incorrect formula for both x == POSITIVE_INFINITY and x == 1 Key: MAHOUT-267 URL: https://issues.apache.org/jira/browse/MAHOUT-267 Project: Mahout
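
The snippet above does not show which formula was used, so the following is only a sketch of the standard p-norm definitions the issue title refers to: the sum of absolute values for p = 1, the largest absolute value for p = infinity, and (sum |x_i|^p)^(1/p) otherwise. The function name `norm` and the list-of-floats input are assumptions for illustration, not Mahout's Vector API.

```python
import math


def norm(values, p):
    """Standard p-norm of a sequence of numbers, for p >= 1 (including infinity)."""
    if p < 1:
        raise ValueError("this sketch only handles p >= 1")
    abs_values = [abs(x) for x in values]
    if math.isinf(p):
        # L-infinity norm: the largest absolute component.
        return max(abs_values, default=0.0)
    if p == 1:
        # L1 norm: the sum of absolute components.
        return sum(abs_values)
    # General case: p-th root of the sum of p-th powers.
    return sum(x ** p for x in abs_values) ** (1.0 / p)


x = [3.0, -4.0]
print(norm(x, 1), norm(x, 2), norm(x, math.inf))
```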

[jira] Updated: (MAHOUT-263) Matrix interface should extend Iterable for better integration with distributed storage

2010-01-26 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated MAHOUT-263: --- Attachment: MAHOUT-263.diff Let's try a patch which is svn up'ed first. Suggest better names for t

[jira] Updated: (MAHOUT-263) Matrix interface should extend Iterable for better integration with distributed storage

2010-01-26 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated MAHOUT-263: --- Attachment: MAHOUT-263.diff Ugly ugly names. Better suggestions? > Matrix interface should extend I

[jira] Updated: (MAHOUT-263) Matrix interface should extend Iterable for better integration with distributed storage

2010-01-26 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated MAHOUT-263: --- Attachment: (was: MAHOUT-263.diff) > Matrix interface should extend Iterable for better integrati

Re: What is content based recommendation, to you

2010-01-26 Thread Ted Dunning
This is a fine way to go (and I have often (mis)used search engines as recommendation engines). Another angle is to consider the item level recommendations for a single item to simply be additional attributes. You can also look at user level cooccurrence analysis of attributes (including SVD) as

GSOC 2010 is here

2010-01-26 Thread Robin Anil
Greetings! Fellow GSOC alums, administrators and dear mentors, the next edition is right here. Details are given in the link below. https://groups.google.com/group/google-summer-of-code-discuss/browse_thread/thread/d839c0b02ac15b3f Maybe we could identify key areas in Mahout which we need to deve

Re: PFPGrowth - not able to pass hadoop any parameters

2010-01-26 Thread Jake Mannix
Yeah, this was one of my thoughts with MAHOUT-185 - turn some of our Driver classes to just fire off a Tool. It is very convenient to be able to do this, and it's becoming more standard as well. I need to dig up my stuff in decomposer/contrib-hadoop and pull that in and integrate it with Drew's p

PFPGrowth - not able to pass hadoop any parameters

2010-01-26 Thread Aurora Skarra-Gallagher
Hi, I'm using the PFPGrowth code (http://issues.apache.org/jira/browse/MAHOUT-157) from Mahout 0.3 and it works fine on my local box. However, when I try to get it to run on our grid cluster, it amazingly does not allow any parameters to be passed to Hadoop. When I look at the code (mahout/cor

Re: PFPGrowth - not able to pass hadoop any parameters

2010-01-26 Thread Robin Anil
Mahout algorithms are not using ToolRunner of Hadoop. I guess many core hadoop-ers like that feature. I think we should be supporting that feature by 0.3 Robin On Wed, Jan 27, 2010 at 5:59 AM, Sean Owen wrote: > These look like Hadoop params, to the hadoop command? why wouldn't > hadoop be par

Re: [math] More hash questions

2010-01-26 Thread Benson Margulies
If you look at any of the Colt maps, you'll see a reference to double hashing. Knuth's recommendation is to use twin primes. My meaning was that the code doesn't have any twin primes, not that there are no such beasts. I need to stare at the code some more, but I was at least briefly convinced that

Re: What is content based recommendation, to you

2010-01-26 Thread Jake Mannix
On Tue, Jan 26, 2010 at 3:36 PM, Ted Dunning wrote: > I define it a bit differently by redefining recommendations as machine > learning. > > On Tue, Jan 26, 2010 at 1:44 PM, Sean Owen wrote: > > > I would narrow and specify this, in the context of Mahout, to have a > > collaborative filtering an

Re: What is content based recommendation, to you

2010-01-26 Thread Ted Dunning
On Tue, Jan 26, 2010 at 5:04 PM, Sean Owen wrote: > You're saying content-based recommendation, in practice, is often a > matter of substituting one dominant item attribute in place of items > -- recommending on artist, rather than artist track. OK, check, one > can do that in the current framewo

Re: What is content based recommendation, to you

2010-01-26 Thread Sean Owen
Nice, good wisdom here. I agree about the appeal and problems of thinking of item-attribute pairs as your items. You're saying content-based recommendation, in practice, is often a matter of substituting one dominant item attribute in place of items -- recommending on artist, rather than artist

Re: PFPGrowth - not able to pass hadoop any parameters

2010-01-26 Thread Sean Owen
These look like Hadoop params, to the hadoop command? why wouldn't hadoop be parsing those, or, why would the Job command have to shuttle them to Hadoop? I thought these were typically set in the config .xml files anyhow. On Tue, Jan 26, 2010 at 11:43 PM, Aurora Skarra-Gallagher wrote: > Hi, > >

Re: [math] More hash questions

2010-01-26 Thread Sean Owen
Code reference? On Tue, Jan 26, 2010 at 11:42 PM, Benson Margulies wrote: > The double hashes in the Colt code don't look entirely, ahem, > conventional to me. > > There aren't two different primes, let alone two different primes that > differ by 2. > > I'm probably not reading carefully enough.

Re: [math] More hash questions

2010-01-26 Thread Ted Dunning
There are definitely twin primes that differ by 2 (29 and 31 for example). See http://en.wikipedia.org/wiki/Twin_prime Surely that isn't what you are talking about and my confusion results from inability to correct a typo when reading. On Tue, Jan 26, 2010 at 3:42 PM, Benson Margulies wrote: > T

[math] More hash questions

2010-01-26 Thread Benson Margulies
The double hashes in the Colt code don't look entirely, ahem, conventional to me. There aren't two different primes, let alone two different primes that differ by 2. I'm probably not reading carefully enough.
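
For readers unfamiliar with the convention being alluded to, here is a minimal sketch of double hashing with a twin-prime pair, in the form commonly attributed to Knuth: a prime table size M where M - 2 is also prime, a first hash `key mod M`, and a step `1 + (key mod (M - 2))`. This is not the Colt code; the table size 31 (twin of 29) and the function name `probe_sequence` are chosen for illustration.

```python
M = 31  # table size: prime, and M - 2 = 29 is prime as well (a twin-prime pair)


def probe_sequence(key, m=M):
    """Return the slots visited for an integer key under double hashing."""
    h1 = key % m              # starting slot
    h2 = 1 + key % (m - 2)    # step size, always in 1..m-2, so never zero
    # Because m is prime and 0 < h2 < m, the step is coprime to m and the
    # sequence visits every slot exactly once before repeating.
    return [(h1 + i * h2) % m for i in range(m)]


print(probe_sequence(1000)[:5])
```

Taking the step modulo a different prime than the table size keeps keys that collide on the first hash from also sharing the same step, which is the property a single-prime scheme lacks.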

Re: What is content based recommendation, to you

2010-01-26 Thread Ted Dunning
I define it a bit differently by redefining recommendations as machine learning. Users have preferences for objects with attributes. We would like to learn from all user/object/attribute preference data to predict so-far unobserved preferences of a user for other objects. Normal recommendations

What is content based recommendation, to you

2010-01-26 Thread Sean Owen
I want to knock down some support for content based recommendation. And I want to solicit ideas about what this even means to its intended audience -- users. I define it broadly as a recommender in which: - items have attributes (e.g. books have genres, titles, authors) rather than being completel

Re: Release thinking

2010-01-26 Thread zhao zhendong
Hi all, I will do my best to get this in 0.3 release. {quote} > MAHOUT-232 Implementation of sequential SVM solver based on Pegasos > This patch looks to be progressing - it would be really nice to get it in. {quote} Cheers, Zhendong --