Re: Recommend output: User vs. Item, Tanimoto vs. LogLikelihood

2011-04-22 Thread Lance Norskog
The "abstract information structure" encoded in the item-item graph is completely different from the user-user graph. Also, there are different User-based and Item-based approaches. Comparing recommendations is hard. It is not really possible to make an absolute or even fuzzy ranking of "what shoul

Recommend output: User vs. Item, Tanimoto vs. LogLikelihood

2011-04-22 Thread Otis Gospodnetic
Hi, Given the same input data, should the same list of recommended items be returned regardless of whether one uses Item-based or User-based recommendations? I always thought the answer was yes (same "matrix" just flipped differently is how I imagined it), but I recently saw output of some M

Re: kmeans on space-delimited input data,

2011-04-22 Thread Vincent Xue
Hello vs, I am also a beginner Mahout user and I think that the problem may be with your initial step to convert the txt matrix to a sequence file. I had a similar task to convert a tab-delimited matrix into a sequence file for SVD computations. What I did was to write some custom Java code

kmeans on space-delimited input data,

2011-04-22 Thread vs
Mahout Users, I have seen posts attempting to answer the problem I have in hand, but I would like to seek some comments from those who have been successful in resolving this issue. (1) Input data: A space-delimited symmetric matrix of 500x500 double values. The entire matrix is in one-single fil

Re: Does the Feature Hashing and Collision in the SGD will harm the performance of the algorithm?

2011-04-22 Thread Ted Dunning
Yes. But how do we specify the input? And how do we specify the encodings? This is what has always held me back in the past. Should we just allow classes to be specified on the command line? On Fri, Apr 22, 2011 at 8:47 AM, Dmitriy Lyubimov wrote: > Maybe there's a space for Mr based input c

Re: Does the Feature Hashing and Collision in the SGD will harm the performance of the algorithm?

2011-04-22 Thread Ted Dunning
On Fri, Apr 22, 2011 at 6:39 AM, Stanley Xu wrote: > One more question, I am also trying to test the MixedGradient, it looks > like the RankingGradient will take much more time than the DefaultGradient. > This is probably due to memory use. You need to review which way you group users. > > If

Re: Anyway to speedup the category feature parsing and encoding in the SGD algorithm?

2011-04-22 Thread Ted Dunning
Look at VectorWritable On Fri, Apr 22, 2011 at 6:57 AM, Stanley Xu wrote: > Hi Ted, > > Which class do you mean for the sparse vector as Writable? > > I checked the code that neither the RandomAccessSparseVector nor > SequentialAccessSparseVector implemented the Writable interface. > > Thanks. >

Re: Does the Feature Hashing and Collision in the SGD will harm the performance of the algorithm?

2011-04-22 Thread Dmitriy Lyubimov
Maybe there's space for an MR-based input conversion job indeed, as a command-line routine? I was kind of thinking about the same. Maybe even along with standardization of the values. Some formal definition of inputs being fed to it. apologies for brevity. Sent from my android. -Dmitriy On Apr 21,

Re: Anyway to speedup the category feature parsing and encoding in the SGD algorithm?

2011-04-22 Thread Stanley Xu
Hi Ted, Which class do you mean for the sparse vector as Writable? I checked the code, and neither the RandomAccessSparseVector nor SequentialAccessSparseVector implements the Writable interface. Thanks. On Fri, Apr 22, 2011 at 12:49 PM, Ted Dunning wrote: > The binary format is already defin

Re: Does the Feature Hashing and Collision in the SGD will harm the performance of the algorithm?

2011-04-22 Thread Stanley Xu
Got it. Thanks so much, Ted. One more question: I am also trying to test the MixedGradient, and it looks like the RankingGradient will take much more time than the DefaultGradient. If I set the alpha to 0.5, it takes 50 times as long as the DefaultGradient; I thought it should be like t