@Sean, I am just testing with a small dataset. I have some large datasets
which I am planning to use on Hadoop.
Thanks.
Akshay
On Wed, Nov 9, 2011 at 12:49 PM, Sean Owen sro...@gmail.com wrote:
@Steven this is in the distributed part. There is no such method. But
Akshay if your data is not
Hi
Managed to get the Manning Chap 09 example NewsKMeansClustering working with
my own documents. However, I thought the main point of this was to cluster the
news articles together to get groups of similar content.
The example allows you to get the cluster membership in terms of
Hi.. I am unable to identify where the clusterPoints() function in the
MeanShiftCanopyClusterer.java file is called during the execution of the
mean-shift job.
What I need to know is where the files in the clusteredPoints and clusters-*
directories are written when we run the job on Hadoop.
Thank you for your clarifications; now it is clear.
2011/11/8 Jake Mannix jake.man...@gmail.com
The output from the LanczosSolver is not the final set of results. The
fact that you passed --cleansvd true to the system means that you want it
to do some cleanup and remove any spurious singular
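For intuition, the cleanup pass keeps only vectors that genuinely behave like singular/eigen vectors of the underlying matrix. A minimal plain-Java sketch of that kind of check (illustrative only — this is not Mahout's EigenVerificationJob code, and the matrix and 0.99 threshold here are made up):

```java
// Sketch of the idea behind --cleansvd: a candidate vector v is kept
// only if M*v stays nearly parallel to v, i.e. v really is an
// eigenvector of M; spurious vectors fail the cosine test.
public class EigenCheck {
    static double[] multiply(double[][] m, double[] v) {
        double[] r = new double[m.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < v.length; j++)
                r[i] += m[i][j] * v[j];
        return r;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / Math.sqrt(na * nb);
    }

    public static void main(String[] args) {
        double[][] m = {{2, 0}, {0, 3}};   // toy symmetric matrix
        double[] good = {0, 1};            // true eigenvector (eigenvalue 3)
        double[] spurious = {1, 1};        // not an eigenvector of m
        System.out.println(cosine(multiply(m, good), good) > 0.99);         // true
        System.out.println(cosine(multiply(m, spurious), spurious) > 0.99); // false
    }
}
```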
Perhaps of interest:
http://atbrox.com/2011/11/09/mapreduce-hadoop-algorithms-in-academic-papers-5th-update-%E2%80%93-nov-2011/
Best regards,
Amund
To train the AdaptiveLogisticRegression, do I need to feed in the training
data only once, or is iterating over the training data helpful as well?
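For background: SGD learners like this are online, so a single pass can be enough on a large stream, but on a small training set extra passes usually keep lowering the loss. A self-contained plain-Java sketch of that effect (illustrative only — a tiny hand-rolled logistic SGD, not Mahout's AdaptiveLogisticRegression; data and learning rate are made up):

```java
// Tiny online logistic regression trained by SGD, to show that repeated
// passes over a small training set keep reducing the average log-loss.
public class SgdPasses {
    double[] w = new double[3];

    double predict(double[] x) {
        double z = 0;
        for (int i = 0; i < x.length; i++) z += w[i] * x[i];
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // One SGD update: move weights toward reducing the error on (x, y).
    void train(double[] x, int y, double rate) {
        double err = y - predict(x);
        for (int i = 0; i < x.length; i++) w[i] += rate * err * x[i];
    }

    double logLoss(double[][] xs, int[] ys) {
        double loss = 0;
        for (int i = 0; i < xs.length; i++) {
            double p = predict(xs[i]);
            loss -= ys[i] == 1 ? Math.log(p) : Math.log(1 - p);
        }
        return loss / xs.length;
    }

    public static void main(String[] args) {
        double[][] xs = {{1, 0, 0}, {1, 1, 0}, {1, 0, 1}, {1, 1, 1}};
        int[] ys = {0, 0, 1, 1};           // label depends on third feature
        SgdPasses model = new SgdPasses();
        double after1 = 0;
        for (int pass = 1; pass <= 20; pass++) {
            for (int i = 0; i < xs.length; i++) model.train(xs[i], ys[i], 0.5);
            if (pass == 1) after1 = model.logLoss(xs, ys);
        }
        // Extra passes reduced the training loss on this small set:
        System.out.println(model.logLoss(xs, ys) < after1);
    }
}
```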
Thanks! Koert
Hi,
I am trying to run the Mahout LDA over the Reuters data set as
described in Mahout in Action however I always get only 1 topic
returned. I am running on Mahout 0.5 and here are my steps:
$ mvn -e -q exec:java
-Dexec.mainClass=org.apache.lucene.benchmark.utils.ExtractReuters
Cool, how about adding it to the Wiki?
On Nov 9, 2011, at 8:15 AM, Suneel Marthi wrote:
I can put together a doc if we don't already have one, know the SGD code
pretty well.
Regards,
Suneel
From: Grant Ingersoll grant.ingers...@gmail.com
To:
Will do.
From: Grant Ingersoll gsing...@apache.org
To: user@mahout.apache.org; Suneel Marthi suneel_mar...@yahoo.com
Sent: Wednesday, November 9, 2011 10:02 AM
Subject: Re: SGD TrainNewsGroups interim output
Cool, how about adding it to the Wiki?
On Nov 9,
This looks like a hard-coded hdfs prefix in a path name construction
somewhere.
On Wed, Nov 9, 2011 at 8:27 AM, motta motta@gmail.com wrote:
Hi everybody,
I have tried to run my first Mahout SVD Job (DistributedLanczosSolver) in
Elastic Map Reduce.
Before going to Amazon I've tried to
On Nov 9, 2011, at 3:17 AM, Rob Podolski wrote:
Hi
Managed to get the Manning Chap 09 example NewsKMeansClustering working with
my own documents. However, I thought the main point of this was to cluster
the news articles together to get groups of similar content.
The example
Hi All, does Mahout provide a user-based CF implementation on Hadoop? Currently
I only see an item-based Hadoop implementation. Thanks. Cheers, Ramon
There is no such implementation. Literature suggests that an item-based
approach is usually both faster and more accurate.
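To make the intuition concrete, item-based scoring works off item-item similarities computed over the item columns of the ratings matrix, so its cost is driven by the number of items. A minimal self-contained sketch (illustrative only — plain Java, not Mahout's recommender classes; the ratings matrix is made up):

```java
// Minimal item-based CF sketch: rows are users, columns are items.
public class ItemBasedSketch {
    // Cosine similarity between two item columns of the ratings matrix.
    static double itemSimilarity(double[][] ratings, int a, int b) {
        double dot = 0, na = 0, nb = 0;
        for (double[] user : ratings) {
            dot += user[a] * user[b];
            na += user[a] * user[a];
            nb += user[b] * user[b];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Predicted preference: similarity-weighted average of the user's
    // existing ratings. The loops run over items (and, inside the
    // similarity, over users), which is why item count dominates cost.
    static double estimate(double[][] ratings, int user, int item) {
        double num = 0, den = 0;
        for (int other = 0; other < ratings[0].length; other++) {
            if (other == item || ratings[user][other] == 0) continue;
            double sim = itemSimilarity(ratings, item, other);
            num += sim * ratings[user][other];
            den += Math.abs(sim);
        }
        return den == 0 ? 0 : num / den;
    }

    public static void main(String[] args) {
        double[][] ratings = {
            {5, 4, 0},
            {4, 5, 1},
            {1, 1, 5},
        };
        // User 0 has not rated item 2; estimate it from items 0 and 1.
        System.out.println(estimate(ratings, 0, 2));
    }
}
```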
--sebastian
On 10.11.2011 08:34, WangRamon wrote:
Hi All, does Mahout provide a user-based CF implementation on Hadoop?
Currently I only see an item-based Hadoop
Many thanks. Actually I delved into the source code and found out that if you
set the (undocumented) namedVector boolean to true in...
DictionaryVectorizer.createTermFrequencyVectors(
tokenizedPath,
new Path(OUTPUT_HFS_FOLDER),
conf,
Thanks Sebastian, can I assume that if there are more items than users,
item-based CF will be slow?
Date: Thu, 10 Nov 2011 08:43:53 +0100
From: s...@apache.org
To: user@mahout.apache.org
Subject: Re: User based CF
There is no such implementation. Literature suggests that an item-based
I didn't hard-code any hdfs prefix; I just used mahout-examples-0.5-job.jar
(downloaded from the Mahout website) to run DistributedLanczosSolver.
The output suggests that the jar invoked FileSystem.get(conf) instead
of FileSystem.get(uri, conf) to get my input matrix.
Is that possible?
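For reference, the distinction being discussed: Hadoop's FileSystem.get(conf) returns the configured default filesystem (often hdfs://), while FileSystem.get(uri, conf) selects the filesystem by the scheme of the path, which matters on EMR where input often lives on S3. A plain-Java illustration of that scheme-based dispatch (illustrative only — no Hadoop dependency; paths and the helper are made up):

```java
import java.net.URI;

// Illustration of scheme-based filesystem dispatch: Hadoop's
// FileSystem.get(uri, conf) keys off the URI scheme, whereas
// FileSystem.get(conf) always uses the configured default filesystem,
// regardless of the scheme on the path you pass around.
public class SchemeDispatch {
    static String filesystemFor(String path, String defaultScheme) {
        String scheme = URI.create(path).getScheme();
        // A schemeless path falls back to the default filesystem.
        return scheme != null ? scheme : defaultScheme;
    }

    public static void main(String[] args) {
        // On EMR an input matrix may live on S3, not the default HDFS:
        System.out.println(filesystemFor("s3n://bucket/input/matrix", "hdfs")); // s3n
        System.out.println(filesystemFor("/user/hadoop/matrix", "hdfs"));      // hdfs
    }
}
```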
2011/11/10