Thanks, Ted, for the quick reply.
But what I don't understand is the significance of these
features.
What do they relate to?
I have gone through logistic regression in the Mahout in Action book. I
know it is a very silly question, but I'm not clear what the features are
here and how to
Ted,
Yes, memory per node is only 16 GB. Cached memory usage is at 100%, as the attached
file shows, and CPU usage is at 100% too.
The maximum size of the local-disk Hadoop temp directory is 160 GB, and it fills up to 100%.
It looks like the key point is the sixth step of the recommender, since the job fails
at this step every time.
I have
Hi everybody,
I am currently developing a recommendation system that relies on an
Apache Tomcat server (6.0.26) and is
triggered by web-service calls (JAX-WS).
In order to improve the performance of the system, I have been asked to
asynchronously call the recommender and save the
Can you persist a DataModel? Sure. The easiest thing is to read/write
a CSV file. Or put the data in a database. The existing
implementations already read such things into memory for you. I am not
sure what you mean about computing the DataModel as a separate
process. The DataModel exists and
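As a sketch of the CSV route: Mahout's FileDataModel reads lines of the form userID,itemID,preference. The snippet below writes and re-reads such a file with plain Java I/O and does not touch any Mahout classes; the class name PreferenceCsv and the Pref record are hypothetical stand-ins, and it assumes Java 16+ for records.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: persists preferences as "userID,itemID,preference" lines. */
class PreferenceCsv {

    /** One user-item preference; a hypothetical stand-in, not Mahout's Preference type. */
    record Pref(long userId, long itemId, float value) {}

    /** Writes one "userID,itemID,preference" line per preference. */
    static void write(Path file, List<Pref> prefs) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Pref p : prefs) {
            lines.add(p.userId() + "," + p.itemId() + "," + p.value());
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    /** Reads the same format back, skipping blank lines. */
    static List<Pref> read(Path file) throws IOException {
        List<Pref> prefs = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            if (line.isBlank()) {
                continue;
            }
            String[] f = line.split(",");
            prefs.add(new Pref(Long.parseLong(f[0].trim()),
                               Long.parseLong(f[1].trim()),
                               Float.parseFloat(f[2].trim())));
        }
        return prefs;
    }
}
```

A file written this way is the kind of input FileDataModel is meant to load, so the recommender process can pick it up without recomputing anything.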
Hi,
Does Mahout offer online clustering out of the box using sequential
clustering (no MapReduce)? I'm looking over the code (trunk) and I
found ClusterClassifier, but I can't figure out how it works. Are there any
examples or more docs on this topic?
Thanks,
--
Ioan Eugen Stan
On 05/14/2012 02:54 PM, Sean Owen wrote:
Can you persist a DataModel? Sure. The easiest thing is to read/write
a CSV file. Or put the data in a database. The existing
implementations already read such things into memory for you. I am not
sure what you mean about computing the DataModel as a
Look at ClusterIterator.iterate(). This will do clustering in memory
without any Hadoop. ClusterIterator.iterateSeq will do clustering in a
single process from/to Hadoop sequence files but without map/reduce.
ClusterIterator.iterateMR uses full Hadoop to do clustering for the same
algorithms
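To illustrate what "clustering in memory without any Hadoop" amounts to, here is a plain-Java sketch of iterative k-means over an in-memory list of points, all in one process. It does not call Mahout's ClusterIterator or ClusterClassifier (their exact signatures are not given here); the class InMemoryKMeans and its methods are hypothetical, and k-means is just one of the algorithms such a loop can drive.

```java
import java.util.List;

/** Hypothetical in-memory k-means: no Hadoop, no sequence files, one process. */
class InMemoryKMeans {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Index of the centroid nearest to p. */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (sqDist(p, centroids[c]) < sqDist(p, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Runs numIterations rounds of assign-then-update starting from the given
     * centroids (copied, not modified) and returns the final centroids.
     */
    static double[][] run(List<double[]> data, double[][] initial, int numIterations) {
        double[][] centroids = new double[initial.length][];
        for (int c = 0; c < initial.length; c++) {
            centroids[c] = initial[c].clone();
        }
        int dim = centroids[0].length;
        for (int iter = 0; iter < numIterations; iter++) {
            double[][] sums = new double[centroids.length][dim];
            int[] counts = new int[centroids.length];
            // Assignment step: each point is added to its nearest centroid's running sum.
            for (double[] p : data) {
                int c = nearest(p, centroids);
                counts[c]++;
                for (int i = 0; i < dim; i++) {
                    sums[c][i] += p[i];
                }
            }
            // Update step: move each centroid to the mean of its points;
            // a centroid with no points keeps its previous position.
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] == 0) {
                    continue;
                }
                for (int i = 0; i < dim; i++) {
                    centroids[c][i] = sums[c][i] / counts[c];
                }
            }
        }
        return centroids;
    }
}
```

The sequence-file and map/reduce variants described above run the same assign-then-update loop; they only change where the points are read from and where each step executes.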
What you are missing is a Linux-compatible environment. Running programs under
Cygwin can be pretty difficult because of the path name insanity that often
ensues.
On May 13, 2012, at 6:33 PM, mahout-newbie raman.sriniva...@gmail.com wrote:
When I try to run the 20
Hi Ted,
Re:
In the readme, there is an example of using elephant-bird to store the
Classifier in a SequenceFile, i.e.
/* the trained model is passed to us as a bytearray so we just pass it
on out. The classifier
class just contains the list of target values and the
Thanks, this is quite clear and reasonable. The cutoff is made based on
lack of term co-occurrences, not the distance measure. The optional
'threshold' is based on the distance measure.
BTW, I assume the 'distance' returned is expressed in the distance
measure's units? So using cosine as a
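On the units question: with a cosine-based measure, the usual convention (and, I believe, what Mahout's CosineDistanceMeasure does; check the source to be sure) is to report distance as 1 minus the cosine similarity. It is then a dimensionless number in [0, 2], and in [0, 1] for non-negative vectors such as term counts, rather than a length in any geometric unit. A plain-Java sketch of that convention; the class name is hypothetical, and treating a zero vector as having similarity 0 is an assumption.

```java
/** Hypothetical helper illustrating cosine distance = 1 - cosine similarity. */
class CosineDistanceSketch {

    /** Cosine similarity: dot(a, b) / (|a| * |b|). */
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            // Assumption: a zero vector is treated as having similarity 0 to everything.
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Distance in [0, 2]; in [0, 1] when all components are non-negative. */
    static double distance(double[] a, double[] b) {
        return 1.0 - cosineSimilarity(a, b);
    }
}
```

Under this convention a threshold of, say, 0.2 means "cosine similarity of at least 0.8", independent of vector length.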
Tim,
Sorry for the confusion and lack of help. Pig-vector is half-done and not
even quite half-baked.
Your help in updating the readme is very much appreciated.
On Mon, May 14, 2012 at 10:17 AM, Timothy Potter thelabd...@gmail.com wrote:
Hi Ted,
Re:
In the readme, there is an example of
My pleasure and hoping to do more with it ;-)
Cheers,
Tim
On Mon, May 14, 2012 at 1:11 PM, Ted Dunning ted.dunn...@gmail.com wrote:
Tim,
Sorry for the confusion and lack of help. Pig-vector is half-done and not
even quite half-baked.
Your help in updating the readme is very much
In this case it is looking for c:\tmp. Do you have one? It does not
come standard with Windows; you have to create it.
This particular code path works, since bin/mahout does not run any
Cygwin programs, only Java. I have used it a lot.
On Mon, May 14, 2012 at 9:19 AM, Ted Dunning
Sorry, but I'm still confused. So the similarity magnitude has nothing to
do with any of Mahout's distance measures? The similarity class is used
only to specify the algorithm used to calculate this magnitude and does
not imply a connection between distance and similarity? I'm now a bit
unsure
Hi all,
Has anyone tried Dunning's large-scale k-means
(https://github.com/tdunning/knn)? It looks pretty interesting.
It looks like it does not have a working map-reduce version yet,
although the doc states the implementation is straightforward. If
anyone has tried that implementation, could you
I have tried it, and an unnamed large customer of ours has tried it with good
results. That isn't much of a track record yet, but it is encouraging.
All of this use so far has been as part of k-nearest-neighbor work, so there isn't a
comparison for pure clustering. Also, this work is all at 10-50