Re: Interpretation of co-efficients of features in Mahout logistic output

2012-05-14 Thread Nowal, Akshay
Thanks, Ted, for the early reply. But what I'm not understanding is the significance of these features. What do they relate to? I have gone through logistic regression in the Mahout in Action book. I know it is a very silly question, but I'm not clear what the features are here and how to
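As background to the question above, here is a minimal sketch of how per-feature coefficients enter a logistic model. It is plain Python, not Mahout code; the feature names and weights are hypothetical and chosen only for illustration. Each coefficient is the change in log-odds for a one-unit increase in its feature, with the other features held fixed.

```python
import math

def predict_probability(weights, intercept, features):
    """Logistic model: p = sigmoid(intercept + sum of weight * feature value)."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, not taken from any Mahout output.
weights = {"age": 0.8, "income": -0.5}
intercept = 0.0

base = predict_probability(weights, intercept, {"age": 0.0, "income": 0.0})
older = predict_probability(weights, intercept, {"age": 1.0, "income": 0.0})

# Raising "age" by one unit multiplies the odds by exp(0.8).
odds_ratio = (older / (1.0 - older)) / (base / (1.0 - base))
```

A positive coefficient pushes the predicted probability up as the feature grows, and a negative one pushes it down.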

Re: Re: 40 hours to run 1/2 Netflix Data?

2012-05-14 Thread 许春玲
Ted, yes, memory per node is only 16G. Cached memory usage is 100%, as the attached file shows, and CPU is at 100% too. The maximum size of the local-disk Hadoop temp directory is 160G, and it gets 100% used. It looks like the key point is the sixth step of the recommender, because the job fails at this step every time. I have

Persistent Data Model

2012-05-14 Thread Nikolaos Romanos Katsipoulakis
Hi everybody, I am currently developing a recommendation system that relies on an Apache Tomcat Server (6.0.26) and is triggered by Web-Service calls (JAX-WS). In order to improve the performance of the system, I have been asked to asynchronously call the recommender and save the

Re: Persistent Data Model

2012-05-14 Thread Sean Owen
Can you persist a DataModel? Sure. The easiest thing is to read/write a CSV file. Or put the data in a database. The existing implementations already read such things into memory for you. I am not sure what you mean about computing the DataModel as a separate process. The DataModel exists and
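The CSV suggestion above can be sketched as a simple round trip. This is plain Python rather than Mahout's own classes; the helper names are hypothetical, and the `userID,itemID,preference` column layout is an assumption about what a file-based data model would expect.

```python
import csv
import io

def write_preferences(prefs, handle):
    """Write (user_id, item_id, preference) tuples as CSV rows."""
    writer = csv.writer(handle)
    for user_id, item_id, value in prefs:
        writer.writerow([user_id, item_id, value])

def read_preferences(handle):
    """Read the rows back into (int, int, float) tuples."""
    return [(int(u), int(i), float(v)) for u, i, v in csv.reader(handle)]

# Illustrative preference data, not from the thread.
prefs = [(1, 101, 5.0), (1, 102, 3.0), (2, 101, 2.5)]

# An in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
write_preferences(prefs, buffer)
buffer.seek(0)
restored = read_preferences(buffer)
```

In practice the buffer would be a file opened with `newline=""`, written by one process and read by another.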

online clustering with mahout

2012-05-14 Thread Ioan Eugen Stan
Hi, Does Mahout offer online clustering out of the box using sequential clustering (no MapReduce)? I'm looking over the code (trunk) and I found ClusterClassifier but I can't figure out how that works. Any examples or more docs on this topic? Thanks, -- Ioan Eugen Stan

Re: Persistent Data Model

2012-05-14 Thread Nikolaos Romanos Katsipoulakis
On 05/14/2012 02:54 PM, Sean Owen wrote: Can you persist a DataModel? Sure. The easiest thing is to read/write a CSV file. Or put the data in a database. The existing implementations already read such things into memory for you. I am not sure what you mean about computing the DataModel as a

Re: online clustering with mahout

2012-05-14 Thread Jeff Eastman
Look at ClusterIterator.iterate(). This will do clustering in memory without any Hadoop. ClusterIterator.iterateSeq will do clustering in a single process from/to Hadoop sequence files but without map/reduce. ClusterIterator.iterateMR uses full Hadoop to do clustering for the same algorithms

Re: Exception running 20newsgroups example

2012-05-14 Thread Ted Dunning
What you are missing is a Linux-compatible environment. Running programs under Cygwin can be pretty difficult because of the path name insanity that often ensues. Sent from my iPhone On May 13, 2012, at 6:33 PM, mahout-newbie raman.sriniva...@gmail.com wrote: When I try to run the 20

Re: Question about storage in Pig-vector (Pig + Mahout)

2012-05-14 Thread Timothy Potter
Hi Ted, Re: In the readme, there is an example of using elephant-bird to store the Classifier in a SequenceFile, i.e. /* the trained model is passed to use as a bytearray so we just pass it on out. The classifier class just contains the list of target values and the

Re: RowSimilarity

2012-05-14 Thread Pat Ferrel
Thanks, this is quite clear and reasonable. The cutoff is made based on lack of term co-occurrences, not the distance measure. The optional 'threshold' is based on the distance measure. BTW I assume the 'distance' returned is expressed in the distance measure's units? So using cosine as a
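Since the thread turns on the difference between a similarity and a distance, a minimal sketch of the cosine case may help. It is plain Python, not Mahout's implementation; taking the distance as 1 minus the similarity is a common convention and is an assumption here, and the sketch does not say which of the two values Mahout's job reports.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Assumed convention: distance = 1 - similarity."""
    return 1.0 - cosine_similarity(a, b)

# Illustrative term vectors, not from the thread.
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
```

Under this convention, vectors pointing the same way have distance near 0 and orthogonal vectors have distance 1, so a threshold means opposite things depending on whether it is applied to the similarity or to the distance.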

Re: Question about storage in Pig-vector (Pig + Mahout)

2012-05-14 Thread Ted Dunning
Tim, Sorry for the confusion and lack of help. Pig-vector is half-done and not even quite half-baked. Your help in updating the readme is very much appreciated. On Mon, May 14, 2012 at 10:17 AM, Timothy Potter thelabd...@gmail.com wrote: Hi Ted, Re: In the readme, there is an example of

Re: Question about storage in Pig-vector (Pig + Mahout)

2012-05-14 Thread Timothy Potter
My pleasure and hoping to do more with it ;-) Cheers, Tim On Mon, May 14, 2012 at 1:11 PM, Ted Dunning ted.dunn...@gmail.com wrote: Tim, Sorry for the confusion and lack of help. Pig-vector is half-done and not even quite half-baked. Your help in updating the readme is very much

Re: Exception running 20newsgroups example

2012-05-14 Thread Lance Norskog
In this case it is looking for a c:\tmp. Do you have one? It does not come standard with Windows; you have to make it. This particular code path works, since bin/mahout does not run any Cygwin programs, only Java. I used it a lot. On Mon, May 14, 2012 at 9:19 AM, Ted Dunning

Re: RowSimilarity

2012-05-14 Thread Pat Ferrel
Sorry, but I'm still confused. So the similarity magnitude has nothing to do with any of Mahout's distance measures? The similarity class is used only to specify the algorithm used to calculate this magnitude, and does not imply a connection between distance and similarity? I'm now a bit unsure

large scale kmeans

2012-05-14 Thread Jiaan Zeng
Hi all, Has anyone tried Dunning's large scale k-means (https://github.com/tdunning/knn)? It looks pretty interesting. It looks like it does not have a working MapReduce version yet, although the doc states the implementation is straightforward. If anyone has tried that implementation, could you

Re: large scale kmeans

2012-05-14 Thread Ted Dunning
I have tried it. And an unnamed large customer of ours has tried it with good results. That isn't much of a track record yet, but it is encouraging. All of this use so far is as part of k-nearest neighbor work, so there isn't a comparison for pure clustering. Also, this work is all at 10-50