Re: Current build taking a long time to install

2011-05-10 Thread Ted Dunning
Well, there isn't a complete page of that sort, because Maven doesn't have a finite or fixed set of options. This is because nearly all of Maven's behavior is defined by plugins that are downloaded on first invocation. You are right that this is a problem that steepens the learning curve for M

Re: Current build taking a long time to install

2011-05-10 Thread Lance Norskog
Sorry for going off-topic: where is the great big page explaining the Maven command-line options? This is my biggest gripe about it compared to Ant: you can find out what Ant stuff does fairly quickly.
Lance
On Tue, May 10, 2011 at 5:03 PM, Benson Margulies wrote:
> -Pfastinstall, which eliminates tests and

Re: Current build taking a long time to install

2011-05-10 Thread Benson Margulies
-Pfastinstall, which eliminates tests and also some other things.
On Tue, May 10, 2011 at 2:58 PM, Grant Ingersoll wrote:
> You can skip the tests if you want, that usually helps: mvn -DskipTests
> install
>
> Perhaps there is better support in Maven 3 for parallel test execution (I've
> heard

Re: Using .mvc file to train a classifier

2011-05-10 Thread Daniel McEnnis
Dear,
To the best of my knowledge, the Naive Bayes classifier does not support data in the Weka format. It must be fed tokenized text or Wikipedia XML.
https://cwiki.apache.org/confluence/display/MAHOUT/Twenty+Newsgroups
https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example
Hop

Re: The perennial "Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector" problem

2011-05-10 Thread Josh Patterson
I've hit this (can't find math lib) before, too. Would love to see it be less "black magic" and more "just works". =)
On Mon, May 9, 2011 at 6:40 PM, Jake Mannix wrote:
> wah. Even trying to do seq2sparse doesn't work for me:
>
> [jake@smf1-ady-15-sr1 mahout-distribution-0.5-SNAPSHOT]$ ./bin/maho

Problems using ldatopics

2011-05-10 Thread florie
Hi, after running lda, I am having problems reading the output topics with ldatopics. The error is as follows:
[ion@lovemachine Downloads]$ mahout-0.4/bin/mahout ldatopics --input sparsePosTokens/tf-vectors --dict sparsePosTokens/dictionary.file-0 --words 30 --output sparsePosTokens/topics --

Mahout 542 on kddcup track2 data

2011-05-10 Thread Clive Cox
Hi, I'm trying to test Mahout 542 (ALS matrix factorization) on the kddcup track2 data set and would like some feedback. I am using the latest Mahout 0.5 snapshot. I converted the trainIdx2.txt data using org.apache.mahout.cf.taste.example.kddcup.ToCSV. When training on this, I get errors which

Re: Current build taking a long time to install

2011-05-10 Thread Grant Ingersoll
You can skip the tests if you want; that usually helps:
mvn -DskipTests install
Perhaps there is better support in Maven 3 for parallel test execution (I've heard there is, but haven't heard good things about it; then again, the person telling me is not a Maven fan).
On May 10, 2011, at 2:43

Current build taking a long time to install

2011-05-10 Thread Steven Bourke
I usually build from the latest Mahout once a week, and I've noticed the current core is taking what feels like an eternity to build! The distributed lanczosSolver just took 6 minutes to finish running. Is this normal, or has something gone amiss?

Re: Clustering boolean vectors

2011-05-10 Thread Sean Owen
(Back to user@ for the benefit of the list.) I see, so you wish to cluster movies: by attributes, by ratings, or both? Cosine similarity would only make sense in the context of ratings. I just want to make sure you don't mean you're producing recommendations.
On Tue, May 10, 2011 at 5:14 PM,

Re: The perennial "Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector" problem

2011-05-10 Thread Jake Mannix
On Tue, May 10, 2011 at 8:24 AM, Sean Owen wrote:
> I peeked in the examples job jar and it definitely does have this class,
> along with the other dependencies (after my patch). Double-check that
> you've
> done the clean build an "install" again? and maybe even print out
> MAHOUT_JOB
> in the s

Re: Using .mvc file to train a classifier

2011-05-10 Thread Ted Dunning
Can you say a bit more about what you are trying to do at a higher level? Also, the Bayes classifier is very picky about its input format.
On Tue, May 10, 2011 at 3:44 AM, rm2...@columbia.edu wrote:
> Hi,
> I am a MAHOUT Beginner.
> I used weka to generate a .arff file from the training data.
>

Re: Is any more detailed documentation aout the sgd logistic regression example.

2011-05-10 Thread Ted Dunning
Great idea. Why don't you implement something like what you need? Others will be happy to contribute improvements.
On Tue, May 10, 2011 at 8:26 AM, XiaoboGu wrote:
> > There isn't a good command line for this, largely because it is difficult
> to
> > describe how to convert each CSV field. Th
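The quoted point is that each CSV column may need its own conversion (a number, a category, free text). As a hedged illustration only, the sketch below hashes each field of a CSV line into a fixed-size vector, roughly the idea behind the hashed feature encoders used in Mahout's SGD examples. It is plain Java, not Mahout's encoder API; the class name `CsvHashingEncoder`, the `column:value` naming scheme, and the choice of which columns count as numeric are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Set;

/** Hypothetical helper: hashes the fields of one CSV line into a fixed-size feature vector. */
final class CsvHashingEncoder {
    private final String[] header;      // column names, taken from the CSV header row
    private final Set<String> numeric;  // columns treated as continuous values (an assumption)
    private final int cardinality;      // length of the hashed feature vector

    CsvHashingEncoder(String[] header, Set<String> numeric, int cardinality) {
        this.header = header.clone();
        this.numeric = numeric;
        this.cardinality = cardinality;
    }

    /** Maps a feature name to an index in [0, cardinality); distinct names may collide. */
    int slot(String featureName) {
        return Math.floorMod(featureName.hashCode(), cardinality);
    }

    /** Encodes one data line; empty fields and fields beyond the header are ignored. */
    double[] encode(String csvLine) {
        String[] fields = csvLine.split(",", -1);
        double[] v = new double[cardinality];
        for (int i = 0; i < header.length && i < fields.length; i++) {
            String value = fields[i].trim();
            if (value.isEmpty()) {
                continue;
            }
            if (numeric.contains(header[i])) {
                // Continuous column: one slot per column, weighted by the parsed value.
                v[slot(header[i])] += Double.parseDouble(value);
            } else {
                // Categorical column: one slot per (column, value) pair, weight 1.
                v[slot(header[i] + ":" + value)] += 1.0;
            }
        }
        return v;
    }

    public static void main(String[] args) {
        CsvHashingEncoder encoder =
                new CsvHashingEncoder(new String[] {"age", "color"}, Set.of("age"), 16);
        System.out.println(Arrays.toString(encoder.encode("42,red")));
    }
}
```

The per-column decision (numeric vs. categorical) is exactly the part that is hard to express as generic command-line flags. The naive `split(",")` also does not handle quoted fields that contain commas.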

Re: Clustering boolean vectors

2011-05-10 Thread Sean Owen
(Reposting my reply to the original copy of the message.) GroupLens doesn't *require* a rating per se; you are free to ignore it if you want! In Mahout, boolean data is all 1s: there are no 0 ratings. If you just mean that the non-existent preferences are "0", OK. But having two ratings, 0 and 1,
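To illustrate ignoring the rating column, here is a minimal plain-Java sketch (not Mahout's data model classes) that reads GroupLens-style lines and keeps only which movies each user rated. It assumes the `::`-separated `UserID::MovieID::Rating::Timestamp` layout of the MovieLens 1M `ratings.dat` file; other GroupLens releases use different separators. Every stored association is implicitly a 1, and absent pairs are simply not stored rather than stored as 0. The class name `BooleanPrefs` is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical loader: turns rating lines into boolean (present/absent) preferences. */
final class BooleanPrefs {
    /** Parses "UserID::MovieID::Rating::Timestamp" lines, discarding rating and timestamp. */
    static Map<Long, Set<Long>> load(List<String> lines) {
        Map<Long, Set<Long>> itemsByUser = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] parts = line.split("::");
            long userId = Long.parseLong(parts[0].trim());
            long itemId = Long.parseLong(parts[1].trim());
            // Only the user-item association is kept; the rating (parts[2]) is ignored.
            itemsByUser.computeIfAbsent(userId, k -> new TreeSet<>()).add(itemId);
        }
        return itemsByUser;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the assumed layout.
        List<String> sample = List.of("1::10::5::0", "1::20::3::0", "2::10::4::0");
        System.out.println(load(sample)); // prints {1=[10, 20], 2=[10]}
    }
}
```

Note that the 5-star and 3-star lines for user 1 end up indistinguishable, which is the point: with boolean data only presence matters.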

Re: Is any more detailed documentation aout the sgd logistic regression example.

2011-05-10 Thread Ted Dunning
In the meantime, look at building your own command-line tool for AdaptiveLogisticRegression.
On Tue, May 10, 2011 at 8:25 AM, Ted Dunning wrote:
> Go for it.
>
> Produce a JIRA and a patch.
>
>
> On Tue, May 10, 2011 at 8:19 AM, XiaoboGu wrote:
>
>> Can you add a --algorithm option to the train

RE: Is any more detailed documentation aout the sgd logistic regression example.

2011-05-10 Thread XiaoboGu
> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunn...@gmail.com]
> Sent: Thursday, May 05, 2011 11:22 PM
> To: user@mahout.apache.org
> Subject: Re: Is any more detailed documentation aout the sgd logistic
> regression example.
>
> On Thu, May 5, 2011 at 7:48 AM, Xiaobo Gu wrote

Re: Is any more detailed documentation aout the sgd logistic regression example.

2011-05-10 Thread Ted Dunning
Go for it.
Produce a JIRA and a patch.
On Tue, May 10, 2011 at 8:19 AM, XiaoboGu wrote:
> Can you add a --algorithm option to the trainlogistic and runlogistic
> program, and other options need by specific algorithms, such as using L1 or
> L2 prior, then TL and RL will be production ready tool

Re: The perennial "Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector" problem

2011-05-10 Thread Sean Owen
I peeked in the examples job jar and it definitely does have this class, along with the other dependencies (after my patch). Could you double-check that you've done the clean build and "install" again? And maybe even print out MAHOUT_JOB in the script to double-check which jar it is using?
On Tue, May 10, 2011 at
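Checking whether a job jar really contains `org.apache.mahout.math.Vector` can also be done programmatically. A minimal sketch using only `java.util.jar` from the JDK: it converts the class name to its entry name (`org/apache/mahout/math/Vector.class`) and looks for it at the top level of the jar and inside any nested `lib/*.jar` entries. Hadoop job jars may carry dependencies either way, and the thread does not say which layout the Mahout examples job jar uses, so checking both is an assumption. The class name `JarClassCheck` is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarInputStream;

/** Hypothetical diagnostic: does a (job) jar contain a given class? */
final class JarClassCheck {
    /** Converts "a.b.C" to the jar entry name "a/b/C.class". */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns "top-level", the name of the nested lib/*.jar holding the class, or null. */
    static String locate(String jarPath, String className) throws IOException {
        String wanted = entryName(className);
        try (JarFile jar = new JarFile(jarPath)) {
            if (jar.getJarEntry(wanted) != null) {
                return "top-level";
            }
            // Dependencies may also be packaged as nested jars under lib/.
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry e = entries.nextElement();
                if (e.getName().startsWith("lib/") && e.getName().endsWith(".jar")) {
                    try (InputStream in = jar.getInputStream(e);
                         JarInputStream nested = new JarInputStream(in)) {
                        for (JarEntry n = nested.getNextJarEntry(); n != null;
                                n = nested.getNextJarEntry()) {
                            if (n.getName().equals(wanted)) {
                                return e.getName();
                            }
                        }
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarClassCheck <jar> <fully.qualified.ClassName>");
            return;
        }
        String where = locate(args[0], args[1]);
        System.out.println(where == null ? "NOT FOUND" : "found in " + where);
    }
}
```

Usage: `java JarClassCheck <path-to-job-jar> org.apache.mahout.math.Vector`, pointed at whatever jar MAHOUT_JOB resolves to. Finding the class in the jar does not by itself prove it is on the task classpath at runtime, but it rules out a packaging problem.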

RE: Is any more detailed documentation aout the sgd logistic regression example.

2011-05-10 Thread XiaoboGu
> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunn...@gmail.com]
> Sent: Sunday, May 08, 2011 4:23 AM
> To: user@mahout.apache.org
> Subject: Re: Is any more detailed documentation aout the sgd logistic
> regression example.
>
> You can't do that directly.
>
> You can use the ht

Re: Clustering boolean vectors

2011-05-10 Thread Steven Bourke
Are you trying to find similar items or recommend movies? If you are using the clustering approach, you will just find movies with similar genres, so the recommendation aspect of the work will only return recommended clusters of movies to the user.
On Tue, May 10, 2011 at 3:08 PM, mail2abin wrote

Clustering boolean vectors

2011-05-10 Thread mail2abin
Hi, I was trying to run ItemBasedRecommender on the GroupLens movie sample data, which requires the rating (user preference input). But suppose I do not have the ratings (user preferences); instead I have a boolean attribute vector per item [like Godfather - 0|1|0|0|0|0|1], where the two 1s may say
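For item attribute vectors like the `0|1|0|0|0|0|1` example above, one common way to compare two items is the Tanimoto (Jaccard) coefficient: the number of attributes both items have, divided by the number of attributes either has. Mahout's recommender code includes a `TanimotoCoefficientSimilarity` class built on the same idea; the plain-Java sketch below is not that class. The pipe-separated input format is taken from the question, while the class name and the second movie's vector are made up for illustration.

```java
import java.util.BitSet;

/** Hypothetical helper: compares items described by boolean attribute vectors. */
final class BooleanAttributeSimilarity {
    /** Parses a pipe-separated 0/1 vector such as "0|1|0|0|0|0|1". */
    static BitSet parse(String pipeSeparated) {
        String[] bits = pipeSeparated.trim().split("\\|");
        BitSet set = new BitSet(bits.length);
        for (int i = 0; i < bits.length; i++) {
            String b = bits[i].trim();
            if (b.equals("1")) {
                set.set(i);
            } else if (!b.equals("0")) {
                throw new IllegalArgumentException("not a 0/1 value: " + b);
            }
        }
        return set;
    }

    /** Tanimoto (Jaccard) coefficient |A and B| / |A or B|; defined as 0 when both are empty. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet intersection = (BitSet) a.clone();
        intersection.and(b);
        BitSet union = (BitSet) a.clone();
        union.or(b);
        int unionSize = union.cardinality();
        return unionSize == 0 ? 0.0 : (double) intersection.cardinality() / unionSize;
    }

    public static void main(String[] args) {
        BitSet godfather = parse("0|1|0|0|0|0|1");
        BitSet other = parse("0|1|0|0|1|0|0"); // made-up second movie
        // 1 shared attribute out of 3 set in either vector.
        System.out.println(tanimoto(godfather, other)); // prints 0.3333333333333333
    }
}
```

Because it only looks at which attributes are present, this measure needs no ratings at all, which matches the "boolean data is all 1s" view of the data. Pairwise scores like this could then feed an item-similarity or clustering step.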

Using .mvc file to train a classifier

2011-05-10 Thread rm2...@columbia.edu
Hi, I am a Mahout beginner. I used Weka to generate a .arff file from the training data.
./bin/mahout arff.vector --input ../X4.classifier.arff --output raghavan_test_output/ --dictOut label_bindings
I found the label_bindings file to be empty. I couldn't understand the reason for it and also wo

Re: Anyone Experienced in HTTP Logs as Data Source for Recommendations

2011-05-10 Thread Federico Castanedo
Hi Shem, I would like to recommend this paper to you:
http://research.microsoft.com/en-us/um/people/sdumais/chi08-adaretal-final.pdf
It is not directly related to recommendations, but it is a good study of web log patterns.
Best,
Federico
2011/5/9 Shem Cristobal :
> Thanks Sean, Markus, Steven and Ted f