[jira] [Commented] (MAHOUT-1489) Interactive Scala & Spark Bindings Shell & Script processor

2014-03-28 Thread Saikat Kanjilal (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951681#comment-13951681 ] Saikat Kanjilal commented on MAHOUT-1489: - Dmitry, I've gone ahead and forked mah

Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

2014-03-28 Thread Suneel Marthi
... and please create a JIRA for this; it definitely seems like an issue. Nevertheless, it's time to verify and validate this impl, given that the original author has not responded. On , Suneel Marthi wrote: I was alluding to TrainNaiveBayesJob, which is MR only. You're right, TestNaiveBayesDriver

Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

2014-03-28 Thread Sebastian Schelter
Great. The details on how to submit a patch are here: https://mahout.apache.org/developers/how-to-contribute.html --sebastian On 03/28/2014 09:29 PM, Chandler Burgess wrote: Forgot to include in the last mail. Again, I do have the Rennie paper, which I'll dig into and see if I can fix it sometim

Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

2014-03-28 Thread Suneel Marthi
I was alluding to TrainNaiveBayesJob, which is MR only. You're right, TestNaiveBayesDriver has both MR and sequential. Looking at the code for MR vs. sequential in TestNaiveBayes, they both seem to be calling the respective Standard/Complementary Naive Bayes classifiers. I guess we need to look at C

[jira] [Updated] (MAHOUT-1497) mahout resplit not producing splited files

2014-03-28 Thread Reinis Vicups (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reinis Vicups updated MAHOUT-1497: -- Description: when I run "mahout resplit", I get the output below but no split files are being

[jira] [Created] (MAHOUT-1497) mahout resplit not producing splited files

2014-03-28 Thread Reinis Vicups (JIRA)
Reinis Vicups created MAHOUT-1497: - Summary: mahout resplit not producing splited files Key: MAHOUT-1497 URL: https://issues.apache.org/jira/browse/MAHOUT-1497 Project: Mahout Issue Type: Bug

RE: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

2014-03-28 Thread Chandler Burgess
OK, so should I remove it? There are about two dozen lines of code in TestNaiveBayesDriver for running sequentially. -----Original Message----- From: Suneel Marthi [mailto:suneel_mar...@yahoo.com] Sent: Friday, March 28, 2014 3:51 PM To: dev@mahout.apache.org Subject: Re: MAHOUT-1369 - Why does th

Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

2014-03-28 Thread Suneel Marthi
Bayes doesn't have a non-MapReduce impl, so the -seq flag wouldn't work. > On Mar 28, 2014, at 4:16 PM, Chandler Burgess wrote: > > Well, maybe someone can correct me, but this seems disappointing. I > uncommented the code in NaiveBayesModel, BayesUtil and TrainNaiveBayesJob

RE: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

2014-03-28 Thread Chandler Burgess
Forgot to include in the last mail. Again, I do have the Rennie paper, which I'll dig into and see if I can fix it sometime in the near future. I'll also look at the problem with the -seq flag to testnb. All the guidelines for submitting patches are on JIRA or the mahout.apache.org pages, correct?
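For readers following the theta-normalization question, here is a minimal standalone sketch in Java of Complement Naive Bayes with the per-label weight normalization described in the Rennie et al. (2003) paper mentioned above: complement weights w[c][i] = log((N_~c,i + alpha) / (N_~c + alpha * numTerms)), each label's weights divided by the sum of their absolute values, and classification by argmin over labels. This is hypothetical illustration code, not Mahout's NaiveBayesModel, ComplementaryThetaMapper or ComplementaryNaiveBayesClassifier; whether Mahout's commented-out thetaNormalizer matches it term for term is an assumption, and the paper's TF-IDF and length-normalization transforms are omitted.

```java
// Hypothetical standalone sketch of Complement Naive Bayes with Rennie et al.'s weight
// normalization. Not Mahout code; TF-IDF and document-length transforms are omitted.
class ComplementNaiveBayesSketch {

    /**
     * termCounts[c][i] = total weight of term i in training documents of label c.
     * Returns w[c][i] = log((N_~c,i + alpha) / (N_~c + alpha * numTerms)), where ~c means
     * "every label except c" (the complement of c).
     */
    static double[][] complementWeights(double[][] termCounts, double alpha) {
        int numLabels = termCounts.length;
        int numTerms = termCounts[0].length;
        double[][] w = new double[numLabels][numTerms];
        for (int c = 0; c < numLabels; c++) {
            double[] comp = new double[numTerms];
            double compTotal = 0.0;
            for (int other = 0; other < numLabels; other++) {
                if (other == c) {
                    continue; // only count documents that are NOT in label c
                }
                for (int i = 0; i < numTerms; i++) {
                    comp[i] += termCounts[other][i];
                    compTotal += termCounts[other][i];
                }
            }
            for (int i = 0; i < numTerms; i++) {
                w[c][i] = Math.log((comp[i] + alpha) / (compTotal + alpha * numTerms));
            }
        }
        return w;
    }

    /** Weight normalization: divide each label's weights by the sum of their absolute values. */
    static void normalize(double[][] w) {
        for (double[] row : w) {
            double norm = 0.0;
            for (double v : row) {
                norm += Math.abs(v);
            }
            for (int i = 0; i < row.length; i++) {
                row[i] /= norm;
            }
        }
    }

    /** CNB label: argmin over c of sum_i t_i * w[c][i] (the label whose complement fits doc worst). */
    static int classify(double[][] w, double[] doc) {
        int best = -1;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int c = 0; c < w.length; c++) {
            double score = 0.0;
            for (int i = 0; i < doc.length; i++) {
                score += doc[i] * w[c][i];
            }
            if (score < bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] w = complementWeights(new double[][] {{10, 1}, {1, 10}}, 1.0);
        normalize(w);
        System.out.println(classify(w, new double[] {5, 0})); // prints 0: term 0 dominates
        System.out.println(classify(w, new double[] {0, 5})); // prints 1: term 1 dominates
    }
}
```

The paper motivates the normalization step as correcting for differences in weight magnitude between labels, so that a label is not favored merely because its weights are larger in absolute terms.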

[jira] [Commented] (MAHOUT-1493) Port Naive Bayes to the Spark DSL

2014-03-28 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951016#comment-13951016 ] Dmitriy Lyubimov commented on MAHOUT-1493: -- I'd say we really need that shell fi

RE: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

2014-03-28 Thread Chandler Burgess
Well, maybe someone can correct me, but this seems disappointing. I uncommented the code in NaiveBayesModel, BayesUtil and TrainNaiveBayesJob, added some trace statements in ComplementaryThetaMapper and ComplementaryNaiveBayesClassifier to verify they were being called, and then ran some tests us

[jira] [Updated] (MAHOUT-1374) Ability to provide input file with userid, itemid pair

2014-03-28 Thread Aliaksei Litouka (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aliaksei Litouka updated MAHOUT-1374: - Attachment: MAHOUT-1374.patch > Ability to provide input file with userid, itemid pair >

[jira] [Updated] (MAHOUT-1374) Ability to provide input file with userid, itemid pair

2014-03-28 Thread Aliaksei Litouka (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aliaksei Litouka updated MAHOUT-1374: - Status: Patch Available (was: Open) The userItemFile option was added. It allows you to specify

Re: how to implement parallel sgd in map reduce?

2014-03-28 Thread Ted Dunning
Yes. That is feasible. I think that you would have better luck with something like asynchronous SGD as described here: http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2012_0598.pdf and here http://www.cs.toronto.edu/~fritz/absps/georgerectified.pdf It would also be good to cons
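To make the asynchronous alternative concrete, here is a minimal single-JVM sketch in Java of parameter-server-style asynchronous SGD: each worker thread owns a shard of the data, computes squared-loss gradients against a possibly stale copy of the shared weights (refreshed every fetchEvery steps), and pushes updates without waiting for the other workers. The ParameterServer class, the linear least-squares model and the fetchEvery parameter are hypothetical illustrations chosen for this sketch; they are not taken from the linked papers or from Mahout, and a real deployment would put the server and workers on separate machines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-JVM sketch of asynchronous SGD with a shared parameter server.
class AsyncSgdSketch {

    /** Hypothetical in-process "parameter server": holds shared weights, applies pushed gradients. */
    static final class ParameterServer {
        private final double[] w;

        ParameterServer(int dim) {
            w = new double[dim];
        }

        synchronized double[] fetch() {
            return w.clone(); // workers get a snapshot, which may go stale
        }

        synchronized void push(double[] grad, double eta) {
            for (int d = 0; d < w.length; d++) {
                w[d] -= eta * grad[d];
            }
        }
    }

    /** Squared loss 0.5 * (w.x - y)^2; each worker processes one contiguous shard of the data. */
    static double[] train(double[][] x, double[] y, int workers, double eta, int fetchEvery) {
        ParameterServer server = new ParameterServer(x[0].length);
        List<Thread> threads = new ArrayList<>();
        int shard = x.length / workers;
        for (int i = 0; i < workers; i++) {
            final int start = i * shard, end = start + shard;
            Thread t = new Thread(() -> {
                double[] local = server.fetch();
                for (int j = start; j < end; j++) {
                    if ((j - start) % fetchEvery == 0) {
                        local = server.fetch(); // refresh the stale local copy
                    }
                    double err = -y[j];
                    for (int d = 0; d < local.length; d++) {
                        err += local[d] * x[j][d];
                    }
                    double[] grad = new double[local.length];
                    for (int d = 0; d < grad.length; d++) {
                        grad[d] = err * x[j][d]; // gradient of 0.5 * err^2
                    }
                    server.push(grad, eta); // asynchronous: no barrier between workers
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return server.fetch();
    }
}
```

Unlike one-shot averaging, the workers here keep influencing each other during training, at the cost of gradients computed on slightly out-of-date weights; a small learning rate keeps that staleness harmless in this toy setting.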

[jira] [Commented] (MAHOUT-1476) Cleanup website on Hidden Markov Models

2014-03-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951113#comment-13951113 ] Hudson commented on MAHOUT-1476: SUCCESS: Integrated in Mahout-Quality #2544 (See [https

[jira] [Created] (MAHOUT-1495) Create a website describing the distributed item-based recommender

2014-03-28 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1495: -- Summary: Create a website describing the distributed item-based recommender Key: MAHOUT-1495 URL: https://issues.apache.org/jira/browse/MAHOUT-1495 Projec

[jira] [Created] (MAHOUT-1496) Create a website describing the distributed ALS recommender

2014-03-28 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1496: -- Summary: Create a website describing the distributed ALS recommender Key: MAHOUT-1496 URL: https://issues.apache.org/jira/browse/MAHOUT-1496 Project: Maho

[jira] [Created] (MAHOUT-1494) README.txt is examples/clustering needs to be updated

2014-03-28 Thread Suneel Marthi (JIRA)
Suneel Marthi created MAHOUT-1494: - Summary: README.txt is examples/clustering needs to be updated Key: MAHOUT-1494 URL: https://issues.apache.org/jira/browse/MAHOUT-1494 Project: Mahout Issu

[jira] [Resolved] (MAHOUT-1476) Cleanup website on Hidden Markov Models

2014-03-28 Thread Andrew Musselman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Musselman resolved MAHOUT-1476. -- Resolution: Fixed Resolving after no comments. > Cleanup website on Hidden Markov Mod

how to implement parallel sgd in map reduce?

2014-03-28 Thread Li Li
I have read "Parallelized Stochastic Gradient Descent" (2010) by Martin A. Zinkevich et al. The parallel SGD is very simple: Define T = ⌊m/k⌋. Randomly partition the examples, giving T examples to each machine. For all i ∈ {1, ..., k} parallel do: randomly shuffle the data on machine i. Ini
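In the paper, the procedure quoted above (SimuParallelSGD) continues with each machine running plain SGD from w = 0 over its own T examples, after which the k weight vectors are averaged. That maps naturally onto a single MapReduce pass: each mapper runs local SGD on its partition and one reducer averages. Below is a minimal single-JVM sketch in Java, assuming a squared-loss linear model and using a parallel stream to stand in for the mappers; the class and method names, learning rate and data are hypothetical and this is not Mahout code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.IntStream;

// Hypothetical single-JVM sketch of SimuParallelSGD (Zinkevich et al., 2010).
class SimuParallelSgdSketch {

    /** One "mapper": plain SGD over a single partition for squared loss 0.5 * (w.x - y)^2. */
    static double[] sgdOnPartition(List<double[]> xs, List<Double> ys, double eta, long seed) {
        int dim = xs.get(0).length;
        double[] w = new double[dim]; // w_{i,0} = 0
        List<Integer> order = new ArrayList<>();
        for (int j = 0; j < xs.size(); j++) {
            order.add(j);
        }
        Collections.shuffle(order, new Random(seed)); // randomly shuffle the data on machine i
        for (int j : order) {
            double[] x = xs.get(j);
            double pred = 0.0;
            for (int d = 0; d < dim; d++) {
                pred += w[d] * x[d];
            }
            double err = pred - ys.get(j); // gradient of 0.5 * err^2 is err * x
            for (int d = 0; d < dim; d++) {
                w[d] -= eta * err * x[d];
            }
        }
        return w;
    }

    /** Partition m examples into k chunks of T = floor(m / k), run SGD on each, then average. */
    static double[] simuParallelSgd(double[][] x, double[] y, int k, double eta, long seed) {
        int t = x.length / k; // T = floor(m / k); leftover examples are dropped
        List<Integer> idx = new ArrayList<>();
        for (int j = 0; j < x.length; j++) {
            idx.add(j);
        }
        Collections.shuffle(idx, new Random(seed)); // random partition of the examples
        double[][] local = IntStream.range(0, k).parallel().mapToObj(i -> {
            List<double[]> xs = new ArrayList<>();
            List<Double> ys = new ArrayList<>();
            for (int j = i * t; j < (i + 1) * t; j++) {
                xs.add(x[idx.get(j)]);
                ys.add(y[idx.get(j)]);
            }
            return sgdOnPartition(xs, ys, eta, seed + i + 1);
        }).toArray(double[][]::new);
        // "Reducer": v = (1/k) * sum_i w_i
        double[] v = new double[x[0].length];
        for (double[] w : local) {
            for (int d = 0; d < v.length; d++) {
                v[d] += w[d] / k;
            }
        }
        return v;
    }
}
```

Because the workers never communicate until the final average, the only data shuffled to the reducer is k weight vectors, which is what makes this scheme a cheap fit for a single MapReduce job.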

Re: [jira] [Commented] (MAHOUT-1482) Rework quickstart website

2014-03-28 Thread Sebastian Schelter
Try an online Markdown editor to check the formatting. Best, Sebastian On 28.03.2014 10:59, "jian wang (JIRA)" wrote: > > [ > https://issues.apache.org/jira/browse/MAHOUT-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950577#comment-13950577] >

[jira] [Commented] (MAHOUT-1482) Rework quickstart website

2014-03-28 Thread jian wang (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950577#comment-13950577 ] jian wang commented on MAHOUT-1482: --- Hi, I would like to know how to verify the HTML docu

[jira] [Updated] (MAHOUT-1493) Port Naive Bayes to the Spark DSL

2014-03-28 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1493: --- Attachment: MAHOUT-1493.patch Updated the patch according to Dmitriy's style suggest