categorization on crawl data

2014-01-13 Thread Vikas Parashar
Hi folks, Has anyone tried to do categorization on crawl data? If so, how can I achieve this? Which algorithm will help me? -- Thanks Regards:- Vikas Parashar Sr. Linux administrator Cum Developer Mobile: +91 958 208 8852 Email: vikas.paras...@fosteringlinglinux.com

Logistic Regression cost function

2014-01-13 Thread Frank Scholten
Hi, I followed the Coursera Machine Learning course quite a while ago and I am trying to find out how Mahout implements the Logistic Regression cost function in the code surrounding AbstractOnlineLogisticRegression. I am looking at the train method in AbstractOnlineLogisticRegression and I see

Re: Logistic Regression cost function

2014-01-13 Thread Suneel Marthi
Mahout's implementation is based on Leon Bottou's paper on this subject. I don't have the link handy, but it's referenced in the code, or try a Google search. On Jan 13, 2014, at 7:14 AM, Frank Scholten fr...@frankscholten.nl wrote: Hi, I followed the Coursera Machine Learning

Re: travelling salesman on Mahout

2014-01-13 Thread Ted Dunning
On Mon, Jan 13, 2014 at 8:42 AM, Pavan K Narayanan pavan.naraya...@gmail.com wrote: Please may I ask why TSP has been removed from Mahout. It was the Genetic Algorithms that were removed. The implementation was unmaintained and not scalable and thus not appropriate for Mahout. It's just

Re: Logistic Regression cost function

2014-01-13 Thread Frank Scholten
Do you know which paper it is? He has quite a few publications. I don't see any mention of one of his papers in the code. I only see www.eecs.tufts.edu/~dsculley/papers/combined-ranking-and-regression.pdf in MixedGradient but this is something different. On Mon, Jan 13, 2014 at 1:27 PM, Suneel

RE: Logistic Regression cost function

2014-01-13 Thread Tim Smith
There is a link on http://mahout.apache.org/users/classification/logistic-regression.html to the following paper: http://www.autonlab.org/autonweb/14709/version/4/part/5/data/komarek:lr_thesis.pdf?branch=main&language=en Date: Mon, 13 Jan 2014 20:58:28 +0100 Subject: Re: Logistic Regression

Re: Logistic Regression cost function

2014-01-13 Thread Ted Dunning
I think that this is the link in the code: http://leon.bottou.org/research/stochastic On Mon, Jan 13, 2014 at 11:58 AM, Frank Scholten fr...@frankscholten.nl wrote: Do you know which paper it is? He has quite a few publications. I don't see any mention of one of his papers in the code. I

Re: Logistic Regression cost function

2014-01-13 Thread Suneel Marthi
I think this is the one. Yes, I don't see this paper referenced in the code, sorry about that. http://leon.bottou.org/publications/pdf/compstat-2010.pdf On Monday, January 13, 2014 3:51 PM, Frank Scholten fr...@frankscholten.nl wrote: Do you know which paper it is? He has quite a few

Re: Logistic Regression cost function

2014-01-13 Thread Suneel Marthi
The Komarek paper is not what the implementation is based on. I was once chided by Ted on the LinkedIn Mahout forums for quoting the Komarek paper (more than 2 years ago). Here's the link to Leon Bottou's paper - http://leon.bottou.org/publications/pdf/compstat-2010.pdf On Monday,

Re: Logistic Regression cost function

2014-01-13 Thread Ted Dunning
The reference is to the web site in general. If anything, this blog is closest: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.177.3514&rep=rep1&type=pdf On Mon, Jan 13, 2014 at 1:14 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: I think this is the one. Yes, I don't see this paper

Re: Logistic Regression cost function

2014-01-13 Thread Frank Scholten
Thanks guys, I have some reading to do :-) On Mon, Jan 13, 2014 at 10:45 PM, Ted Dunning ted.dunn...@gmail.com wrote: The reference is to the web site in general. If anything, this blog is closest: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.177.3514&rep=rep1&type=pdf On Mon,
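
For anyone reading this thread later: the cost function Frank asks about can be illustrated with the textbook binary case. The sketch below is not Mahout's code and makes no claim about what AbstractOnlineLogisticRegression does internally; the names `sigmoid`, `log_loss` and `sgd_step` are made up for illustration. It only shows that a stochastic gradient step on the log loss moves each weight by rate * (y - p) * x, which is why an online trainer never needs to evaluate the cost itself.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """Predicted probability that the label is 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def log_loss(w, x, y):
    """Per-example cost: -[y log p + (1 - y) log(1 - p)]."""
    p = min(max(predict(w, x), 1e-15), 1.0 - 1e-15)  # keep log() finite
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def sgd_step(w, x, y, rate):
    """One SGD update. The gradient of log_loss w.r.t. w is (p - y) * x,
    so descending it means adding rate * (y - p) * x (no regularization here)."""
    p = predict(w, x)
    return [wi + rate * (y - p) * xi for wi, xi in zip(w, x)]


if __name__ == "__main__":
    w = [0.0, 0.0]
    x, y = [1.0, 2.0], 1
    before = log_loss(w, x, y)
    w = sgd_step(w, x, y, rate=0.1)
    print(w, before, log_loss(w, x, y))
```

The update only needs the prediction error (y - p); regularization or per-term learning rates, which an online implementation may add, are left out of this sketch.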

Re: ArrayIndexOutOfBoundsException with mahout vectordump and cvb ?

2014-01-13 Thread Yang
Suneel: thanks for the reply (sorry, my Gmail somehow put the reply into the archive, so it didn't show up in my inbox). The dictionary seems OK, at least not empty. -sh-3.2$ ls -l sparse/ total 464 drwxr-xr-x 2 yyang15 gid-yyang15 32768 Jan 8 15:17 df-count -rw-r--r-- 1 yyang15 gid-yyang15 203369

Re: ArrayIndexOutOfBoundsException with mahout vectordump and cvb ?

2014-01-13 Thread Suneel Marthi
Does the dictionary have a key 'zero'? On Monday, January 13, 2014 7:37 PM, Yang tedd...@gmail.com wrote: Suneel: thanks for the reply (sorry my gmail somehow put the reply into archive so it didn't show up in my inbox) the dictionary seems ok, at least not empty. -sh-3.2$ ls -l

Re: Query regarding Mahout's distributed Random Forest implementation

2014-01-13 Thread Som Satpathy
I got what I was looking for - https://issues.apache.org/jira/browse/MAHOUT-835 Thanks, Som On Thu, Jan 9, 2014 at 8:43 AM, Som Satpathy somsatpa...@gmail.com wrote: Hi all, In Mahout 0.8, the distributed Random Forest implementation doesn't seem to be computing the out of bag error while

Question about AbstractCluster

2014-01-13 Thread yunming zhang
Hi, I have a question about the implementation of AbstractCluster. I wonder what s0, s1 and s2 are used for? There are no comments on what they represent. I would really appreciate it if someone could explain this to me in the context of the KMeans application. Is S1 the old centroids and S2 the
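
A note for readers of this question: one common reading, which should be checked against the Mahout source rather than taken from here, is that s0, s1 and s2 are running sums (total weight of observed points, weighted sum of points, weighted sum of squared points) from which a cluster's center and per-dimension spread can be recomputed at the end of an iteration. The sketch below shows that general technique only; the class `RunningMoments` and its methods are hypothetical and are not Mahout's API.

```python
import math


class RunningMoments:
    """Accumulates zeroth, first and second moments of observed points.

    s0: total weight seen, s1: weighted sum of points,
    s2: weighted sum of element-wise squares.
    Naming mirrors the question; the class itself is illustrative only.
    """

    def __init__(self, dim):
        self.s0 = 0.0
        self.s1 = [0.0] * dim
        self.s2 = [0.0] * dim

    def observe(self, x, weight=1.0):
        """Fold one point into the running sums."""
        self.s0 += weight
        for i, xi in enumerate(x):
            self.s1[i] += weight * xi
            self.s2[i] += weight * xi * xi

    def center(self):
        """Mean of the observed points: s1 / s0."""
        return [v / self.s0 for v in self.s1]

    def radius(self):
        """Per-dimension population std dev: sqrt(s0*s2 - s1^2) / s0."""
        return [
            math.sqrt(max(self.s0 * b - a * a, 0.0)) / self.s0
            for a, b in zip(self.s1, self.s2)
        ]


if __name__ == "__main__":
    m = RunningMoments(2)
    m.observe([1.0, 2.0])
    m.observe([3.0, 6.0])
    print(m.center(), m.radius())
```

Under this reading none of the three fields holds the old centroid; in a k-means pass the new centroid would simply be s1 / s0 once all assigned points have been observed.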

Re: Item recommendation w/o users or preferences

2014-01-13 Thread Johannes Schulte
Hey, since you are already using basket analysis terms like support, confidence and lift it might be easier for you to think of the llr score as a better lift since it automatically puts a penalty on seldom items (you usually use support in classic mba for that). So, you would use the same 4
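
To make the "better lift" remark concrete, here is a minimal sketch of the log-likelihood ratio score in its usual entropy form for a 2x2 co-occurrence table. The count names k11, k12, k21, k22 and the function names are assumptions made for this example; the message above is cut off before it spells out its own inputs.

```python
import math


def x_log_x(x):
    """x * ln(x), with the limit value 0 at x == 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table.

    k11: both items together, k12: A without B,
    k21: B without A, k22: neither (names are illustrative).
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Rounding can push the difference slightly below zero.
    return max(2.0 * (row + col - mat), 0.0)


def lift(k11, k12, k21, k22):
    """Classic lift: P(A and B) / (P(A) * P(B))."""
    n = k11 + k12 + k21 + k22
    return (k11 * n) / ((k11 + k12) * (k11 + k21))


if __name__ == "__main__":
    rare = (1, 0, 0, 10000)      # a single co-occurrence of two rare items
    common = (100, 0, 0, 100)    # a well-supported association
    print(lift(*rare), llr(*rare))
    print(lift(*common), llr(*common))
```

In this sketch the single co-occurrence of two rare items gets a far larger lift than the well-supported pair, but a far smaller LLR, which is the penalty on seldom items described above.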