Re: AbstractJob class not found exception

2016-08-16 Thread Lee S
mahout-mr 0.10.1. On Tue, Aug 16, 2016 at 9:12 PM, Suneel Marthi <smar...@apache.org> wrote: > Which Mahout version are you running? > > On Tue, Aug 16, 2016 at 7:10 AM, Lee S <sle...@gmail.com> wrote: > > > I am trying to run a local Mahout job in my main function,

AbstractJob class not found exception

2016-08-16 Thread Lee S
I am trying to run a local Mahout job in my main function, but when I execute it, it fails with this exception:
java.lang.NoClassDefFoundError: org/apache/mahout/common/AbstractJob
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
    at
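A NoClassDefFoundError for AbstractJob usually means the class was visible at compile time but is missing from the runtime classpath of the main function. For example, the Mahout jar or one of the Hadoop classes it depends on may not be on that classpath. Below is a minimal diagnostic sketch. It assumes you can run a small class with the same classpath as the failing job. The class names probed in main() are only examples, and ClassProbe is a hypothetical helper:

```java
// Hypothetical diagnostic helper: reports whether a class can be loaded at runtime.
// The class names checked in main() are examples, not a complete dependency list.
class ClassProbe {
    // Returns true if the named class can be loaded by this class's loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only check that the class resolves, don't run static initializers
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised while resolving a superclass/interface
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.mahout.common.AbstractJob",
            "org.apache.hadoop.util.Tool"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
        // Printing the runtime classpath helps spot a jar that was present only at compile time.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```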

Re: What paper is the MR item-based recommendation algorithm based on?

2016-02-19 Thread Lee S
@Adi, this link is about the ALS algorithm, not the item-based implementation. On Fri, Feb 19, 2016 at 1:09 PM, Adi Haviv <adiha...@gmail.com> wrote: > collaborative filtering - > https://codeascraft.com/2014/11/17/personalized-recommendations-at-etsy/ > > On Fri, Feb 19, 2016 at 8

What paper is the MR item-based recommendation algorithm based on?

2016-02-18 Thread Lee S
Hi: Does anybody know which paper the MR (MapReduce) item-based algorithm is based on?
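For reference, the textbook item-based collaborative filtering scheme scores an unrated item by its similarity to the items a user has already rated. The sketch below uses cosine similarity and a similarity-weighted average over a small in-memory matrix. It is a generic illustration under those assumptions, not Mahout's MapReduce implementation, which offers its own similarity measures and runs distributed. The class name ItemBasedCF and the convention that 0 means "unrated" are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Generic item-based collaborative filtering on a small in-memory rating matrix.
// ratings[u][i] == 0 means "user u has not rated item i" (an assumption of this sketch).
class ItemBasedCF {
    // Cosine similarity between the rating columns of items a and b.
    static double cosine(double[][] ratings, int a, int b) {
        double dot = 0, na = 0, nb = 0;
        for (double[] row : ratings) {
            dot += row[a] * row[b];
            na += row[a] * row[a];
            nb += row[b] * row[b];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Predicted score of item i for user u: similarity-weighted average of u's own ratings.
    static double predict(double[][] ratings, int u, int i) {
        double num = 0, den = 0;
        for (int j = 0; j < ratings[u].length; j++) {
            if (j == i || ratings[u][j] == 0) continue; // only items u has rated
            double s = cosine(ratings, i, j);
            num += s * ratings[u][j];
            den += Math.abs(s);
        }
        return den == 0 ? 0 : num / den;
    }

    // Returns the items user u has not rated, highest predicted score first.
    static List<Integer> recommend(double[][] ratings, int u) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < ratings[u].length; i++) {
            if (ratings[u][i] == 0) candidates.add(i);
        }
        candidates.sort((x, y) -> Double.compare(predict(ratings, u, y), predict(ratings, u, x)));
        return candidates;
    }

    public static void main(String[] args) {
        double[][] ratings = {
            {5, 4, 0, 1},
            {4, 5, 1, 0},
            {1, 0, 5, 4},
            {0, 1, 4, 5},
            {5, 0, 0, 1}
        };
        System.out.println("user 4 -> " + recommend(ratings, 4));
    }
}
```

In this toy matrix, user 4 liked item 0 and disliked item 3, so item 1 (rated like item 0 by other users) ranks above item 2.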

Does random forest support multiple inputs?

2015-09-06 Thread Lee S
Hi all, I've read the Mahout random forest documentation at https://mahout.apache.org/users/classification/partial-implementation.html. In the "Known issues and limitations" section, it says: > The "Decision Forest" code is still "a work in progress", many features > are still missing. Here is a

How can I make kmeans output cluster labels that start at zero and are consecutive?

2015-01-04 Thread Lee S
I have used kmeans in Mahout and dumped the clusteredPoints directory, but the labels start with CL or VL, and the label numbers are not consecutive. How can I make the cluster labels consecutive? P.S. I've read the code of ClusterClassificationDriver, and I think for my needs, changing the code
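If changing ClusterClassificationDriver is not an option, the labels can also be remapped after the fact. Below is a minimal post-processing sketch. It assumes the cluster IDs have already been read out of clusteredPoints and that each label has the form PREFIX-number (e.g. VL-17). The parsing helper, the class name ClusterRelabel, and the ordering rule (new IDs follow ascending old IDs) are all hypothetical, illustrative choices:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Post-processing sketch: remap arbitrary cluster IDs to consecutive labels 0..k-1.
// The "PREFIX-number" label format handled by numericId() is an assumption.
class ClusterRelabel {
    // Extracts the numeric part of a label such as "VL-17" or "CL-3".
    static int numericId(String label) {
        return Integer.parseInt(label.substring(label.lastIndexOf('-') + 1));
    }

    // Builds old-id -> new-id, assigning new ids in ascending order of the old ids.
    static Map<Integer, Integer> buildMapping(List<Integer> clusterIds) {
        TreeMap<Integer, Integer> mapping = new TreeMap<>();
        for (int id : clusterIds) mapping.put(id, 0); // collect distinct ids, kept sorted
        int next = 0;
        for (Map.Entry<Integer, Integer> e : mapping.entrySet()) e.setValue(next++);
        return mapping;
    }

    // Applies the mapping to every point's cluster id, preserving point order.
    static List<Integer> relabel(List<Integer> clusterIds) {
        Map<Integer, Integer> mapping = buildMapping(clusterIds);
        List<Integer> out = new ArrayList<>(clusterIds.size());
        for (int id : clusterIds) out.add(mapping.get(id));
        return out;
    }

    public static void main(String[] args) {
        String[] raw = {"VL-17", "VL-3", "VL-17", "VL-42", "VL-3"};
        List<Integer> ids = new ArrayList<>();
        for (String s : raw) ids.add(numericId(s));
        System.out.println(relabel(ids)); // [1, 0, 1, 2, 0]
    }
}
```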

Re: How can I include mahout 0.9 with hadoop 2.3 in my project?

2014-12-16 Thread Lee S
Lee S sle...@gmail.com: I compiled Mahout with Hadoop 2.3.0 as follows:
1. Get the trunk code: git clone https://github.com/apache/mahout.git
2. Run mvn clean install -Dhadoop2 -Dhadoop2.version=2.3.0 -DskipTests=true; the whole build succeeds.
3. Find hadoop in the code base

Re: How can I include mahout 0.9 with hadoop 2.3 in my project?

2014-12-16 Thread Lee S
Lee S sle...@gmail.com wrote: Hi all, I have figured this out. The command should be (mvn clean install -Dhadoop2 -Dhadoop.version=2.3.0 -DskipTests=true), because the pom.xml defines <hadoop.version>2.2.0</hadoop.version>, not hadoop2.version. Hope this helps somebody who runs into

How can I include mahout 0.9 with hadoop 2.3 in my project?

2014-12-15 Thread Lee S
Hi all: I use Gradle to manage dependencies in my project:
dependencies {
    compile 'org.apache.mahout:mahout-core:0.9'
}
When Gradle builds, Mahout with Hadoop 1.2.1 is downloaded. Do I need to compile Mahout against Hadoop 2.3.0 and then include it in my project locally?

How to deal with categorical and date data in Mahout?

2014-11-18 Thread Lee S
Hi all: Do you have any good practices for dealing with categorical data? Does Mahout provide a tool class that can do the conversion?
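Date fields usually need to be turned into numbers before a vector-based algorithm can use them. One possible encoding, sketched below with java.time, uses three kinds of features: days since the epoch (a trend feature), the month, and a one-hot day of week. This feature set and the class name DateFeatures are assumptions for illustration, not a Mahout utility:

```java
import java.time.DayOfWeek;
import java.time.LocalDate;

// Sketch: turn a date into numeric features a vector-based algorithm can consume.
// The feature choice (epoch day, month, one-hot day of week) is an assumption.
class DateFeatures {
    // Returns [epochDay, month(1-12), mon, tue, wed, thu, fri, sat, sun] as doubles.
    static double[] encode(LocalDate date) {
        double[] f = new double[9];
        f[0] = date.toEpochDay();            // linear time trend
        f[1] = date.getMonthValue();         // coarse seasonality
        DayOfWeek dow = date.getDayOfWeek(); // MONDAY has value 1, SUNDAY has value 7
        f[1 + dow.getValue()] = 1.0;         // one-hot in slots 2..8
        return f;
    }

    public static void main(String[] args) {
        double[] f = encode(LocalDate.parse("2014-11-18"));
        System.out.println(java.util.Arrays.toString(f));
    }
}
```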

Re: How to deal with categorical and date data in Mahout?

2014-11-18 Thread Lee S
For recommendations, you could create a mapping of your categorical data to integers before you pass the data into Mahout. Let us know a bit more about what you're trying to accomplish and which algorithms you're looking to use. Best, Nick -----Original Message----- From: Lee S [mailto:sle...@gmail.com] Sent
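Below is a minimal sketch of the mapping Nick describes. Each distinct categorical value gets the next integer ID, and the ID can be expanded into a one-hot vector when an algorithm expects numeric features. CategoryEncoder is a hypothetical helper, not a Mahout class:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps each distinct categorical value to an integer id,
// and optionally expands an id into a one-hot vector.
class CategoryEncoder {
    private final Map<String, Integer> ids = new LinkedHashMap<>();

    // Returns the id for a value, assigning the next free id the first time it is seen.
    int idOf(String value) {
        Integer id = ids.get(value);
        if (id == null) {
            id = ids.size();
            ids.put(value, id);
        }
        return id;
    }

    // Number of distinct values seen so far.
    int size() {
        return ids.size();
    }

    // One-hot vector of length size() with 1.0 at the value's id; the value must already be known.
    double[] oneHot(String value) {
        Integer id = ids.get(value);
        if (id == null) throw new IllegalArgumentException("unknown category: " + value);
        double[] v = new double[ids.size()];
        v[id] = 1.0;
        return v;
    }

    public static void main(String[] args) {
        CategoryEncoder colors = new CategoryEncoder();
        for (String c : Arrays.asList("red", "green", "red", "blue")) {
            System.out.print(colors.idOf(c) + " ");
        }
        System.out.println();
        System.out.println(Arrays.toString(colors.oneHot("green")));
    }
}
```

The integer ID alone is enough for recommender input such as user/item IDs. For distance-based algorithms such as k-means, the one-hot form avoids implying an order between categories that a bare integer would suggest.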

Re: Why do most algorithms use sequencefile as input and output?

2014-11-06 Thread Lee S
Any other reasons, or can you give a thorough analysis? 2014-11-05 11:00 GMT+08:00 Ted Dunning ted.dunn...@gmail.com: Yes, type conversion is a reason. On Nov 4, 2014, at 18:59, Lee S sle...@gmail.com wrote: e.g. kmeans input: 1,2,3,4 //text file kmeans output
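To make the type-conversion point concrete: a text line such as 1,2,3,4 has to be split and parsed into numbers every time it is read, while a binary, typed record can be read back directly. The sketch below mimics the write(DataOutput)/readFields(DataInput) style of Hadoop Writables with plain java.io streams. It is not the SequenceFile format itself, only the idea behind it, and TextVsBinary is a hypothetical name:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Illustration only: text records must be parsed on every read,
// binary typed records are read back directly as doubles.
class TextVsBinary {
    // Text path: split and parse "1,2,3,4" each time the line is read.
    static double[] parseText(String line) {
        String[] parts = line.split(",");
        double[] v = new double[parts.length];
        for (int i = 0; i < parts.length; i++) v[i] = Double.parseDouble(parts[i].trim());
        return v;
    }

    // Binary path: an int length prefix followed by raw 8-byte doubles.
    static byte[] writeBinary(double[] v) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(v.length);
            for (double d : v) out.writeDouble(d);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    static double[] readBinary(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            double[] v = new double[in.readInt()];
            for (int i = 0; i < v.length; i++) v[i] = in.readDouble();
            return v;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        double[] fromText = parseText("1,2,3,4");
        byte[] record = writeBinary(fromText);
        double[] fromBinary = readBinary(record);
        System.out.println(Arrays.toString(fromBinary) + " (" + record.length + " bytes)");
    }
}
```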

Why do most algorithms use sequencefile as input and output?

2014-11-04 Thread Lee S
Hi all: I'm wondering why the inputs and outputs of most algorithms, like kmeans and naivebayes, are all SequenceFiles. One more conversion step is needed to make an algorithm work, and I think that step is time-consuming because it is also a MapReduce job. For the reason to deal with small

Re: Why do most algorithms use sequencefile as input and output?

2014-11-04 Thread Lee S
-11-04 23:56 GMT+08:00 Ted Dunning ted.dunn...@gmail.com: What should the input be? On Tue, Nov 4, 2014 at 12:28 AM, Lee S sle...@gmail.com wrote: Hi all: I'm wondering why the inputs and outputs of most algorithms, like kmeans and naivebayes, are all SequenceFiles. One more conversion step

Re: Mahout Vs Spark

2014-10-21 Thread Lee S
As a developer facing the choice between Mahout and MLlib, I have some thoughts. Mahout has no decision tree algorithm, but MLlib has the components for constructing one, such as the Gini index and information gain. I also think Mahout could add algorithm
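The Gini index and information gain mentioned above are both computed from the per-class counts at a tree node. Below is a plain-Java sketch of the standard definitions. It is not MLlib's or Mahout's API, and SplitCriteria is a hypothetical name:

```java
// Standard decision-tree split criteria computed from per-class counts.
class SplitCriteria {
    // Gini impurity: 1 - sum_k p_k^2
    static double gini(int[] counts) {
        int n = 0;
        for (int c : counts) n += c;
        if (n == 0) return 0;
        double sumSq = 0;
        for (int c : counts) {
            double p = (double) c / n;
            sumSq += p * p;
        }
        return 1 - sumSq;
    }

    // Entropy in bits: -sum_k p_k log2 p_k, with 0 * log 0 taken as 0
    static double entropy(int[] counts) {
        int n = 0;
        for (int c : counts) n += c;
        double h = 0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / n;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Information gain of splitting `parent` into `children` (class counts per child):
    // entropy(parent) - sum_i (|child_i| / |parent|) * entropy(child_i)
    static double informationGain(int[] parent, int[][] children) {
        int n = 0;
        for (int c : parent) n += c;
        double weighted = 0;
        for (int[] child : children) {
            int m = 0;
            for (int c : child) m += c;
            weighted += (double) m / n * entropy(child);
        }
        return entropy(parent) - weighted;
    }

    public static void main(String[] args) {
        int[] parent = {5, 5};
        int[][] perfectSplit = {{5, 0}, {0, 5}};
        System.out.println("gini(parent) = " + gini(parent));
        System.out.println("gain(perfect split) = " + informationGain(parent, perfectSplit));
    }
}
```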

Re: How to use naivebayes on ordinary data not on text files?

2014-10-20 Thread Lee S
For example, one line of the data file looks like this: 1 3 4 5 6 7. The first column is the label; the other columns make up the feature vector. 2014-10-21 11:17 GMT+08:00 Vibhanshu Prasad vibhanshugs...@gmail.com: Ordinary files? What type of file are you using? On Mon, Oct 20, 2014 at 7:44 AM, Lee S sle
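The sketch below splits one such line into a label and a feature vector. It assumes whitespace-separated columns with the label first, as described. Converting the parsed records into the input format that trainnb expects is a separate step not shown here, and LabeledLine is a hypothetical name:

```java
import java.util.Arrays;

// Parses "label f1 f2 ..." (whitespace-separated, label first) into a label and features.
class LabeledLine {
    final String label;
    final double[] features;

    LabeledLine(String label, double[] features) {
        this.label = label;
        this.features = features;
    }

    static LabeledLine parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 2) {
            throw new IllegalArgumentException("need a label and at least one feature: " + line);
        }
        double[] features = new double[tokens.length - 1];
        for (int i = 1; i < tokens.length; i++) features[i - 1] = Double.parseDouble(tokens[i]);
        return new LabeledLine(tokens[0], features);
    }

    public static void main(String[] args) {
        LabeledLine r = parse("1 3 4 5 6 7");
        System.out.println("label=" + r.label + " features=" + Arrays.toString(r.features));
    }
}
```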

Why do seqdumper and clusterdumper produce output on the local disk?

2014-10-19 Thread Lee S
When I run the two commands in Hadoop mode, the output is always written to the local disk. Why isn't the output written to HDFS in Hadoop mode, for consistency?

How to use naivebayes on ordinary data not on text files?

2014-10-19 Thread Lee S
I have an ordinary data file containing labels and feature vectors. How can I use naivebayes to classify it? The example on the official website uses text files. Can it be used on ordinary files? I wonder whether *trainnb* can be used directly on data files, but only if the format of the data file is
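To show what a naive Bayes classifier does with plain labeled vectors, here is a minimal multinomial naive Bayes with Laplace smoothing. It is a self-contained illustration, not Mahout's trainnb/testnb. The smoothing value alpha = 1.0, the toy training data, and the class name TinyMultinomialNB are assumptions:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal multinomial naive Bayes with Laplace smoothing over non-negative feature values.
class TinyMultinomialNB {
    private final double alpha;
    private final Map<String, double[]> featureSums = new HashMap<>(); // per-label sum of feature vectors
    private final Map<String, Integer> docCounts = new HashMap<>();    // per-label number of examples
    private int totalDocs = 0;

    TinyMultinomialNB(double alpha) {
        this.alpha = alpha;
    }

    // Adds one labeled example; all vectors are assumed to have the same length.
    void train(String label, double[] x) {
        double[] sums = featureSums.computeIfAbsent(label, k -> new double[x.length]);
        for (int j = 0; j < x.length; j++) sums[j] += x[j];
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
    }

    // argmax over labels of: log P(label) + sum_j x_j * log P(feature j | label)
    String classify(double[] x) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : featureSums.entrySet()) {
            double[] sums = e.getValue();
            double total = 0;
            for (double s : sums) total += s;
            double score = Math.log((double) docCounts.get(e.getKey()) / totalDocs);
            for (int j = 0; j < x.length; j++) {
                double pj = (sums[j] + alpha) / (total + alpha * sums.length); // smoothed P(j | label)
                score += x[j] * Math.log(pj);
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyMultinomialNB nb = new TinyMultinomialNB(1.0);
        nb.train("1", new double[]{3, 4, 0, 0});
        nb.train("1", new double[]{2, 5, 1, 0});
        nb.train("2", new double[]{0, 1, 4, 5});
        nb.train("2", new double[]{0, 0, 6, 3});
        System.out.println(nb.classify(new double[]{4, 3, 0, 1}));
    }
}
```

The multinomial model treats each feature as a non-negative count. It therefore fits data like 1 3 4 5 6 7 only if the columns are counts or similar non-negative magnitudes.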