Re: AbstractJob class not found exception

2016-09-14 Lee
unsubscribe On Tue, Sep 6, 2016 at 10:46 PM, Francois Bossiere <francois.bossi...@gmail.com> wrote: > Unsubscribe > -- Fangyuan Li Master Student at Department of Computer Science Stony Brook University Email: maplain...@gmail.com

Re: AbstractJob class not found exception

2016-08-16 Lee S
mahout-mr 0.10.1 On Tue, Aug 16, 2016 at 9:12 PM, Suneel Marthi wrote: > Which Mahout version are u running? > > On Tue, Aug 16, 2016 at 7:10 AM, Lee S wrote: > > > I try to run local mahout job in my main function, > > > > but when exec

AbstractJob class not found exception

2016-08-16 Lee S
I'm trying to run a local Mahout job from my main function, but when I execute it, it fails with this exception: java.lang.NoClassDefFoundError: org/apache/mahout/common/AbstractJob at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:760) at java.security.Se

HMM prediction cannot output normal characters when predicting binary

2016-06-21 Lee Ho Yeung
I took Yahoo Finance CSV data, converted it to binary numbers, separated each 1 and 0 with a space, saved the result as hmm-input, and ran the example command below. However, the output numbers don't match the future values and aren't even decimal numbers. mahout/bin/mahout baumwelch -i hmm-input -o hmm-model -nh

Which function can guess missing values in a matrix, like this SVD, and how do I use it?

2016-06-20 Lee Ho Yeung
Is there an SVD function like the one below for guessing missing floating-point values in a matrix? If so, how do I use it, and where is the result stored? import numpy as np from scipy.sparse.linalg import svds from functools import partial def emsvd(Y, k=None, tol=1E-3, maxiter=N
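The emsvd snippet in the question is cut off, so its exact logic can't be recovered here. As a hedged sketch of the general idea its name suggests (an EM-style loop that refills the missing entries with a rank-k SVD reconstruction until they stop changing), here is a NumPy-only version. It uses `np.linalg.svd` instead of scipy's `svds`; the name `svd_impute`, its parameters, and the column-mean starting guess are assumptions for illustration, not a Mahout API. The completed matrix is simply returned.

```python
import numpy as np

def svd_impute(Y, k, tol=1e-3, maxiter=100):
    """Hypothetical helper: fill NaN entries of Y via iterative rank-k SVD."""
    Y = np.asarray(Y, dtype=float)
    missing = np.isnan(Y)
    # Initial guess: each missing entry gets its column's observed mean.
    col_means = np.nanmean(Y, axis=0)
    filled = np.where(missing, col_means, Y)
    prev = None
    for _ in range(maxiter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Rank-k reconstruction of the current filled-in matrix.
        approx = (U[:, :k] * s[:k]) @ Vt[:k, :]
        # Observed entries stay fixed; only missing ones are updated.
        filled = np.where(missing, approx, Y)
        if prev is not None:
            change = np.linalg.norm(filled - prev) / max(np.linalg.norm(prev), 1e-12)
            if change < tol:
                break
        prev = filled
    return filled
```

For example, hiding one cell of an exactly rank-1 matrix and calling `svd_impute(Y, k=1)` should recover that cell closely. A column with no observed values would make the starting guess NaN, so such columns need separate handling.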

recommenditembased gives an error

2016-06-20 Lee Ho Yeung
I'm just trying to guess values in a table or matrix. First, I don't know where the result file is. Second, there seems to be an error. vi rdata.txt 1,1,5 1,2,4 1,3,5 2,1,4 2,2,5 2,3,4 3,1,5 3,2,4 4,1,1 4,2,2 5,1,2 5,2,1 5,3,1 hadoop-2.7.2/bin/hadoop fs -rm -r temp mahout/bin/mahout recommendit

Where is the result file after running parallelALS?

2016-06-20 Lee Ho Yeung
I followed this example to guess missing values in a matrix: https://mahout.apache.org/users/recommender/intro-als-hadoop.html mahout/bin/mahout parallelALS --input /home/martin/Downloads/rdata.txt --output /home/martin/Downloads/output.txt --lambda 0.1 --implicitFeedback true --alpha 0.8 --numFeatures
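For context on what parallelALS computes: alternating least squares factors the sparse user-item rating matrix into a user-feature matrix and an item-feature matrix, and their product fills in the missing cells. The sketch below is a small single-machine, explicit-feedback ALS in NumPy, run on the rdata.txt triples quoted in this thread. It is not Mahout's implementation, it ignores the implicit-feedback variant (`--implicitFeedback`/`--alpha`) used in the command, and scaling λ by each row's rating count (as in ALS-WR) is an assumption about the regularization.

```python
import numpy as np

# userID,itemID,rating triples copied from the rdata.txt example in this thread
RATINGS = [(1, 1, 5), (1, 2, 4), (1, 3, 5), (2, 1, 4), (2, 2, 5), (2, 3, 4),
           (3, 1, 5), (3, 2, 4), (4, 1, 1), (4, 2, 2), (5, 1, 2), (5, 2, 1), (5, 3, 1)]

def als(triples, num_features=2, lam=0.1, iterations=20, seed=0):
    """Explicit-feedback ALS sketch (not Mahout's code): ratings ~= U @ M.T."""
    users = sorted({u for u, _, _ in triples})
    items = sorted({i for _, i, _ in triples})
    u_idx = {u: n for n, u in enumerate(users)}
    i_idx = {i: n for n, i in enumerate(items)}
    by_user = [[] for _ in users]  # observed (item index, rating) per user
    by_item = [[] for _ in items]  # observed (user index, rating) per item
    for u, i, r in triples:
        by_user[u_idx[u]].append((i_idx[i], float(r)))
        by_item[i_idx[i]].append((u_idx[u], float(r)))

    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(len(users), num_features))
    M = rng.normal(scale=0.1, size=(len(items), num_features))
    eye = np.eye(num_features)

    def solve(fixed, observed):
        # Regularized least squares for one row; lambda is scaled by the
        # number of observed ratings (ALS-WR-style weighting, an assumption).
        idx = [j for j, _ in observed]
        r = np.array([v for _, v in observed])
        F = fixed[idx]
        return np.linalg.solve(F.T @ F + lam * len(observed) * eye, F.T @ r)

    for _ in range(iterations):
        for n, observed in enumerate(by_user):
            U[n] = solve(M, observed)  # item features fixed, update users
        for n, observed in enumerate(by_item):
            M[n] = solve(U, observed)  # user features fixed, update items
    return u_idx, i_idx, U, M
```

After `u_idx, i_idx, U, M = als(RATINGS)`, an estimate for a cell that is absent from the data, such as user 3 and item 3, is `U[u_idx[3]] @ M[i_idx[3]]`.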

Re: What's the paper behind the MR item-based recommendation algorithm?

2016-02-19 Lee S
@Adi this link is about the ALS algorithm, not the item-based implementation. On Fri, Feb 19, 2016 at 1:09 PM, Adi Haviv wrote: > collaborative filtering - > https://codeascraft.com/2014/11/17/personalized-recommendations-at-etsy/ > > On Fri, Feb 19, 2016 at 8:46 AM, Lee S wro

What's the paper behind the MR item-based recommendation algorithm?

2016-02-18 Lee S
Hi, does anybody know which paper the MR item-based algorithm is based on?

Does random forest support multiple inputs?

2015-09-06 Lee S
Hi all, I've read the Mahout random forest documentation at https://mahout.apache.org/users/classification/partial-implementation.html. In the "Know issues and limitations" section, it says: > The "Decision Forest" code is still "a work in progress", many features > are still missing. Here is a li

Re: Mahout 0.10 with Spark 1.1.1

2015-07-08 Kidong Lee
I have submitted Mahout Spark jobs in yarn-client mode like this: bin/mahout spark-itemsimilarity --input /input/part-000 --output /output --maxSimilaritiesPerItem 20 --master yarn-client --sparkExecutorMem 8g -D:spark.driver.memory=5g -D:spark.driver.maxResultSize=3g -D:spark.execut

How can I make kmeans output cluster label starts with zero and consecutive?

2015-01-04 Lee S
I used kmeans in Mahout and dumped the clusteredPoints directory, but the labels start with CL or VL, and the label numbers are not consecutive. How can I make the cluster labels consecutive? P.S. I've read the code of ClusterClassificationDriver; I think, for my needs, change the code
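If post-processing the dump is acceptable instead of changing ClusterClassificationDriver, one hedged option is to remap whatever cluster IDs you extract to consecutive zero-based integers. The sketch assumes each point's cluster label has already been pulled out of the clusteredPoints dump as a string; the "VL-17" style strings in the example are illustrative, not a documented Mahout format, and `relabel_consecutive` is a hypothetical helper.

```python
def relabel_consecutive(labels):
    """Hypothetical helper: map arbitrary cluster IDs to 0..k-1.

    IDs are numbered in order of first appearance, so the same input
    order always yields the same mapping.
    """
    mapping = {}
    out = []
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping)  # next unused consecutive ID
        out.append(mapping[label])
    return out, mapping
```

Returning the mapping alongside the new labels keeps it possible to trace a consecutive ID back to the original cluster.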

Re: How can I include mahout 0.9 with hadoop 2.3 in my project?

2014-12-16 Lee S
Yep, I just read the pom.xml carefully, and you are right: -Dhadoop2 is redundant. 2014-12-16 21:24 GMT+08:00 Gokhan Capan : > > I believe -Dhadoop2 is also redundant. > > mvn clean install -Dhadoop.version=2.3.0 should be sufficient > > Sent from my iPhone > > > On De

Re: How can I include mahout 0.9 with hadoop 2.3 in my project?

2014-12-16 Lee S
Hi all, I have figured this out. The command should be (mvn clean install -Dhadoop2 -Dhadoop.version=2.3.0 -DskipTests=true). This is because (2.2.0) is in the pom.xml, not hadoop2.version. I hope this helps anybody who runs into the same problem. 2014-12-16 15:49 GMT+08:00 Lee S : > > I compiled

Re: How can I include mahout 0.9 with hadoop 2.3 in my project?

2014-12-15 Lee S
tiranjan panda wrote: > > Hi, > > mahout-0.9 is compatible with hadoop-1.2.1 > > > > Regards > > Jyoti Ranjan Panda > > > > On Mon, Dec 15, 2014 at 2:33 PM, Lee S wrote: > >> > >> Hi all: > >> I use gradle to management depen

How can I include mahout 0.9 with hadoop 2.3 in my project?

2014-12-15 Lee S
Hi all, I use Gradle to manage dependencies in my project: dependencies { compile 'org.apache.mahout:mahout-core:0.9' } When Gradle builds, Mahout with Hadoop 1.2.1 is downloaded. Do I need to compile Mahout against Hadoop 2.3.0 and then include it in my project locally?

Re: How to deal with categorical and date data in Mahout?

2014-11-18 Lee S
tions you could > create a mapping of your categorical data to integers before you pass the > data into Mahout. > > Let us know a bit more about what you're trying to accomplish/algos you're > looking to use. > > Best, > Nick > > -Original Message---

How to deal with categorical and date data in Mahout?

2014-11-18 Lee S
Hi all, do you have any good practices for dealing with categorical data? Does Mahout provide a tool class that can do the conversion?
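Following the suggestion in the reply above to map categorical data to integers before passing it to Mahout, here is a minimal plain-Python sketch of two common encodings (an integer code per category, or a one-hot vector) plus one way to turn a date into numeric features. The helper names and the particular date features chosen are assumptions, and no Mahout class is involved.

```python
from datetime import date

def encode_categorical(values):
    """Hypothetical helper: map each distinct category to an integer (first-seen order)."""
    index = {}
    # setdefault assigns len(index) only when the category is new.
    codes = [index.setdefault(v, len(index)) for v in values]
    return codes, index

def one_hot(code, size):
    """Expand an integer code into a 0/1 vector of length `size`."""
    vec = [0.0] * size
    vec[code] = 1.0
    return vec

def date_features(d):
    """Illustrative numeric features: days since 1970-01-01, weekday (Mon=0), month."""
    return [float((d - date(1970, 1, 1)).days), float(d.weekday()), float(d.month)]
```

Integer codes impose an artificial order on the categories, which distance-based algorithms such as k-means will take literally; one-hot vectors avoid that at the cost of more dimensions.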

Re: Why do most algorithms use sequencefile as input and output?

2014-11-06 Lee S
Any other reasons, or can you give a thorough analysis? 2014-11-05 11:00 GMT+08:00 Ted Dunning : > > Yes, type conversion is a reason. > > Sent from my iPhone > > > On Nov 4, 2014, at 18:59, Lee S wrote: > > > > eg. kmeans input: > > 1,2,3,4 //text fi

Re: Why do most algorithms use sequencefile as input and output?

2014-11-04 Lee S
-11-04 23:56 GMT+08:00 Ted Dunning : > What should the input be? > > > > On Tue, Nov 4, 2014 at 12:28 AM, Lee S wrote: > > > Hi all: > > I'm wondering why the input and output of most algorithm like > > kmeans,naivebayes are all sequencefiles. One more

Why do most algorithms use sequencefile as input and output?

2014-11-04 Lee S
Hi all, I'm wondering why the input and output of most algorithms, like kmeans and naivebayes, are all SequenceFiles. One more conversion step is needed before an algorithm can run, and I think that step is time-consuming because it is also a MapReduce job. For the reason to deal with small

Re: Mahout Vs Spark

2014-10-21 Lee S
As a developer who has to choose between Mahout and MLlib, I have some thoughts. Mahout has no decision tree algorithm, but MLlib has the components for building one, such as the Gini index and information gain. I also think Mahout could add algorithms abou
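For reference, the two split criteria mentioned above are short formulas: Gini impurity is 1 minus the sum of squared class proportions, and information gain is the parent node's entropy minus the size-weighted entropy of its children. A minimal plain-Python sketch follows; the function names are illustrative and are not MLlib's API.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_c^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child partitions."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted
```

A tree builder evaluates candidate splits with one of these and keeps the split with the lowest weighted impurity or the highest gain.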

Re: How to use naivebayes on ordinary data not on text files?

2014-10-20 Lee S
For example, one line of the data file looks like this: 1 3 4 5 6 7. The first column is the label; the other columns form the feature vector. 2014-10-21 11:17 GMT+08:00 Vibhanshu Prasad : > Ordinary files? > What type of file you are using? > > On Mon, Oct 20, 2014 at 7:44 AM, Lee S wrote: >

How to use naivebayes on ordinary data not on text files?

2014-10-19 Lee S
I have an ordinary data file containing labels and feature vectors. How can I use naivebayes to classify it? The example on the official website uses text files. Can it be used on ordinary files? I wonder whether *trainnb* can be used directly on data files, provided the format of the data file is o
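To illustrate what multinomial naive Bayes does with lines like "1 3 4 5 6 7" (first column the label, remaining columns the features, as described in the follow-up above), here is a small plain-Python sketch with Laplace smoothing. It treats feature values as non-negative counts, which is the multinomial assumption. Whether trainnb accepts such a file directly is the open question in this thread, so the sketch says nothing about trainnb's input handling; all function names are hypothetical.

```python
import math
from collections import defaultdict

def parse_line(line):
    """First column is the label; the remaining columns are feature values."""
    parts = line.split()
    return parts[0], [float(x) for x in parts[1:]]

def train_multinomial_nb(rows, alpha=1.0):
    """rows: list of (label, feature_vector); features are non-negative counts."""
    num_features = len(rows[0][1])
    totals = defaultdict(lambda: [0.0] * num_features)  # per-label feature sums
    doc_counts = defaultdict(int)
    for label, vec in rows:
        doc_counts[label] += 1
        acc = totals[label]
        for j, v in enumerate(vec):
            acc[j] += v
    model = {}
    for label, acc in totals.items():
        # Laplace-smoothed log P(feature | label) and log prior P(label).
        denom = sum(acc) + alpha * num_features
        log_prior = math.log(doc_counts[label] / len(rows))
        model[label] = (log_prior, [math.log((a + alpha) / denom) for a in acc])
    return model

def predict(model, vec):
    """Return the label with the highest log posterior (up to a constant)."""
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(v * w for v, w in zip(vec, log_lik))
    return max(model, key=score)
```

Features that can be negative or are continuous measurements fit a multinomial model poorly; a Gaussian naive Bayes would be the usual alternative in that case.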

Why do seqdumper and clusterdumper produce output on the local disk?

2014-10-19 Lee S
When I run the two commands in Hadoop mode, the output is always written to the local disk. Why isn't the output written to HDFS in Hadoop mode, for consistency?

Best algorithm for "People who viewed this also viewed" scenario (aka, no preference values)?

2014-08-18 Sigmund Lee
I used Mahout's log-likelihood and Tanimoto coefficient similarities for this scenario, but the results were not very good. So I'm wondering whether there are other algorithms that fit this scenario better, for example the co-occurrence matrix introduced in Mahout in Action? T
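Both similarities mentioned here work on boolean "viewed / not viewed" data. Below is a plain-Python sketch of Tanimoto (Jaccard) similarity on sets of user IDs, and of Dunning's log-likelihood ratio (G²) on the 2×2 co-occurrence table of two items. The entropy-based form is the standard formulation; treating it as identical to Mahout's log-likelihood similarity is an assumption, and the function names are illustrative.

```python
import math

def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized Shannon entropy: N*log(N) - sum(k*log(k))
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 for a 2x2 table of user counts:
    k11 viewed both A and B, k12 viewed A only, k21 viewed B only, k22 neither."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))

def tanimoto(a, b):
    """|A intersect B| / |A union B| for two sets of user IDs."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0
```

LLR is near zero when two items co-occur about as often as chance predicts and grows as co-occurrence departs from that, which is why it is usually used to pick significant co-occurrences rather than as a graded similarity score.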

Re: Any Entity Resolution & Deduplication solution?

2013-12-06 Jason Lee
ntities you might want to look into GATE. > http://gate.ac.uk/sale/talks/stupidpoint/diana-fb.ppt > > > Hope that helps > Manuel > > On 03.12.2013, at 09:41, Jason Lee wrote: > > > I have 10M+ textual company names(in Chinese) that extracted from work > > exper

Any Entity Resolution & Deduplication solution?

2013-12-03 Jason Lee
I have 10M+ textual company names (in Chinese) extracted from the work experience sections of user profiles. Because these company names are entered manually by users of our site, there are lots of duplicates. Our goal is to extract and cleanse this data to build a company dictionary. For example
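One common, simple starting point for this kind of deduplication (not a Mahout feature) is to compare names by the Jaccard similarity of their character bigrams, which works for Chinese text without word segmentation. The sketch below compares every pair, so it only suits small samples; at 10M+ names a blocking or MinHash/LSH step would be needed first. The 0.5 threshold and the function names are arbitrary assumptions.

```python
from itertools import combinations

def char_ngrams(name, n=2):
    """Set of character n-grams of a name, ignoring whitespace."""
    s = "".join(name.split())
    if len(s) < n:
        return {s}
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def near_duplicates(names, threshold=0.5):
    """All pairs whose bigram Jaccard similarity meets the threshold (O(n^2))."""
    grams = {name: char_ngrams(name) for name in names}
    return [(x, y) for x, y in combinations(names, 2)
            if jaccard(grams[x], grams[y]) >= threshold]
```

Candidate pairs found this way still need review or extra rules, since suffixes such as 有限公司 are shared by many unrelated companies and inflate similarity.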

Re: Mahout fpg

2013-11-23 Jason Lee
Hi Suneel, thank you for the clarification. On Nov 22, 2013 9:25 PM, "Suneel Marthi" wrote: > > > > > > On Friday, November 22, 2013 4:55 AM, Jason Lee wrote: > > I noticed lots of algorithms implementations has deprecated in Mahout 0.8 > and removed in

Re: Mahout fpg

2013-11-22 Jason Lee
I noticed that many algorithm implementations were deprecated in Mahout 0.8 and removed in 0.9, but no reasons or comments were given. Can I ask why? By the way, the Mahout API is somewhat lacking in Javadoc comments; every Mahout contributor should take responsibility for adding more Javadoc comments to the ja

Re: Has anyone implemented "true" L-LDA out of Mahout?

2013-09-18 Henry Lee
he # of topics? topic 0: {0: 3+6+(3+6)/2, 1: 1+2+(1+2)/2, 2: (3+6)/2, 3: (1+2)/2 } topic 1: {0: (3+6)/2, 1: (1+2)/2, 2: 3+6+(3+6)/2, 3: 1+2+(1+2)/2 } Any advice will be highly appreciated. Thanks, Henry Lee. On Thu, Sep 5, 2013 at 6:45 PM, Henry Lee wrote: > Thanks for your help in advance

Re: Has anyone implemented "true" L-LDA out of Mahout?

2013-09-05 Henry Lee
Thanks for your help in advance. I will have such a good data set within 2 weeks or so. I may have a working impl. by the end of next week or so. Thanks, Henry Lee. On Thu, Sep 5, 2013 at 1:50 PM, Jake Mannix wrote: > Nobody's talked to me about it either. > > I'm happy

Has anyone implemented "true" L-LDA out of Mahout?

2013-09-05 Henry Lee
stering/lda/cvb/CVB0PriorMapper.java Thanks, Henry Lee

Questions about LDA CVB TopicModel class usage for inferring new docs to topics.

2013-07-26 Henry Lee
kind of task? Thanks, Henry Lee. See my code below: @Test public void testOfJakeMannixIdeaAndQuestions() { // jake.man...@gmail.com val conf = new Configuration(); val dictionary = readDictionary(new Path("/tmp/dictionary.file-0"), conf); assertThat(dictionary.length

Re: churn analysis

2013-07-25 Lee, Howon
On that subject, does anyone have any resources re: feature engineering for churn analysis? On Thu, Jul 25, 2013 at 4:12 AM, Sayed Seliman wrote: > Hi, > > mahout is a customer requirement. > Can I use the logistic regression with Mahout ? > How I have to prepare my data to be processed with the

Implementing LinkedIn's PYMK (People You May Know) feature with Mahout: any suggestions?

2013-07-23 Jason Lee
Hi all, I am currently working on a recommendation system for an SNS site with 15M+ registered members. We already have a PYMK implementation (not using Mahout or any machine learning library), but the accuracy of the recommendations produced by the current implementation is not as go
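A common PYMK baseline is to rank members who are not yet connected to a user by how many mutual friends they share (friends-of-friends). Below is a hedged plain-Python sketch over an in-memory adjacency map. The graph representation and `people_you_may_know` are hypothetical, and at 15M+ members this counting would have to be distributed, for example as a MapReduce job.

```python
from collections import Counter

def people_you_may_know(graph, user, top_n=5):
    """Hypothetical helper: rank non-friends of `user` by mutual-friend count.

    graph: dict mapping each member to the set of their friends (undirected).
    """
    friends = graph.get(user, set())
    counts = Counter()
    for f in friends:
        for fof in graph.get(f, set()):
            # Skip the user themself and people they are already connected to.
            if fof != user and fof not in friends:
                counts[fof] += 1
    # Most mutual friends first; ties broken by member ID for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

Raw mutual-friend counts favor highly connected members; a learned ranker or a co-occurrence measure such as log-likelihood can then re-score these candidates.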

Re: Issue when running Mahout Recommender Demo

2013-07-23 Jason Lee
etty-based demo is still working or in the > > project though. If so it should just be deleted. > > > > On Fri, Jul 19, 2013 at 4:21 AM, Jason Lee wrote: > >> Hi, guys, > >> > >> I was trying to following the doc > >> below: > https:/

Issue when running Mahout Recommender Demo

2013-07-18 Jason Lee
Hi guys, I was trying to follow the doc below: https://cwiki.apache.org/confluence/display/MAHOUT/Recommender+Documentation When I run jetty:run under *mahout-integration*, I get a ClassNotFoundException: org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommender. I noticed that

Keeping track of revisions of models?

2013-07-17 Lee, Howon
Hey, I'm planning to build some SGD logistic regression models, serialize them to save them, and test my programs with them. Checking them into version control seems pretty terrible because they're binaries. Is there a good way to keep track of versions of my models, revert them, etc

setting the number of reduce jobs for FPGrowth

2013-03-26 ricky lee
Hi, I've seen some similar questions on this mailing list but haven't found a clear answer yet. With a fairly large dataset (330G), FPGrowth spends most of its time in the parallel-fpgrowth reduce tasks. Can I set the number of reduce tasks automatically? In my default Hadoop installation, the number of

Re: Error running RecommenderJob using mahout-core-0.5-cdh3u4-job.jar

2012-09-06 Lee Carroll
-Dmapred.output.dir=/user/etl_user/itemreccooutput Should that be -Dmapred.output.dir=/user/etl_user/itemrecco/output? On 6 September 2012 02:40, tmefrt wrote: > > Hi All > > I'm trying to test the item recommendation. using the command > > > hadoop jar /usr/lib/mahout/mahout-core-0.5-cdh3u4-job.

Re: Significant - serendipity in recommending

2012-03-24 Lee Carroll
predicting the intent of the user when they intend something other than they want. good luck :-) On 24 March 2012 17:00, Ted Dunning wrote: > I don't know what you mean by significant any more than Sean. > > But serendipity in a recommender comes from two sources.  Both must be > present.  One s

mahout seq2sparse --minDF option

2012-03-13 Lee, Joo-Young
Hi all. I use "mahout seq2sparse" with the -md and -x options to remove low-frequency and very high-frequency words. However, the generated dictionary.file-0 is always the same when I change the values of the -md and -x options. Are these options working correctly?
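To check the options' effect independently, it helps to pin down what document-frequency pruning means: -md sets a minimum document frequency and -x a maximum document-frequency percentage. The plain-Python sketch below applies that rule to a toy corpus. It mirrors the intent of the flags, not Mahout's code, and makes no claim about which seq2sparse output files reflect the pruning; `prune_by_df` is a hypothetical name.

```python
from collections import Counter

def prune_by_df(docs, min_df=2, max_df_percent=50.0):
    """docs: list of token lists. Keep terms whose document frequency is
    at least min_df and at most max_df_percent of all documents."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    n = len(docs)
    return {t for t, c in df.items()
            if c >= min_df and 100.0 * c / n <= max_df_percent}
```

Running the same rule over a sample of your own documents gives an expected vocabulary size to compare against the term counts in the generated vectors.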

Re: Add on to itemsimilarity

2012-01-30 Lee Carroll
ut for to make it better ? What's your secret Ted! Lee C

Re: Add on to itemsimilarity

2012-01-30 Lee Carroll
t combined list.  This design intrudes a lot > less into Mahout's internals. > > Would anyone else benefit from this addition? > > > On 01/29/2012 12:33 AM, Ted Dunning wrote: >> >> Also, Lee, I think you have it backwards.  It is true that clicks are not >>

Re: Add on to itemsimilarity

2012-01-28 Lee Carroll
> I would argue, though, that .recommend() is aimed at the latter task: No. I think the mismatch here is that you are using, at best, a wild guess at a preference for the convenience of using a recommender, and then, in the same breath, expecting the recommender to "understand" that you are not using prefe

Re: Sequential Pattern Mining

2011-11-27 Lee Carroll
for item / users > ... However, I know it's not the original focus of your question, so maybe there is a much better way. lee c On 27 November 2011 10:17, Nishant Chandra wrote: > Are you talking about CF? Can you please explain a bit? > > To be clear, for my use case, temporal sequence is i

Does Mahout 0.5 have a build dependency on the build machine having access to the internet?

2011-10-27 lee carroll
va:150) at org.apache.mahout.ga.watchmaker.cd.FileInfosDatasetTest.testRanges(FileInfosDatasetTest.java:36) at cheers lee c

Re: Average Absolute Difference Recommender Evaluator metric

2011-10-26 lee carroll
I've just re-read section 4.2, exploring the user-based recommender, and the role of the similarity measure is there, front and centre! cheers lee c On 26 October 2011 12:39, Sean Owen wrote: > A-ha. I should elaborate then. The essence of the item-based algorithm > is estimati

Re: Average Absolute Difference Recommender Evaluator metric

2011-10-26 lee carroll
Yes, precision/recall and f-measure and fall out depend on a notion of > "relevant" or "correct" results and this is a bit problematic in this > context. > > A/B testing is the ultimate test, yes. But these evaluations you're > running here do have value. > >

Re: cold-start and attribute based ItemSimilarity implementation

2011-10-26 lee carroll
mes colour is key, sometimes size is, etc. By using Solr with MLT (MoreLikeThis) and edismax, you may stand a better chance of building a more effective, more maintainable solution. Get the book, though, as the custom item similarity is great stuff. cheers lee c On 26 October 2011 10:15, Sean Owen

Re: Average Absolute Difference Recommender Evaluator metric

2011-10-26 lee carroll
hell. (It also involves a wide selection of stakeholders and potential metrics, which in my experience guarantees the results will be gerrymandered.) Anyway, I digress. Thanks for everyone's help. Cheers Lee C can only come from known

Re: Average Absolute Difference Recommender Evaluator metric

2011-10-25 lee carroll
of the recommender vivid and concrete. The confidence this creates should not be underestimated. However, how do I describe to a business stakeholder the meaning of an AAD produced with Tanimoto similarity? I can't at the moment :-) cheers Lee C

Re: Average Absolute Difference Recommender Evaluator metric

2011-10-25 lee carroll
On 25 October 2011 20:55, lee carroll wrote: > I've not come across the terms boolean / non boolean recommenders > before. I thought they all worked by > estimating preferences. > > > > On 25 October 2011 19:13, Sean Owen wrote: >> You should be able to c

Re: Average Absolute Difference Recommender Evaluator metric

2011-10-25 lee carroll
stimating preferences. > > But it's not meaningful for any comparison, for the rest. > > On Tue, Oct 25, 2011 at 7:04 PM, lee carroll > wrote: >> So when comparing within a technique AAD or RMS is fine but when comparing >> across recommenders using a variety of similarities its best to stick >> to IR measures. >

Re: Average Absolute Difference Recommender Evaluator metric

2011-10-25 lee carroll
> > On Tue, Oct 25, 2011 at 6:50 PM, lee carroll > wrote: >> What does the metric returned by >> AverageAbsoluteDifferenceRecommenderEvaluator mean for non rating >> based recommenders. >> >> The Mahout in action book describes the metric as being the am

Average Absolute Difference Recommender Evaluator metric

2011-10-25 lee carroll
ng? Do I have too simplistic a view of the AAD metric? Thanks in advance Lee C
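For reference, the metric asked about in this thread is the mean absolute difference between estimated and actual preference values over held-out test data, and RMS (root-mean-square) error is the companion metric mentioned in the replies. A minimal plain-Python sketch; the function names are illustrative, not Mahout's evaluator API.

```python
import math

def average_absolute_difference(pairs):
    """pairs: (estimated, actual) preference values for held-out data."""
    return sum(abs(e - a) for e, a in pairs) / len(pairs)

def root_mean_square(pairs):
    """Like AAD, but squares each error first, so large misses weigh more."""
    return math.sqrt(sum((e - a) ** 2 for e, a in pairs) / len(pairs))
```

Both are expressed in the units of the preference scale (for example, rating stars), which makes them easiest to explain when the underlying data are real ratings.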

Re: LanczosSolver and ClassNotFoundException

2011-02-21 Kidong Lee
I think you should add all the dependency jars in the lib directory of the Mahout distribution, except hadoop-*.jar, to your M/R job's lib. I have also experienced something similar to your case. In my case, I put all the Lucene jars from the Mahout distribution into my M/R job's lib, and then no such ClassNotFoundException occur

Re: How to get the Id List of items which belong to a cluster.

2011-02-15 Kidong Lee
er called clusteredPoints in the output directory having a sequence > file > with mappings > > Robin > > On Tue, Feb 15, 2011 at 6:02 AM, Kidong Lee wrote: > > > Hi, > > > > My situation is almost like '12.1 Finding similar users on Twitter' in > > Mahout

How to get the Id List of items which belong to a cluster.

2011-02-14 Kidong Lee
Hi, my situation is almost like '12.1 Finding similar users on Twitter' in the Mahout in Action book. My document contains item IDs and their contents separated by commas, for example this CSV file (itemId, itemContents): 1223, sports 1344, football nike ... First I did conve
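Robin's reply above points at the clusteredPoints sequence file in the output directory, which holds the cluster-to-point mappings. Once those (cluster ID, item ID) pairs have been read out, for example from a dump, building each cluster's ID list is a simple group-by. Below is a hedged plain-Python sketch; the pair format and `item_ids_by_cluster` are assumptions, not a Mahout API.

```python
from collections import defaultdict

def item_ids_by_cluster(assignments):
    """Hypothetical helper: group item IDs by cluster ID.

    assignments: iterable of (cluster_id, item_id) pairs, e.g. parsed from a
    dump of clusteredPoints (the exact dump format is an assumption here).
    """
    clusters = defaultdict(list)
    for cluster_id, item_id in assignments:
        clusters[cluster_id].append(item_id)
    # Sort each list so the output is stable regardless of input order.
    return {cid: sorted(ids) for cid, ids in clusters.items()}
```

This relies on the item ID being carried through vectorization, for instance as the vector's name or key, so it can be recovered on the point side of each mapping.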