Re: Mahout for text classification

2013-12-11 Thread tuku
I am currently using naive Bayes for text classification. I prefer NB over SVM because: - SVM has a long training time - NB can be incremental - NB can be fully parallel. The main decisions you should make while using NB are whether to use tf or tf-idf, and whether to use binary NB or multinomial if you classify short
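
To illustrate the tf and binary-versus-multinomial choice mentioned above, here is a minimal sketch in plain Python. It is not Mahout's implementation; the function names `train_nb` and `classify`, the `binary` flag, and the Laplace smoothing parameter `alpha` are all assumptions made for this example. With `binary=False` it uses raw term frequencies (multinomial NB); with `binary=True` each term is counted at most once per document.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, binary=False, alpha=1.0):
    """Train a naive Bayes classifier on tokenized documents.

    binary=False: multinomial NB on raw term frequencies (tf).
    binary=True:  each term counts at most once per document.
    alpha is the Laplace smoothing constant (an assumption of this sketch).
    """
    class_docs = Counter(labels)          # documents per class
    term_counts = defaultdict(Counter)    # per-class term counts
    vocab = set()
    for tokens, label in zip(docs, labels):
        feats = set(tokens) if binary else tokens
        term_counts[label].update(feats)
        vocab.update(tokens)

    model = {}
    for label, n in class_docs.items():
        denom = sum(term_counts[label].values()) + alpha * len(vocab)
        log_prior = math.log(n / len(docs))
        log_like = {
            t: math.log((term_counts[label][t] + alpha) / denom) for t in vocab
        }
        model[label] = (log_prior, log_like)
    return model, binary


def classify(trained, tokens):
    """Return the class with the highest log posterior score."""
    model, binary = trained
    feats = set(tokens) if binary else tokens
    best_label, best_score = None, -math.inf
    for label, (log_prior, log_like) in model.items():
        # Terms never seen in training are ignored.
        score = log_prior + sum(log_like[t] for t in feats if t in log_like)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = [
    ["cheap", "pills", "cheap"],
    ["meeting", "agenda"],
    ["cheap", "offer"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "ham", "spam", "ham"]

multinomial = train_nb(docs, labels, binary=False)
binarized = train_nb(docs, labels, binary=True)
print(classify(multinomial, ["cheap", "offer"]))
print(classify(binarized, ["meeting", "notes"]))
```

Because the model is just per-class counts, adding new training documents only requires updating those counts, which is why NB lends itself to incremental and parallel training.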

Re: Avoiding OOM for large datasets

2013-12-11 Thread Ted Dunning
This is not right. The sequential version would have finished long before this for any reasonable value of k. I do note, however, that you have set k = 200,000 where you only have 300,000 documents. Depending on which value you set (I don't have the code handy), this may actually be increased

Re: Mahout and Hadoop 2.2.0

2013-12-11 Thread Zoltan Prekopcsak
Hi Gokhan, Thank you for the clarification. Does it mean that Mahout is using the mapred API everywhere and there is no mapreduce API left? As far as I know, the mapreduce API needs to be recompiled and I remember needing to recompile Mahout for CDH4 when it first came out. Thanks, Zoltan

Re: Mahout and Hadoop 2.2.0

2013-12-11 Thread Suneel Marthi
Mahout is using the newer mapreduce API and not the older mapred API. Was that what you were looking for? On Wednesday, December 11, 2013 1:53 PM, Zoltan Prekopcsak preko1...@gmail.com wrote: Hi Gokhan, Thank you for the clarification. Does it mean that Mahout is using the mapred API

Re: Mahout and Hadoop 2.2.0

2013-12-11 Thread Sebastian Schelter
I think there are still parts of the code (e.g. in DistributedRowMatrix) that use the old API. --sebastian On 11.12.2013 19:56, Suneel Marthi wrote: Mahout is using the newer mapreduce API and not the older mapred API. Was that what you were looking for? On Wednesday, December 11,

Re: Mahout and Hadoop 2.2.0

2013-12-11 Thread Suneel Marthi
Sebastian, are we still using SplitInputJob? It seems it's been replaced by the much newer SplitInput. Do you think this needs to be purged from the codebase for 0.9? It's been marked as deprecated anyway. On Wednesday, December 11, 2013 2:08 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:

Re: Mahout and Hadoop 2.2.0

2013-12-11 Thread Gokhan Capan
Hi Zoltan, I am saying that hadoop2-stable and hadoop1 are binary compatible. I don't know what version of hadoop is used in cdh4-mr2 but I guess it was hadoop2 alpha, since bigtop was at hadoop 2.0.6 alpha last time I checked, which was last week. Just try it and let us know if you experience

Re: Mahout and Hadoop 2.2.0

2013-12-11 Thread Gokhan Capan
Could you check the following? Are you sure that your hadoop cluster is hadoop 2.2.0? Are you sure other dependencies of your project do not have a transitive dependency on hadoop? Gokhan On Wed, Dec 11, 2013 at 9:46 PM, Hi There srudamas...@yahoo.com wrote: I tried to run

Re: Mahout and Hadoop 2.2.0

2013-12-11 Thread Suneel Marthi
Per this link, one notable incompatibility is Counter and CounterGroup. http://hadoop.apache.org/docs/r2.2.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html On Wednesday, December 11, 2013 2:46 PM, Hi There srudamas...@yahoo.com wrote: I

Re: Mahout and Hadoop 2.2.0

2013-12-11 Thread Hi There
Here are the full contents of my pom file: <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion>

Re: Mahout and Hadoop 2.2.0

2013-12-11 Thread Gokhan Capan
In the meantime, you might apply the patch in MAHOUT-1354, build mahout using mvn package -Phadoop2 -DskipTests=true, use that mahout version, and see if that works. Gokhan On Wed, Dec 11, 2013 at 10:09 PM, Gokhan Capan gkhn...@gmail.com wrote: I apologize, Suneel is right, Counter breaks the

Decision Tree in Mahout

2013-12-11 Thread unmesha sreeveni
Am I able to run `Decision tree` from Mahout in Eclipse without installing it? Should I `install` Mahout on my system, or download all the `jar` dependencies and include them in lib? I want to know how the Decision Tree works. Where can I find the `source code` for the Mahout Decision Tree? -- *Thanks