Problem with Mahout Text Classifier following Apache Mahout Cookbook examples

2014-08-07 Thread xenlee - Zerg
Hi, I am following the Apache Mahout Cookbook tutorials and I tried to run a classifier on 20newsGroup. I managed to convert the files to SF, then ran the TF-IDF algorithm and split the data into train/test. But when I finally built my model with trainnb, here is the error I got. Did this already

Re: Problem with Mahout Text Classifier following Apache Mahout Cookbook examples

2014-08-07 Thread xenlee - Zerg
I solved my problem: I didn't split the right file. 2014-08-07 10:28 GMT+02:00 xenlee - Zerg sc2.xen...@gmail.com: Hi, I am following the Apache Mahout Cookbook tutorials and I tried to run a classifier on 20newsGroup. I managed to convert the files to SF, then ran the TF-IDF algorithm, and

Re: Problem with Mahout Text Classifier following Apache Mahout Cookbook examples

2014-08-07 Thread Piero Giacomelli
OK, nice. In case you have more problems, please do not hesitate to ask me. Piero Giacomelli 2014-08-07 11:29 GMT+02:00 xenlee - Zerg sc2.xen...@gmail.com: I solved my problem: I didn't split the right file. 2014-08-07 10:28 GMT+02:00 xenlee - Zerg sc2.xen...@gmail.com: Hi, I am following

Re: RowSimilarityJob implementation with Spark

2014-08-07 Thread Reinis Vicups
OK, I did a number of refactorings, and one among them blew my mind. As you may or may not know, this: .reduceByKey(_ + _, 40) // do it with 40 partitions causes Spark to partition the data into 40 parts with the default (as I understand it) hash-based partitioner. Then during execution these

where to find representative workload to benchmark mahout

2014-08-07 Thread Wei Zhang
Hello, I am interested in benchmarking Mahout on different hardware/software platforms, and I am looking for (real/synthetic) datasets (ideally between tens of GBs and a couple of TBs). I am particularly interested in the K-means, (naive) Bayesian Network and Collaborative Filtering (ALS-WR)

Difficulties adding a custom job (analyzer) to Hadoop

2014-08-07 Thread Mohammed Omer
All, I'm having a tough time adding a custom analyzer to Hadoop and making use of it through Mahout. I've pruned down the Mahout in Action examples to a single example, which is a customized Mahout 0.9 MailArchivesClusteringAnalyzer in