Hi,
I am following the Apache Mahout Cookbook tutorials and I tried to run a
classifier on 20newsgroups. I managed to convert the files to SequenceFiles,
then run the TF-IDF algorithm and split the data into train/test sets. But
when I finally build my model with trainnb, here is the error I got.
I solved my problem: I didn't split the right file.
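For anyone hitting the same error: in the Cookbook pipeline the split step
has to run on the tfidf-vectors directory produced by seq2sparse, and trainnb
then takes the training split as its input. Below is a minimal sketch of the
whole sequence, assuming the Mahout 0.9 CLI; WORK_DIR, the paths, and the 40%
selection percentage are illustrative placeholders, not the Cookbook's exact
values.

    # Convert the raw 20newsgroups directories into SequenceFiles
    mahout seqdirectory -i ${WORK_DIR}/20news-all -o ${WORK_DIR}/20news-seq -ow

    # Build TF-IDF vectors from the SequenceFiles
    mahout seq2sparse -i ${WORK_DIR}/20news-seq -o ${WORK_DIR}/20news-vectors \
      -lnorm -nv -wt tfidf

    # Split the TF-IDF vectors (not the raw SequenceFiles) into train/test sets
    mahout split -i ${WORK_DIR}/20news-vectors/tfidf-vectors \
      --trainingOutput ${WORK_DIR}/20news-train-vectors \
      --testOutput ${WORK_DIR}/20news-test-vectors \
      --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential

    # Train the naive Bayes model on the training split
    mahout trainnb -i ${WORK_DIR}/20news-train-vectors -el \
      -o ${WORK_DIR}/model -li ${WORK_DIR}/labelindex -ow -c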
OK, nice. In case you have more problems, please do not hesitate to ask me.
Piero Giacomelli
OK, I did a number of refactorings, and one of them blew my mind.
As you may or may not know, this:
.reduceByKey(_ + _, 40) // do it with 40 partitions
causes Spark to partition the data into 40 parts with the default (as I
understand it, hash-based) partitioner. Then during execution these
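A minimal, runnable sketch of that partitioning point, assuming a local Spark
context; the object name and toy dataset are illustrative, not from the
original code:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD implicits for pre-1.3 Spark

    object ReduceByKeyPartitioning {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("rbk-partitions").setMaster("local[*]"))
        val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
        // The second argument makes the shuffle write its output into 40
        // partitions, with keys assigned by the default HashPartitioner.
        val reduced = pairs.reduceByKey(_ + _, 40)
        println(reduced.partitions.length) // 40
        println(reduced.partitioner)       // Some(org.apache.spark.HashPartitioner@...)
        sc.stop()
      }
    }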
Hello,
I am interested in benchmarking Mahout on different hardware/software
platforms, and I am looking for real or synthetic datasets, ideally between
tens of GBs and a couple of TBs.
I am particularly interested in K-means, (naive) Bayes classification, and
collaborative filtering (ALS-WR).
All,
I'm having a tough time adding a custom analyzer to Hadoop and making use
of it through Mahout.
I've pruned down the Mahout in Action examples to a single example, which is
a customized Mahout 0.9 MailArchivesClusteringAnalyzer in
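Since the message names MailArchivesClusteringAnalyzer and Mahout 0.9, here
is a minimal sketch of what a trimmed-down analyzer in that spirit can look
like against the Lucene 4.6 API that Mahout 0.9 ships with; the class name is
hypothetical and the filter chain is illustrative, not the original code:

    import java.io.Reader

    import org.apache.lucene.analysis.Analyzer
    import org.apache.lucene.analysis.Analyzer.TokenStreamComponents
    import org.apache.lucene.analysis.core.{LowerCaseFilter, StopAnalyzer, StopFilter}
    import org.apache.lucene.analysis.standard.StandardTokenizer
    import org.apache.lucene.util.Version

    // Hypothetical stand-in for a customized MailArchivesClusteringAnalyzer:
    // tokenize, lowercase, then drop English stop words.
    class SimpleMailAnalyzer extends Analyzer {
      override protected def createComponents(
          fieldName: String, reader: Reader): TokenStreamComponents = {
        val source = new StandardTokenizer(Version.LUCENE_46, reader)
        val stream = new StopFilter(Version.LUCENE_46,
          new LowerCaseFilter(Version.LUCENE_46, source),
          StopAnalyzer.ENGLISH_STOP_WORDS_SET)
        new TokenStreamComponents(source, stream)
      }
    }

To be usable from the Mahout CLI (e.g. seq2sparse --analyzerName), the class
has to be on the Hadoop job classpath, and as far as I recall Mahout
instantiates it by name via reflection, so it needs a public constructor it
can call.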