Hi,
I already have Mahout in Action, but nothing in it works with the latest
Mahout version. I will look again.
Does Taming Text handle .xml and .json files too? My goal is to take the
output of Solr (which is XML, JSON, or PHP).
Regards
-Original Message-
From: Lance Norskog
What should the input format for Mahout be? Can anybody tell me? I'm
confused; I'm not able to make head or tail of the output that I'm
getting.
That's a very good question, I was expecting an answer too...
That was the answer given to me on the Mahout users list:
the type of input and output depends on the job you want to run.
I was clustering .txt files for the moment.
-Original Message-
From: shriram [mailto:ghai12...@gmail.com]
Here is a quick walkthrough for doing kmeans clustering and looking at
the input and output.
https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line
Be aware that some command line params have changed since it was written
for 0.6. For
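As a rough sketch, the pipeline that walkthrough describes looks like the
following (flag names have shifted between Mahout releases, and the input
and output directory names here are just placeholders):

```shell
# Convert a directory of plain .txt files into Hadoop SequenceFiles
bin/mahout seqdirectory -i /path/to/txt-files -o seqfiles

# Turn the SequenceFiles into sparse TF-IDF term vectors
bin/mahout seq2sparse -i seqfiles -o vectors

# Run k-means: -k samples the initial centroids, -x caps the iterations,
# -cl also writes the final point-to-cluster assignments
bin/mahout kmeans -i vectors/tfidf-vectors -c initial-clusters \
  -o kmeans-output -k 10 -x 20 -cl

# Dump the clusters with their top terms, using the dictionary
# produced by seq2sparse to map term ids back to words
bin/mahout clusterdump -d vectors/dictionary.file-0 -dt sequencefile \
  -i kmeans-output/clusters-*-final -o clusters.txt
```

Check `bin/mahout kmeans --help` against your own version before running,
since (as noted above) several of these parameters changed after 0.6.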
Hi,
Sorry for my late response.
Thanks, Dmitry and Ted, for your suggestions about a smaller value of k and
statistical noise.
I have some knowledge about the problem I am dealing with, and that's why I
expected that.
It is like this: there are some inherent groups (clusters) in my dataset and
I bought the online book Mahout in Action and have been reading through it,
trying to follow along with the steps when possible. I am new to this whole
process, including writing code in general. I have now downloaded the latest
Mahout, the latest Maven, and the Eclipse IDE. I am trying to
Hi,
I have a couple of questions regarding Naive Bayes classification in
Mahout 0.7.
Is there a preferred way to determine when a document doesn't belong
to any of the given categories? Currently, I'm trying to do this by
explicitly having an Other category and including large numbers of
Solr creates Lucene index files. You can query it for content in
several formats. You will have to fetch the data with a program.
bin/mahout lucene.vector
creates vector sequence files from a Lucene index. I have not tried
this. You have to configure Solr to create term vectors for the field
you
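To make those two routes concrete, a minimal sketch might look like this
(the Solr URL, index path, and field name are placeholders, and the
`lucene.vector` options can differ between Mahout releases):

```shell
# Route 1: query Solr and fetch the results with a program;
# wt= selects the response format (json, xml, or php)
curl 'http://localhost:8983/solr/select?q=*:*&wt=json&rows=100' -o results.json

# Route 2: read vectors straight out of the Lucene index that Solr
# maintains. The field must be indexed with termVectors="true" in
# schema.xml for this to work.
bin/mahout lucene.vector \
  --dir /path/to/solr/data/index \
  --field text \
  --idField id \
  --dictOut dictionary.txt \
  --output vectors.seq
```

The sequence file written by `--output`, together with the dictionary, can
then feed the clustering jobs discussed earlier in the thread.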