RE: .txt to vector

2012-07-20 Thread Videnova, Svetlana
Hi, I already have Mahout in Action, but nothing in it works with the latest Mahout version. I will look again. Does Taming Text also cover .xml and JSON files? My goal is to take the output of Solr (which is .xml, JSON, or PHP). Regards -Original message- From: Lance Norskog

RE: k-means output missing some cluster centers coordinates

2012-07-20 Thread shriram
What should the input format for Mahout be? Can anybody tell me? I'm confused. I can't make head or tail of the output that I'm getting.

RE: k-means output missing some cluster centers coordinates

2012-07-20 Thread Videnova, Svetlana
That's a very good question, I was expecting an answer too. This was the answer given to me by Mahout users: the type of input and output depends on the job you want to run. I am clustering .txt files for the moment. -Original message- From: shriram [mailto:ghai12...@gmail.com]

Re: k-means output missing some cluster centers coordinates

2012-07-20 Thread Pat Ferrel
Here is a quick walkthrough for doing kmeans clustering and looking at the input and output. https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line Be aware that some command line params have changed since it was written for 0.6. For

RE: eigendecomposition of very large matrices

2012-07-20 Thread Aniruddha Basak
Hi, Sorry for my late response. Thanks Dmitry and Ted for your suggestions about a smaller value of k and statistical noise. I have some knowledge about the problem I am dealing with and that's why I expected that. It is like this: there are some inherent groups (clusters) in my dataset and

Installing Mahout through Eclipse IDE

2012-07-20 Thread David Rose
I bought the online book Mahout in Action and have been reading through it, trying to follow along with the steps when possible. I am new to this whole process, including writing code in general. I have now downloaded the latest Mahout, the latest Maven, and the Eclipse IDE. I am trying to

Naive Bayes classification questions

2012-07-20 Thread David Engel
Hi, I have a couple of questions regarding Naive Bayes classification in Mahout 0.7. Is there a preferred way to determine when a document doesn't belong to any of the given categories? Currently, I'm trying to do this by explicitly having an Other category and including large numbers of

Re: .txt to vector

2012-07-20 Thread Lance Norskog
Solr creates Lucene index files. You can query it for content in several formats. You will have to fetch the data with a program. bin/mahout lucene.vector creates vector sequencefiles from a lucene index. I have not tried this. You have to configure Solr to create termvectors for the field you