RE: api and documentation

2014-12-01 Thread ARROYO MANCEBO David
I want to code a user-based recommender :) -Original Message- From: Pat Ferrel [mailto:p...@occamsmachete.com] Sent: Friday, 28 November 2014 17:51 To: user@mahout.apache.org Subject: Re: api and documentation You may not need the API since several methods are supported

20 news groups example

2014-12-01 Thread Jakub Stransky
Hello experienced Mahout users, I am new to Mahout and I am trying to run the Naive Bayes classification example with the 20 newsgroups categories. There is one thing I do not understand and am unable to spot. To train the categorization I need labeled data. I don't see the way how the label of a particular

Insights to Naive Bayes classifier example - 20news groups

2014-12-01 Thread Jakub Stransky
Hello Mahout experts, I am trying to follow some examples provided with Mahout and some features are not clear to me. It would be great if someone could clarify a bit more. To prepare the data (train and test) the following sequence of steps is performed (taken from the Mahout Cookbook): All input

RE: Insights to Naive Bayes classifier example - 20news groups

2014-12-01 Thread Andrew Palumbo
Hi Jakub, The step that you are missing is `$mahout seqdir ...`. In this step, each file in each directory (where the directory is the Category) is converted into a sequence file of the form Text,Text, where the Text key is /Category/doc_id. `$mahout seq2sparse ...` vectorizes the output of
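To make the labeling step described above concrete, here is a minimal Python sketch of the key layout only. It is not Mahout's implementation and does not write a Hadoop sequence file; it just walks a directory tree where each subdirectory is a category and builds `/Category/doc_id` keys, from which the label can be read back. The function names (`keyed_documents`, `label_of`, `build_demo_corpus`) and the toy file names are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def keyed_documents(root: Path):
    """Yield (key, text) pairs where key is "/<Category>/<doc_id>".

    Assumes one subdirectory per category and one file per document,
    as in the directory layout described in the message above.
    """
    for category_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for doc in sorted(p for p in category_dir.iterdir() if p.is_file()):
            key = f"/{category_dir.name}/{doc.name}"
            yield key, doc.read_text(encoding="utf-8", errors="replace")


def label_of(key: str) -> str:
    """Recover the category label from a "/<Category>/<doc_id>" key."""
    return key.split("/")[1]


def build_demo_corpus(root: Path) -> None:
    """Create a tiny, made-up corpus (hypothetical file names and contents)."""
    docs = {
        "rec.autos/1001": "engine and tires",
        "sci.space/2001": "orbit and launch",
        "sci.space/2002": "telescope and probe",
    }
    for rel_path, text in docs.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    corpus_root = Path(tmp)
    build_demo_corpus(corpus_root)
    pairs = list(keyed_documents(corpus_root))
    labels = [label_of(key) for key, _ in pairs]

for (key, _), label in zip(pairs, labels):
    print(key, "->", label)
```

The point of the sketch is that no separate label file is needed: the category travels with each document inside its key, so a later training step can recover the label by splitting the key.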

RE: Insights to Naive Bayes classifier example - 20news groups

2014-12-01 Thread Andrew Palumbo
All input is merged into single dir: *cp -R ${WORK_DIR}/20news-bydate*/*/* ${WORK_DIR}/20news-all* As well, the above line should read as follows: `$ cp -R ${WORK_DIR}/20news-bydate/*/* ${WORK_DIR}/20news-all`. See: http://mahout.apache.org/users/classification/twenty-newsgroups.html

mahout ssvd usage

2014-12-01 Thread debbie
I am trying to use Mahout's stochastic SVD algorithm on a small dataset to compare it with the regular SVD algorithm (DistributedLanczos). I built the covariance matrix for the dataset and I feed it to Mahout SVD, which returns a cleaneigenvectors file with the eigenvectors of this covariance

Re: Insights to Naive Bayes classifier example - 20news groups

2014-12-01 Thread Jakub Stransky
Hi Andrew, thanks for your response, which points me to the missing piece of the puzzle! However, there is still something which is not clear to me. Either the sequence of the commands is not correct, or I haven't fully grasped the elementary mechanics here. I understand the

RE: Insights to Naive Bayes classifier example - 20news groups

2014-12-01 Thread Andrew Palumbo
"However the sequence of steps as described in Mahout Cookbook seems to me incorrect as:" This is entirely possible; that book may be out of date. The end-to-end instructions on the website for the 20 newsgroups example are up to date, though, as is the example script. You don't want to

Re: DBSCAN implementation in Mahout

2014-12-01 Thread Dmitriy Lyubimov
If memory serves me, DeLiClu (density-link) is the current best density-based method, since it does not require parameter searches. What is the parallelization strategy you are proposing? I know there were a bunch of attempts to parallelize/partition the DBSCAN problem; one of the more interesting is perhaps of