I want to code a user-based recommender :)
-Original Message-
From: Pat Ferrel [mailto:p...@occamsmachete.com]
Sent: Friday, November 28, 2014 17:51
To: user@mahout.apache.org
Subject: Re: api and documentation
You may not need the API since several methods are supported
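Mahout's Taste API covers this in Java, but the core of a user-based recommender is small enough to sketch. Below is an illustrative Python version (cosine similarity over a toy ratings dict; all names and data here are made up for illustration, none of it is Mahout API):

```python
from math import sqrt

# Toy user -> {item: rating} data; purely illustrative.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 3, "d": 5},
    "carol": {"b": 2, "c": 5, "d": 4},
}

def cosine(u, v):
    """Cosine similarity between two users' rating dicts."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user, k=2, n=3):
    """Score items the user hasn't rated by similarity-weighted
    ratings of the k most similar users, and return the top n."""
    sims = sorted(
        ((cosine(ratings[user], ratings[o]), o) for o in ratings if o != user),
        reverse=True,
    )[:k]
    scores = {}
    for sim, other in sims:
        for item, r in ratings[other].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

In Mahout itself the same roles are played by a `DataModel`, a `UserSimilarity`, a `UserNeighborhood`, and a `GenericUserBasedRecommender`.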
Hello experienced mahout users,
I am new to Mahout and I am trying to run the naive Bayes classification
example with the 20 newsgroups categories. I do not understand one thing which
I am unable to spot. To train the classifier I need labeled data. I don't
see how the label of a particular
Hello Mahout experts,
I am trying to follow some examples provided with Mahout and some features
are not clear to me. It would be great if someone could clarify a bit more.
To prepare the data (train and test) the following sequence of steps is
performed (taken from the Mahout Cookbook):
All input
Hi Jakub,
The step that you are missing is `$ mahout seqdirectory ...`. In this step each
file in each directory (where the directory name is the category) is converted
into a sequence file of the form Text,Text, where the Text key is /Category/doc_id.
`$ mahout seq2sparse ...` vectorizes the output of
All input is merged into a single dir:
*cp -R ${WORK_DIR}/20news-bydate*/*/* ${WORK_DIR}/20news-all*
Also, the above line should read as follows:
$ cp -R ${WORK_DIR}/20news-bydate/*/* ${WORK_DIR}/20news-all
see: http://mahout.apache.org/users/classification/twenty-newsgroups.html
I am trying to use Mahout's stochastic SVD algorithm on a small dataset to
compare it with the regular SVD algorithm (DistributedLanczosSolver). I built
the covariance matrix for the dataset and I feed it to `mahout svd`, which
returns a cleanEigenvectors file with the eigenvectors of this covariance
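The comparison being attempted can be reproduced in miniature with NumPy. This is an illustrative sketch of the randomized-projection idea behind stochastic SVD (Halko et al. style), not Mahout's SSVD code; the matrix sizes and spectrum are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a small symmetric covariance matrix with a rapidly decaying
# spectrum, standing in for the real dataset's covariance.
n = 30
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
spectrum = 2.0 ** -np.arange(n)          # eigenvalues 1, 0.5, 0.25, ...
cov = V @ np.diag(spectrum) @ V.T

# Exact eigenvalues (the role the Lanczos-based `mahout svd` plays here).
exact = np.sort(np.linalg.eigvalsh(cov))[::-1]

# Randomized ("stochastic") SVD sketch: random projection, a few power
# iterations to sharpen the subspace, then a small dense eigenproblem.
k, oversample, power_iters = 5, 5, 2
Y = cov @ rng.standard_normal((n, k + oversample))
for _ in range(power_iters):
    Y = cov @ (cov @ Y)                  # cov is symmetric
Q, _ = np.linalg.qr(Y)                   # orthonormal basis for the sketch
B = Q.T @ cov @ Q                        # (k+oversample) x (k+oversample)
approx = np.sort(np.linalg.eigvalsh(B))[::-1][:k]
```

On a matrix with a decaying spectrum like this, the top-k randomized eigenvalues track the exact ones closely, which is the behavior one would hope to see when comparing SSVD against the Lanczos solver.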
Hi Andrew,
thanks for your response, which points me to the missing piece of the
puzzle! However, there is still something that is not clear to me. Either
the sequence of the commands is not correct, or I haven't fully grasped
the elementary mechanics here. I understand the
However, the sequence of steps as described in the Mahout Cookbook seems
incorrect to me:
This is entirely possible; that book may be out of date. The end-to-end
instructions on the website for the 20 newsgroups example are up to date
though, as is the example script.
You don't want to
If memory serves me, DeLiClu (density-link clustering) is the current best
density-based method, since it does not require parameter searches.
What is the parallelization strategy you are proposing?
I know there were a bunch of attempts to parallelize/partition the DBSCAN
problem; one of the more interesting is perhaps that of
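For context on why partitioning DBSCAN is hard: the serial algorithm is a transitive expansion over eps-neighborhoods, so clusters can grow across any naive data split. An illustrative sketch of that core loop (not any of the parallel variants discussed):

```python
from math import dist  # Python 3.8+

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or -1 for noise."""
    labels = {}

    def neighbors(p):
        # O(n) scan per query; real implementations use a spatial index.
        return [q for q in points if dist(p, q) <= eps]

    cluster = 0
    for p in points:
        if p in labels:
            continue
        nbrs = neighbors(p)
        if len(nbrs) < min_pts:
            labels[p] = -1                  # tentatively noise
            continue
        labels[p] = cluster                 # p is a core point
        seeds = list(nbrs)
        while seeds:                        # transitive expansion
            q = seeds.pop()
            if labels.get(q, -1) == -1:     # unvisited, or demote from noise
                labels[q] = cluster
                q_nbrs = neighbors(q)
                if len(q_nbrs) >= min_pts:  # q is also core: keep expanding
                    seeds.extend(q_nbrs)
        cluster += 1
    return labels
```

The `seeds` queue is exactly what partitioning schemes have to reconcile at partition boundaries, since a chain of core points can stitch two partitions' clusters together.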