Estimating accuracy this way will almost always give you very poor results.
The reason is that unsupervised clustering will draw its own boundaries
which are very unlikely to match your own.
If you want to make this work you can do a few different things:
a) semi-supervised clustering. Include
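The mismatch described above can be made concrete with a small sketch (not Mahout code; `cluster_accuracy` and the toy labels are invented for illustration). Even the most charitable scoring, mapping each cluster to its majority true label, caps accuracy well below 100% whenever the learned boundaries cut across the real classes:

```python
from collections import Counter

def cluster_accuracy(true_labels, cluster_ids):
    """Score a clustering by mapping each cluster to its majority true label."""
    correct = 0
    for c in set(cluster_ids):
        members = [t for t, k in zip(true_labels, cluster_ids) if k == c]
        # Best case: every member of this cluster scored as its majority label
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(true_labels)

# Two true classes, but the clustering drew its boundary elsewhere:
true = ["a", "a", "a", "b", "b", "b"]
found = [0, 0, 1, 1, 1, 0]
print(cluster_accuracy(true, found))  # 4/6, even with the best label mapping
```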
Hi,
I am trying to get Latent Dirichlet Allocation to work,
I was following the instructions on this page
https://cwiki.apache.org/MAHOUT/latent-dirichlet-allocation.html
I have two questions
1) I want to make sure: is LDA a different algorithm from the Dirichlet
clustering algorithm? I am only a
Bojan,
Welcome to Mahout! Thanks for bringing your question to the mailing list.
Someone else with more technical experience will hopefully be able to
answer.
Best wishes,
Ellen
On Wed, Feb 6, 2013 at 4:50 PM, Bojan Kostić wrote:
> Hello, my first post and I hope not my last.
> For some time I pl
Hello, my first post and I hope not my last.
For some time I have been playing with Hadoop and Mahout. Still learning... And I wish to
set up Mahout 0.8-SNAPSHOT and Hadoop 2.0.2-alpha to work together.
I read on the dev mailing list that Marty Kube built Mahout against Hadoop
2.0.2. Has anyone else tried? And if y
Hello there,
After some struggle, I managed to run cvb successfully. But I found that
dumping the output isn't easy either. I tried to dump some keywords per
cluster by running the following command:
mahout vectordump -i [final_state_output_directory_used_in_cvb_run] -o
[output_file_pat
> The effect of downweighting the popular items is very similar to
> removing them from recommendations so I still suspect precision will
> go down using IDF. Obviously this can pretty easily be tested, I just
> wondered if anyone had already done it.
>
> This brings up a problem with holdout bas
Thanks for the advice. I tried out seq2encoded and that addressed my issue
of making the training set and test set use the same feature indices for
the same words. However, I'm a little disappointed there is no dictionary
file produced by seq2encoded. It would be nice to understand which word(s)
Note that the old clustering algorithms also run without Hadoop in
sequential execution mode from the local file system.
On 2/6/13 11:04 AM, Tanguy tlrx wrote:
Thanks!
-- Tanguy
2013/2/6 Ted Dunning
https://github.com/tdunning/knn/
especially the docs directory
On Wed, Feb 6, 2013 at 7:5
The effect of downweighting the popular items is very similar to removing them
from recommendations so I still suspect precision will go down using IDF.
Obviously this can pretty easily be tested, I just wondered if anyone had
already done it.
This brings up a problem with holdout based precisi
Hi,
I'm a complete novice of Mahout, and I'm currently learning how to use it by
examples.
I'm running the 20newsgroups clustering example and I'm wondering how to
visualize the labels of the clustered documents.
Does anyone know how?
Thanks,
Albert
This results in no information for universally preferred items, which
is indeed what I was looking for. It looks like this should also work
for other values or explicit preferences--item prices, ratings,
etc.
Intuition says this will result in a lower precision-related cross
validation measu
oops, forgot the log
So...
idf weighted preference value = item preference value * log (number of all
items/number of users with specific item pref)
        items
        1 0 0
users   1 0 0
        1 1 0
freq
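Plugging the sketch matrix above into that formula (illustrative only; `idf_weight` is a made-up helper, and the formula is taken verbatim from the message, with the count of all items in the numerator): the item preferred by every user gets log(3/3) = 0, which is exactly the "no information for universally preferred items" effect discussed in this thread.

```python
import math

def idf_weight(pref, n_items, n_users_with_item):
    # As written above: weighted pref = pref * log(all items / users with this item pref)
    return pref * math.log(n_items / n_users_with_item)

matrix = [[1, 0, 0],   # rows: users, columns: items
          [1, 0, 0],
          [1, 1, 0]]
freq = [sum(col) for col in zip(*matrix)]  # users per item: [3, 1, 0]

print(idf_weight(1, len(matrix[0]), freq[0]))  # item everyone has: log(3/3) = 0.0
print(idf_weight(1, len(matrix[0]), freq[1]))  # rarer item: log(3/1), about 1.099
```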
Thanks!
-- Tanguy
2013/2/6 Ted Dunning
> https://github.com/tdunning/knn/
>
> especially the docs directory
>
> On Wed, Feb 6, 2013 at 7:54 AM, Tanguy tlrx wrote:
>
> > Hi Ted,
> >
> > Where can I find more details about these new algorithms?
> >
> > Thanks,
> >
> > -- Tanguy
> >
> > 2013/2/6
https://github.com/tdunning/knn/
especially the docs directory
On Wed, Feb 6, 2013 at 7:54 AM, Tanguy tlrx wrote:
> Hi Ted,
>
> Where can I find more details about these new algorithms?
>
> Thanks,
>
> -- Tanguy
>
> 2013/2/6 Ted Dunning
>
> > Yes they can.
> >
> > The new algorithms that are j
If you want relative error, you should model the log of the target
variable. This is very commonly done with prices.
My beefs with SVD methods in general are
a) they are often implemented without regularization
b) they are typically used to model ratings instead of the desired target
behavior
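A one-liner makes the point about modeling the log concrete (an illustrative sketch, not any Mahout API): absolute error on the log scale behaves like relative error, so a 10% miss costs the same for a cheap item as for an expensive one.

```python
import math

def log_error(predicted, actual):
    # Absolute error in log space is relative error, independent of price level
    return abs(math.log(predicted) - math.log(actual))

print(log_error(110, 100))      # 10% miss on a cheap item
print(log_error(11000, 10000))  # same 10% miss at 100x the price: identical error
# On the raw scale these errors would be 10 vs 1000 -- the big item dominates.
```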
Hi Ted,
Where can I find more details about these new algorithms?
Thanks,
-- Tanguy
2013/2/6 Ted Dunning
> Yes they can.
>
> The new algorithms that are just now arriving are particularly suited to
> non-Hadoop use.
>
> On Wed, Feb 6, 2013 at 2:06 AM, vivek bairathi wrote:
>
> > Hi,
> >
> >
Scaling values scales errors, yes. Yes you would have to normalize by a
range to meaningfully compare.
On Feb 6, 2013 2:58 PM, "Zia mel" wrote:
> In this case where there is different scaling or range, would MAE test
> be suitable and understandable ? For example, if we have range 1-5 and
> anoth
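The range normalization being discussed can be sketched as follows (illustrative; `nmae` is a hypothetical helper, not a Mahout evaluator):

```python
def nmae(predictions, actuals, lo, hi):
    # MAE divided by the rating range, so 1-5 and 1-20 systems compare directly
    mae = sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)
    return mae / (hi - lo)

# The same 10%-of-range average miss on two different scales:
print(nmae([2.4, 3.4], [2.0, 3.0], 1, 5))    # 0.4 / 4  = 0.1
print(nmae([6.9, 10.9], [5.0, 9.0], 1, 20))  # 1.9 / 19 = 0.1
```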
Yes they can.
The new algorithms that are just now arriving are particularly suited to
non-Hadoop use.
On Wed, Feb 6, 2013 at 2:06 AM, vivek bairathi wrote:
> Hi,
>
> I want to know that Mahout's clustering algorithms run without Hadoop or
> not?
> I mean can they be used without Hadoop?
>
>
> -
In this case where there is a different scaling or range, would an MAE test
be suitable and understandable? For example, if we have range 1-5 and
another 1-20, to make the same interpretation of MAE do we need to
divide by the range?
Many thanks
On Wed, Feb 6, 2013 at 4:13 AM, 万代豊 <20525entrad...@gmail
Sean
Thanks for your clarification.
I'll also keep in mind what I need to be careful about with SVD.
Regards,
Yutaka
2013/2/6 Sean Owen
> Yes that would be valid in the sense that the neighborhood based approaches
> are outputting a weighted average of prices here which is also a price. You
>
Yes that would be valid in the sense that the neighborhood based approaches
are outputting a weighted average of prices here which is also a price. You
would have to think about which similarity metrics are meaningful though.
The SVD has a perhaps undesirable behavior here. Because it treats the
s
Hi
I also have a similar question regarding result interpretation, based on how
we provide data to the recommender.
Typically, we provide rating data, say on a scale from 1-5, and get the result
in the same scale range (and it needs to be consistent, as Sean points out).
If we assume the provided data with other