I am sorry, but I don't understand your questions or needs sufficiently to
answer.
On Wed, Apr 23, 2014 at 12:21 PM, Darshan Sonagara
darshan.sonag...@gmail.com wrote:
Sir, please reply to me as soon as possible.
Thanks in advance.
On Tue, Apr 22, 2014 at 11:50 PM, Darshan Sonagara
Hi Darshan,
What I understand from your problem is that:
- You have clustered a few documents.
- You want to verify the accuracy of your clustering, and you want to use
entropy for that.
- You are not sure what should be the input for entropy calculation.
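For reference, one common way to turn this into a single number is the size-weighted entropy of reference categories inside each cluster. This assumes each document also has a category assigned by hand. A value of 0 means every cluster is pure. A minimal sketch follows. `clustering_entropy` and the example labels are hypothetical, and this is not a Mahout API. The cluster IDs would have to be parsed from the clusterdump output separately.

```python
import math
from collections import Counter, defaultdict


def clustering_entropy(cluster_ids, class_labels):
    """Size-weighted entropy (bits) of reference labels within each cluster.

    cluster_ids[i]  -- cluster assigned to document i (assumed parsed from clusterdump)
    class_labels[i] -- hypothetical hand-assigned category of document i
    Lower is better; 0.0 means every cluster holds a single category.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("need one class label per clustered document")

    # Group the reference labels by the cluster each document landed in.
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, class_labels):
        members[cid].append(label)

    n = len(cluster_ids)
    total = 0.0
    for labels in members.values():
        size = len(labels)
        # Entropy of the category distribution inside this one cluster.
        h = -sum((c / size) * math.log2(c / size) for c in Counter(labels).values())
        # Weight each cluster's entropy by its share of the documents.
        total += (size / n) * h
    return total


clusters = [0, 0, 0, 1, 1, 1]
labels = ["sports", "sports", "sports", "politics", "politics", "sports"]
print(round(clustering_entropy(clusters, labels), 4))  # 0.4591
```

Cluster 0 is pure (entropy 0). Cluster 1 mixes two categories 2:1 (about 0.918 bits). Each cluster holds half the documents, so the overall score is about 0.459.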
Possible solution:
The entropy would expect a
Yash,
I am not sure how your suggestion will work.
The problem is that clustering algorithms tend to make hard assignments. Thus,
if you try to compute entropy relative to some reference probability
distribution (aka perplexity [1]), then a reference clustering will provide
1 or 0 as the probability.
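To make the hard-assignment point concrete: treat the reference clustering as a probability distribution. Each document then gets probability 1 for its reference cluster and 0 for every other cluster. As a result, the perplexity can only be 1 (perfect agreement) or infinite (any disagreement), which tells you nothing graded. The sketch below assumes the cluster IDs of the two clusterings have already been matched up. The function name is hypothetical.

```python
import math


def perplexity_vs_hard_reference(assignments, reference):
    """Perplexity of our assignments under a reference that puts probability 1
    on its own cluster for each document and 0 on every other cluster."""
    log_prob = 0.0
    for ours, ref in zip(assignments, reference):
        # Hard assignment: the reference can only say 1 or 0.
        p = 1.0 if ours == ref else 0.0
        if p == 0.0:
            # log(0) diverges, so a single disagreement makes perplexity infinite.
            return math.inf
        log_prob += math.log(p)
    # With every p == 1, log_prob stays 0 and the perplexity is exactly 1.
    return math.exp(-log_prob / len(assignments))


print(perplexity_vs_hard_reference([0, 1, 1], [0, 1, 1]))
print(perplexity_vs_hard_reference([0, 1, 1], [0, 1, 0]))
```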
Well, I was not aware of the perplexity calculation. Your point makes perfect
sense.
Entropies calculated independently for each cluster would not serve any
purpose.
So the question goes back to the questioner, and I'll go back to the
textbooks :)
Peace,
Yash
On Sat, May 24, 2014 at 12:01 AM, Ted
I am a final-year BE student from Gujarat, India, currently studying in the
Information Technology branch. My final-year project is Document
Clustering using Hadoop.
At this stage I am able to get the final result from the clusterdump command,
in which I can see the number of documents in a particular cluster and
On Tue, Apr 22, 2014 at 12:11 AM, Darshan Sonagara
darshan.sonag...@gmail.com wrote:
But the problem is that I want to check whether my clustering is good or
bad, and for that I need to calculate the entropy value. I have no
idea how to calculate entropy in Mahout or by other
Thanks for the reply, sir.
Actually, I am doing clustering to gather similar kinds of documents into
the same cluster as much as possible.
I can see this in the clusterdump output file by observing the top terms.
I also figured out that the results differ when I vary the distance measure,
but I want some
Waiting for your reply, sir.
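A small illustration of why changing the distance measure changes the clusters follows. Mahout offers measures such as EuclideanDistanceMeasure and CosineDistanceMeasure, but the code below is plain Python, not Mahout. The three term-weight vectors are hypothetical. Euclidean distance is sensitive to document length, while cosine distance only compares term proportions. As a result, the two measures disagree about which document is nearest.

```python
import math


def euclidean_distance(a, b):
    # Straight-line distance: grows with differences in overall term weight.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    # 1 - cosine similarity: ignores vector length, compares direction only.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


short_doc = [1.0, 2.0, 0.0]    # hypothetical term weights
long_doc = [10.0, 20.0, 0.0]   # same term proportions, ten times the weight
other_doc = [2.0, 0.0, 1.0]    # different terms

print(euclidean_distance(short_doc, long_doc), euclidean_distance(short_doc, other_doc))
print(cosine_distance(short_doc, long_doc), cosine_distance(short_doc, other_doc))
```

Under Euclidean distance, short_doc is nearer other_doc (about 2.45 versus about 20.12). Under cosine distance, it is nearer long_doc (about 0 versus 0.6). A clustering algorithm using each measure would therefore group these documents differently.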
On Tue, Apr 22, 2014 at 7:13 PM, Darshan Sonagara
darshan.sonag...@gmail.com wrote: