I modified $MAHOUT_HOME/utils/src/main/java/org/apache/mahout/clustering/lda/LDAPrintTopics.java so that the score is printed alongside each word, but the interpretation of the scores is somewhat obscure. I see values in the range of -8 to +6. I assumed the values would represent P(word | topic) or log(P(word | topic)), but neither fits this range: a probability lies in [0, 1], and a log-probability is never positive. How should I interpret these values? Is there a simple way to retrieve P(word | topic)?
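To make the question concrete, here is a minimal sketch of what I would try if the printed scores turn out to be unnormalized log weights. That is only an assumption on my side; I have not confirmed it against the Mahout source. The class and method names below are hypothetical and not part of Mahout. The idea is to exponentiate the scores of one topic and divide by their sum, subtracting the maximum first for numerical stability:

```java
// Hypothetical helper, not part of Mahout.
// ASSUMPTION: the scores of one topic are unnormalized log weights.
class TopicScoreSketch {

    // Converts the log scores of a single topic into probabilities that sum to 1.
    public static double[] normalizeLogScores(double[] logScores) {
        if (logScores.length == 0) {
            throw new IllegalArgumentException("need at least one score");
        }
        // Find the largest score so that every exponent below is <= 0.
        double max = logScores[0];
        for (double s : logScores) {
            max = Math.max(max, s);
        }
        // Sum of exp(score - max) over all words of the topic.
        double sum = 0.0;
        for (double s : logScores) {
            sum += Math.exp(s - max);
        }
        // Each probability is exp(score - max) divided by that sum.
        double[] probs = new double[logScores.length];
        for (int i = 0; i < logScores.length; i++) {
            probs[i] = Math.exp(logScores[i] - max) / sum;
        }
        return probs;
    }

    public static void main(String[] args) {
        // Made-up scores in the range I observed, for illustration only.
        double[] scores = {6.0, 2.5, -8.0};
        double[] probs = normalizeLogScores(scores);
        for (int i = 0; i < probs.length; i++) {
            System.out.println("score " + scores[i] + " -> p " + probs[i]);
        }
    }
}
```

If that assumption holds, the sum would have to run over every word in the vocabulary for the topic, not only the top words that get printed, for the result to be a proper P(word | topic).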
Thanks,
Avishay.
From: Avishay Livne1/Haifa/i...@ibmil
To: [email protected]
Date: 06/06/2010 03:16 PM
Subject: extract p(doc|topic) from LDA
Hi,
I'm trying to use LDA for a collaborative filtering task, where I need to
predict the rating a user (document) will give to a movie (word).
I ran LDA and constructed T topics, but I can only print the most frequent
words (movies) per topic.
Is it possible to extract p(document|topic) or p(word|topic) from LDA's
output? (document = new user, word = movie).
Best regards,
Avishay
