Hi,

This is a little off topic, but perhaps someone on this list may be able to
comment.

I'm still fairly new to LDA, and I've been playing with Yahoo's LDA
implementation.

The Yahoo code produces a file called:

lda.worToTop.txt

www.teddybears.com/     recreation/toys (teddy,15) (bears,15) (enjoy,2)
(teddy,15) (bears,15) (enjoy,2) (featuring,41) (teddy,15)
www.bearsbythesea.com/  recreation/toys (teddy,99) (bear,99) (store,81)
(pismo,30) (beach,88) (california,24) (specialize,99) (muffy,99) (store,11)
(complete,11) (collections,46) (checkout,84) (web,87) 

So this shows that teddy is in topic 15 and in topic 99.

However, what I thought I would be looking for is a vector in which each
word is represented by a set of probabilities, one per topic.  (e.g., with
600 topics I could have a 600-element vector giving that word's probability
under each of those topics)

This vector could then be used for calculating similarity against other
words, etc.  Is that the correct idea?
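
To make the similarity idea concrete, here is a rough sketch of what I have
in mind, using cosine similarity between two per-word topic vectors.  The
function name and the tiny 3-topic vectors are made up for illustration;
nothing here comes from the Yahoo code, and cosine is just one possible
choice of measure.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A word with an all-zero vector has no defined direction.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical vectors over 3 topics (real ones would have e.g. 600 entries).
teddy = [0.5, 0.5, 0.0]
bear = [0.0, 1.0, 0.0]
store = [0.0, 0.0, 1.0]

teddy_bear = cosine_similarity(teddy, bear)    # shares a topic, so > 0
teddy_store = cosine_similarity(teddy, store)  # no shared topic, so 0
```

Words that tend to land in the same topics would score close to 1, and
words that never share a topic would score 0.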

If so, using the Yahoo LDA output, do I have to calculate that vector and
its probabilities myself for each unique word, using the above file?
Perhaps I'm missing something?
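
In case it helps to show what I mean by calculating it myself, this is a
rough sketch: count how often each word is assigned to each topic in the
(word,topic) pairs, then normalise the counts per word.  The function name,
the regex, and the shortened sample text are my own assumptions, not part of
the Yahoo code, and the result is only an unsmoothed estimate based on the
sampled assignments in the file.

```python
import re
from collections import Counter, defaultdict

# Matches pairs such as "(teddy,15)"; assumes words contain no commas,
# parentheses, or whitespace.
PAIR = re.compile(r"\(([^,()\s]+),(\d+)\)")


def word_topic_vectors(text, num_topics):
    """Map each word to a list of per-topic proportions of its assignments."""
    counts = defaultdict(Counter)
    for word, topic in PAIR.findall(text):
        counts[word][int(topic)] += 1

    vectors = {}
    for word, topic_counts in counts.items():
        total = sum(topic_counts.values())
        # Topics the word was never assigned to get a proportion of 0.
        vectors[word] = [topic_counts[t] / total for t in range(num_topics)]
    return vectors


# Shortened sample in the same shape as the file contents above.
sample = """www.teddybears.com/ recreation/toys (teddy,15) (bears,15) (enjoy,2)
www.bearsbythesea.com/ recreation/toys (teddy,99) (bear,99) (store,81) (store,11)"""

vectors = word_topic_vectors(sample, num_topics=600)
```

With this sample, the vector for teddy would have half its weight on topic
15 and half on topic 99, and zeros everywhere else.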

Thanks, Ian
