Pardon my ignorance, then! It sounds like canopy+kmeans is the clear
choice for what we're doing over here. :)

Given that, I'm curious: what applications have people found for LDA?
It seems to me that you get two things from LDA: the topics with their
associated word lists, and 'concentrated' vectors. It looks like the
biggest benefit of LDA is actually the "basis concentration" or "basis
isolation", which should then allow for better clustering via canopy or
kmeans. I can't really think of an instance where topic creation in and
of itself would be a useful endeavor if there is no way to associate
documents with topics.
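
To make the "cluster on the concentrated vectors" idea concrete, here is a
minimal pure-Python sketch. It assumes each document has already been
reduced to a topic-proportion vector; the doc_topics values are made up
for illustration and this is not Mahout's API. It runs plain Lloyd's
k-means with hand-picked initial centroids standing in for canopy seeding.

```python
from typing import List

Vector = List[float]


def dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two topic vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], centroids: List[Vector], iters: int = 10) -> List[int]:
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each document goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical document-by-topic vectors (3 topics, 4 documents).
doc_topics = [
    [0.9, 0.05, 0.05],
    [0.8, 0.1, 0.1],
    [0.1, 0.85, 0.05],
    [0.05, 0.9, 0.05],
]

# Seed with two documents instead of running canopy first.
print(kmeans(doc_topics, [doc_topics[0], doc_topics[2]]))  # [0, 0, 1, 1]
```

Because the topic vectors are short and dense, distances between them are
cheap and fairly meaningful, which is the whole appeal over clustering the
raw term vectors.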

On Wed, Apr 27, 2011 at 3:03 AM, Vasil Vasilev wrote:
> Hi guys,
>
> I am about to check in my LDA vectorizer (as discussed in the mail thread
> "LDA related enhancements"). I just need some time to polish the code and
> write tests.
>
> Regards, Vasil
>
> On Wed, Apr 27, 2011 at 6:43 AM, Lance Norskog wrote:
>
>> What is a good name for what LDA and SVD do? "Basis concentration"?
>> "Basis isolation"?
>>
>> On 4/26/11, Ted Dunning wrote:
>> > I think you are right.
>> >
>> > On Tue, Apr 26, 2011 at 2:32 PM, Jake Mannix wrote:
>> >
>> >>
>> >> Ted, I think what they are asking for is the output of the gamma matrix
>> >> (i.e. the LDA version of the *left* singular vectors, living in
>> >> document-by-topic space, not topic-by-word space), which is currently
>> >> not produced (not even on trunk, iirc).
>> >>
>> >>  -jake
>> >>
>> >
>>
>>
>> --
>> Lance Norskog
>>
>
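
For context on the gamma matrix Jake describes: in variational LDA each
document d gets a Dirichlet parameter vector gamma_d over the topics, and
normalizing that row gives the document's expected topic proportions,
E[theta_dk] = gamma_dk / sum_k gamma_dk. A minimal sketch, assuming gamma
is available as a plain list of rows (the numbers are invented for
illustration; per the thread, the code at the time did not emit this
matrix):

```python
from typing import List


def topic_proportions(gamma: List[List[float]]) -> List[List[float]]:
    """Normalize each document's gamma row into expected topic proportions."""
    props = []
    for row in gamma:
        total = sum(row)
        props.append([g / total for g in row])
    return props


def dominant_topic(gamma: List[List[float]]) -> List[int]:
    """Index of the highest-weight topic for each document."""
    return [max(range(len(row)), key=row.__getitem__) for row in gamma]


# Hypothetical gamma rows: 2 documents x 3 topics (document-by-topic space).
gamma = [[8.0, 1.0, 1.0], [0.5, 0.5, 9.0]]
print(topic_proportions(gamma))  # [[0.8, 0.1, 0.1], [0.05, 0.05, 0.9]]
print(dominant_topic(gamma))     # [0, 2]
```

These normalized rows are exactly the kind of per-document vectors that
could be handed to canopy or kmeans, and the argmax gives the simplest
possible document-to-topic association.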
