Hi guys,

I ran CVB on a set of very small text documents, frequently only 1-2 terms
each. I don't expect good results from documents that small, but CVB was too
slow on my larger documents, so I just wanted to see whether I could get a
run to complete. The same dataset gives reasonable results with k-means and
canopy/k-means, so there are associations to capture. I dumped the vectors
at the end, and they all seem to contain the same terms: the top 40 out of a
dictionary of about 10,500. I used the default smoothing parameters and
asked for 40 topics (why is that the same as the number of features in the
output vectors?)

Does anyone know what caused this? Is the output not what I think it is
(top terms and weights for each topic)? Are the parameters too small or too
large? Are there too many features in the vectors, or are the vectors too
small? Or is it some combination of these?
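To make the "output not being what I think it is" possibility concrete, here
is a toy sketch on random data, not Mahout code. It assumes, as a hypothesis
only, that LDA-style output comes in two shapes: topic-term rows with one
entry per dictionary term, and document-topic rows with one entry per topic.
If a document-topic vector (40 entries) were labelled through the term
dictionary, every vector would map to the same first 40 dictionary terms.
The names `dictionary`, `topic_term`, `doc_topic`, `label_with_dictionary`
and `top_terms` are hypothetical and used only for illustration.

```python
import random

NUM_TOPICS = 40
VOCAB_SIZE = 10500
NUM_DOCS = 5

# Hypothetical dictionary: index -> term, standing in for the real one.
dictionary = [f"term{i}" for i in range(VOCAB_SIZE)]


def random_distribution(n, rng):
    """Return n random non-negative weights that sum to 1."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)

# Topic-term shape: one row per topic, one column per dictionary term.
topic_term = [random_distribution(VOCAB_SIZE, rng) for _ in range(NUM_TOPICS)]

# Document-topic shape: one row per document, one column per topic.
doc_topic = [random_distribution(NUM_TOPICS, rng) for _ in range(NUM_DOCS)]


def label_with_dictionary(vector):
    """Map every index of a vector to a dictionary term.

    This is only meaningful for topic-term rows; applied to a
    document-topic row, indices 0..NUM_TOPICS-1 always map to the
    same first NUM_TOPICS terms, regardless of the weights.
    """
    return {dictionary[i] for i in range(len(vector))}


def top_terms(topic_row, n=10):
    """Return the n highest-weighted (term, weight) pairs of a topic row."""
    ranked = sorted(range(len(topic_row)), key=lambda i: topic_row[i], reverse=True)
    return [(dictionary[i], topic_row[i]) for i in ranked[:n]]


# Every document-topic row "contains" exactly term0..term39.
doc_labels = [label_with_dictionary(v) for v in doc_topic]
```

Calling `top_terms(topic_term[0], 5)` shows what per-topic top terms would
look like, while every entry of `doc_labels` is the identical set of the
first 40 terms, which resembles the symptom above.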
