Hi, I am trying to do topic modeling in Spark using its LDA implementation (Spark 2.0.2, pyspark API).
I ran the following:

```python
from pyspark.ml.clustering import LDA

lda = LDA(featuresCol="tf_features", k=10, seed=1, optimizer="online")
ldaModel = lda.fit(tf_df)
lda_df = ldaModel.transform(tf_df)
```

I went through the docs to understand the output Spark generates for LDA. I understand the `ldaModel.describeTopics()` method: it gives each topic as a list of terms with their weights.

But I am not sure I understand `ldaModel.topicsMatrix()`. The docs say it is the distribution of words for each topic (in my case, 1184 words as rows and 10 topics as columns, with the values in those cells). But those values are not probabilities, which is what one would expect for a topic-word distribution: many of them are larger than 1 (132.76, 3.00, and so on). Any idea what these values are? Thanks.
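To illustrate what I would expect, here is a minimal sketch in plain NumPy (not Spark; the numbers are made up, not my actual output) of how I assumed a topic-word matrix like this could be turned into per-topic probabilities by normalizing each topic column:

```python
import numpy as np

# Stand-in for topicsMatrix(): a vocabSize x k matrix of
# unnormalized topic-word weights (4 words, 2 topics here).
topics = np.array([[132.76, 3.00],
                   [10.00,  1.00],
                   [5.00,   4.00],
                   [2.24,   2.00]])

# Dividing each column by its sum would give a proper
# topic-word distribution (each topic's column sums to 1).
topic_word_probs = topics / topics.sum(axis=0)

print(topic_word_probs.sum(axis=0))  # each column sums to 1.0
```

Is that the intended interpretation of the raw values, i.e. are they expected counts that need to be normalized like this?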