Heszak,

I have only glanced at it, but you should be able to add n-gram tokens to your input yourself, for example with Lucene's ShingleAnalyzerWrapper:

http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleAnalyzerWrapper.html

You might also take a look at http://www.mimno.org/articles/phrases/

C
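P.S. In case it helps, here is a rough sketch of the shingling idea in plain Python. It is not the Lucene API and not anything from MLlib; the function names (add_shingles, term_counts) and the "_" separator are made up for illustration. The assumption is that LDA only ever sees term counts over a vocabulary, so an n-gram joined into a single token behaves like any other vocabulary entry.

```python
from collections import Counter


def add_shingles(tokens, max_n=2, sep="_"):
    """Return the original unigrams followed by every contiguous
    n-gram for n = 2..max_n, each joined into one token with sep.
    (Hypothetical helper, written for illustration only.)"""
    out = list(tokens)
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.append(sep.join(tokens[i:i + n]))
    return out


def term_counts(docs, max_n=2):
    """Per-document bag-of-words counts over the widened vocabulary.
    Tokenization here is a naive lowercase whitespace split."""
    return [Counter(add_shingles(doc.lower().split(), max_n)) for doc in docs]


if __name__ == "__main__":
    print(add_shingles(["latent", "dirichlet", "allocation"], max_n=3))
    print(term_counts(["Latent Dirichlet Allocation"]))
```

You would then map each distinct token to an index and build the count vectors from that, the same way you would for plain unigrams. Keep in mind the vocabulary grows quickly with max_n, so some frequency cutoff on the n-grams is probably worth having.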
On Wed, Mar 18, 2015 at 5:37 PM, heszak <hzakerza...@collabware.com> wrote:

> I would like to know whether the newly released LDA (Latent Dirichlet
> Allocation) algorithm supports only unigrams, or whether it also supports
> bi/tri-grams. If it does, can someone show me how to use them?

--
- Charles