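MLlib's Word2Vec currently implements only the skip-gram model, so the example you quoted runs in skip-gram mode. CBOW is not supported yet - there is an open PR for it (see https://issues.apache.org/jira/browse/SPARK-20372).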
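
In case it helps, here is a minimal sketch of what the existing `org.apache.spark.mllib.feature.Word2Vec` class does let you configure. The setters shown adjust training hyperparameters only; none of them selects the model architecture. The object name `SkipGramSketch`, the local master setting, and the hyperparameter values are illustrative assumptions, not recommendations, and the input path is the one from the quoted example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.Word2Vec

// Hypothetical wrapper object; in spark-shell you can skip it and use the existing `sc`.
object SkipGramSketch {
  def main(args: Array[String]): Unit = {
    // Local context for illustration only (assumption: running on a single machine).
    val conf = new SparkConf().setAppName("SkipGramSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Same tokenization as the quoted example: one sentence per line, split on spaces.
    val input = sc.textFile("data/mllib/sample_lda_data.txt")
      .map(line => line.split(" ").toSeq)

    // Training hyperparameters; the values here are illustrative assumptions.
    // There is no setter for choosing between CBOW and skip-gram.
    val word2vec = new Word2Vec()
      .setVectorSize(100) // dimensionality of the word vectors
      .setWindowSize(5)   // context window size
      .setMinCount(1)     // keep words that occur at least once
      .setSeed(42L)       // fixed seed for repeatable runs

    val model = word2vec.fit(input)

    // Print the five nearest words to "1" with their cosine similarity.
    model.findSynonyms("1", 5).foreach { case (synonym, cosineSimilarity) =>
      println(s"$synonym $cosineSimilarity")
    }

    sc.stop()
  }
}
```

If you need CBOW today, you would have to use a different library or build Spark with the patch from the JIRA ticket applied; I have not tested either route.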

On Thu, 28 Sep 2017 at 09:56 pun <punintended...@gmail.com> wrote:

> Hello,
>
> My understanding is that word2vec can be run in two modes:
>
>    - continuous bag-of-words (CBOW) (order of words does not matter)
>    - continuous skip-gram (order of words matters)
>
> I would like to run the *CBOW* implementation from Spark's MLlib, but it
> is not clear to me from the documentation and their example how to do it.
> This is the example listed on their page. From:
> https://spark.apache.org/docs/2.1.0/mllib-feature-extraction.html#example
>
> import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
>
> val input = sc.textFile("data/mllib/sample_lda_data.txt").map(line => line.split(" ").toSeq)
>
> val word2vec = new Word2Vec()
>
> val model = word2vec.fit(input)
>
> val synonyms = model.findSynonyms("1", 5)
>
> for((synonym, cosineSimilarity) <- synonyms) {
>   println(s"$synonym $cosineSimilarity")
> }
>
> *My questions:*
>
>    - Which of the two modes does this example use?
>    - Do you know how I can run the model in the CBOW mode?
>
> Thanks in advance!