Re: How to run MLlib's word2vec in CBOW mode?

Nick Pentreath Thu, 28 Sep 2017 08:01:53 -0700

MLlib currently doesn't support CBOW. Its Word2Vec implements only the
skip-gram model, so that is the mode the example below uses. There is an
open PR for CBOW (see https://issues.apache.org/jira/browse/SPARK-20372).
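
For what it's worth, here is a rough sketch of how the two modes differ in
the training examples they build from a sentence. It is plain Scala for
illustration only, not MLlib code, and the names (Word2VecModes, context,
skipGramPairs, cbowExamples) are made up for this example:

```scala
// Illustration only: plain Scala, no Spark. All names here are hypothetical.
object Word2VecModes {
  // Context of position i: up to `window` tokens on each side, excluding the center.
  def context(tokens: Seq[String], i: Int, window: Int): Seq[String] = {
    val left = tokens.slice(math.max(0, i - window), i)
    val right = tokens.slice(i + 1, math.min(tokens.length, i + 1 + window))
    left ++ right
  }

  // Skip-gram: the center word predicts each context word separately,
  // so every position yields one (center, contextWord) pair per neighbour.
  def skipGramPairs(tokens: Seq[String], window: Int): Seq[(String, String)] =
    for {
      i <- tokens.indices
      c <- context(tokens, i, window)
    } yield (tokens(i), c)

  // CBOW: the whole context predicts the center word,
  // so every position yields a single (contextWords, center) example.
  def cbowExamples(tokens: Seq[String], window: Int): Seq[(Seq[String], String)] =
    tokens.indices.map(i => (context(tokens, i, window), tokens(i)))

  def main(args: Array[String]): Unit = {
    val sentence = Seq("the", "quick", "brown", "fox")
    println("skip-gram pairs:")
    skipGramPairs(sentence, 1).foreach(println)
    println("CBOW examples:")
    cbowExamples(sentence, 1).foreach(println)
  }
}
```

MLlib's Word2Vec trains on examples of the first kind; as far as I know it
has no setter that switches to the second.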


On Thu, 28 Sep 2017 at 09:56 pun <punintended...@gmail.com> wrote:

> Hello,
> My understanding is that word2vec can be run in two modes:
>
>    - continuous bag-of-words (CBOW) (order of words does not matter)
>    - continuous skip-gram (order of words matters)
>
> I would like to run the *CBOW* implementation from Spark's MLlib, but it
> is not clear to me from the documentation and their example how to do it.
> This is the example listed on their page. From:
> https://spark.apache.org/docs/2.1.0/mllib-feature-extraction.html#example
>
> import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
>
> val input = sc.textFile("data/mllib/sample_lda_data.txt").map(line => line.split(" ").toSeq)
>
> val word2vec = new Word2Vec()
>
> val model = word2vec.fit(input)
>
> val synonyms = model.findSynonyms("1", 5)
>
> for((synonym, cosineSimilarity) <- synonyms) {
>   println(s"$synonym $cosineSimilarity")
> }
>
> *My questions:*
>
>    - Which of the two modes does this example use?
>    - Do you know how I can run the model in the CBOW mode?
>
> Thanks in advance!
