The C implementation of Word2Vec updates the model using multiple threads
without locking, which makes it hard to implement in a distributed way. In
the MLlib implementation, each worker holds the entire model in memory
and outputs the part of the model that gets updated. The driver still needs
to collect and
I was wondering if there was any chance of getting a more distributed word2vec
implementation. I seem to be running out of memory from big local data
structures such as
val syn1Global = new Array[Float](vocabSize * vectorSize)
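
For a rough sense of scale, the footprint of an array like this can be estimated up front. The following is only a minimal sketch: the object name, helper names, and the example sizes (10 million words, 300 dimensions) are hypothetical and are not taken from MLlib or from my actual data.

```scala
// Rough memory estimate for one dense Float matrix of shape vocabSize x vectorSize.
// All names and sizes here are illustrative assumptions, not MLlib code.
object Word2VecMemoryEstimate {
  // Bytes needed for the matrix; computed in Long to avoid Int overflow.
  def matrixBytes(vocabSize: Int, vectorSize: Int): Long =
    vocabSize.toLong * vectorSize.toLong * 4L // a Float occupies 4 bytes

  // JVM arrays are Int-indexed, so the element count must not exceed Int.MaxValue.
  // (Int.MaxValue is an upper bound; the practical limit can be slightly lower.)
  def fitsInOneArray(vocabSize: Int, vectorSize: Int): Boolean =
    vocabSize.toLong * vectorSize.toLong <= Int.MaxValue.toLong

  def main(args: Array[String]): Unit = {
    val vocabSize = 10000000 // assumed: 10 million distinct words
    val vectorSize = 300     // assumed embedding dimension
    val gib = matrixBytes(vocabSize, vectorSize) / math.pow(1024, 3)
    println(f"one matrix: $gib%.1f GiB")
    println(s"fits in one array: ${fitsInOneArray(vocabSize, vectorSize)}")
  }
}
```

Note that `vocabSize * vectorSize` in the allocation above is an `Int` multiplication, so for large enough vocabularies it can overflow before the array is even created, in addition to exhausting the heap.
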
Is there any chance of getting a version where these are all