Dear All,
I'm trying to implement a procedure that iteratively updates an RDD
using results from GaussianMixtureModel.predictSoft. To avoid
problems with a local variable (the trained GMM) being overwritten in
each pass of the loop, I'm doing the following:
#######################################################
for i in xrange(10):
    gmm = GaussianMixture.train(rdd, 2)
    def getSafePredictor(unsafeGMM):
        return lambda x: \
            (unsafeGMM.predictSoft(x.features), unsafeGMM.gaussians.mu)
    safePredictor = getSafePredictor(gmm)
    predictionsRDD = (labelledpointrddselectedfeatsNansPatched
                      .map(safePredictor)
                      )
    print predictionsRDD.take(1)
    # ... rest of code: update rdd with results from predictionsRDD
#######################################################
Unfortunately this ends with:
#######################################################
Exception: It appears that you are attempting to reference SparkContext
from a broadcast variable, action, or transformation. SparkContext can
only be used on the driver, not in code that it run on workers. For more
information, see SPARK-5063.
#######################################################
Any idea why I'm getting this behaviour? I would expect the GMM to be
a "simple" object with no SparkContext in it. I'm using
Spark 1.5.2.
Thanks,
Tomasz
P.S. As a workaround I'm currently doing:
########################
def getSafeGMM(unsafeGMM):
    return lambda x: unsafeGMM.predictSoft(x)
safeGMM = getSafeGMM(gmm)
# predictSoft is applied on the driver to the whole RDD of feature vectors
predictionsRDD = \
    safeGMM(labelledpointrddselectedfeatsNansPatched.map(lambda x: x.features))
########################
which works fine. If possible, I would like to avoid this
approach, since I would need to create another closure over
gmm.gaussians later in my code.
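
Another possible direction, sketched under assumptions rather than tested
on Spark 1.5.2: if the error arises because the PySpark model object wraps
a JVM-side model (and so drags a SparkContext reference into the closure),
the mixture parameters could be copied into plain Python values on the
driver and the soft assignment computed from those. The helper names below
(cholesky, log_gaussian_pdf, make_soft_predictor) are hypothetical, and the
result may differ slightly from predictSoft, which may apply its own
numerical safeguards.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def log_gaussian_pdf(x, mu, sigma):
    """Log density of N(mu, sigma) at x; all arguments are plain lists."""
    n = len(x)
    low = cholesky(sigma)
    # Solve low * z = (x - mu) by forward substitution.
    z = []
    for i in range(n):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((x[i] - mu[i] - s) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    maha = sum(v * v for v in z)  # squared Mahalanobis distance
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + maha)


def make_soft_predictor(weights, mus, sigmas):
    """Build a closure over plain Python values only (no model object)."""
    def predict_soft(x):
        logs = [math.log(w) + log_gaussian_pdf(x, m, s)
                for w, m, s in zip(weights, mus, sigmas)]
        top = max(logs)  # subtract the max for numerical stability
        unnorm = [math.exp(v - top) for v in logs]
        total = sum(unnorm)
        return [u / total for u in unnorm]
    return predict_soft


# Hypothetical wiring on the driver (not run here; attribute names are
# assumed from the PySpark MLlib API and should be checked for 1.5.2):
#   weights = [float(w) for w in gmm.weights]
#   mus = [list(g.mu.toArray()) for g in gmm.gaussians]
#   sigmas = [g.sigma.toArray().tolist() for g in gmm.gaussians]
#   predictor = make_soft_predictor(weights, mus, sigmas)
#   predictionsRDD = labelledpointrddselectedfeatsNansPatched.map(
#       lambda x: predictor(list(x.features.toArray())))
```

Because the closure captures only lists of floats, it should pickle
cleanly, and the means are available from the same plain values without a
second closure over gmm.gaussians.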