Hi Tomasz,

The GMM is bound to a peer Java GMM object, so it needs a reference to
SparkContext. Some MLlib (not ML) models, such as KMeansModel and
LinearRegressionModel, are simple objects, but others refer to SparkContext.
The latter, and their member functions, should not be called inside map().
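One possible workaround is to pull the model parameters out on the driver and
evaluate the mixture in plain Python inside map(), so the closure only
captures arrays and never the model or SparkContext. Below is a minimal
sketch, assuming NumPy is available on the workers. make_soft_predictor is a
hypothetical helper, not an MLlib API, and its results may differ slightly in
numerical details from MLlib's own predictSoft.

```python
import numpy as np


def make_soft_predictor(weights, mus, sigmas):
    """Build a soft-assignment function from plain GMM parameters.

    Hypothetical helper (not part of MLlib). The returned closure captures
    only NumPy arrays, never the Java-backed model or the SparkContext.

    weights: k mixture weights
    mus:     k mean vectors, each of length d
    sigmas:  k covariance matrices, each d x d
    """
    weights = np.asarray(weights, dtype=float)
    mus = [np.asarray(m, dtype=float) for m in mus]
    sigmas = [np.asarray(s, dtype=float) for s in sigmas]
    d = mus[0].shape[0]
    # Precompute inverses and Gaussian normalising constants once, on the driver.
    inverses = [np.linalg.inv(s) for s in sigmas]
    norms = [1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(s)) for s in sigmas]

    def predict_soft(x):
        """Return the membership probability of point x for each component."""
        x = np.asarray(x, dtype=float)
        dens = np.empty(len(weights))
        for i in range(len(weights)):
            diff = x - mus[i]
            # weight * multivariate normal density of x under component i
            dens[i] = weights[i] * norms[i] * np.exp(
                -0.5 * diff.dot(inverses[i]).dot(diff))
        return dens / dens.sum()

    return predict_soft

# Hypothetical driver-side usage, assuming `gmm` is a trained
# pyspark.mllib.clustering.GaussianMixtureModel and `rdd` holds LabeledPoints.
# Default arguments bind the current loop iteration's values:
#
#   mus = [g.mu.toArray() for g in gmm.gaussians]
#   predictor = make_soft_predictor(
#       gmm.weights, mus, [g.sigma.toArray() for g in gmm.gaussians])
#   predictionsRDD = rdd.map(
#       lambda lp, f=predictor, m=mus: (f(lp.features.toArray()), m))
```

Because everything the lambda needs is plain data, the same approach also
lets you ship the means (gmm.gaussians[i].mu) without a second closure.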


2016-01-01 4:12 GMT+08:00 Tomasz Fruboes <tomasz.frub...@ncbj.gov.pl>:

> Dear All,
>  I'm trying to implement a procedure that iteratively updates an RDD using
> results from GaussianMixtureModel.predictSoft. To avoid problems with the
> local variable (the obtained GMM) being overwritten in each pass of the
> loop, I'm doing the following:
> #######################################################
> for i in xrange(10):
>     gmm = GaussianMixture.train(rdd, 2)
>     def getSafePredictor(unsafeGMM):
>         return lambda x: \
>             (unsafeGMM.predictSoft(x.features), unsafeGMM.gaussians.mu)
>     safePredictor = getSafePredictor(gmm)
>     predictionsRDD = (labelledpointrddselectedfeatsNansPatched
>           .map(safePredictor)
>     )
>     print predictionsRDD.take(1)
>     (... - rest of code - update rdd with results from predictionsRdd)
> #######################################################
> Unfortunately this ends with:
> #######################################################
> Exception: It appears that you are attempting to reference SparkContext
> from a broadcast variable, action, or transformation. SparkContext can only
> be used on the driver, not in code that it run on workers. For more
> information, see SPARK-5063.
> #######################################################
> Any idea why I'm getting this behaviour? My expectation was that the GMM
> would be a "simple" object without a SparkContext in it. I'm using Spark
> 1.5.2.
>  Thanks,
>    Tomasz
> PS: As a workaround, I'm currently doing the following:
> ########################
>     def getSafeGMM(unsafeGMM):
>         return lambda x: unsafeGMM.predictSoft(x)
>     safeGMM = getSafeGMM(gmm)
>     predictionsRDD = \
>         safeGMM(labelledpointrddselectedfeatsNansPatched.map(lambda lp: lp.features))
> ########################
>  which works fine. If possible, I would like to avoid this approach,
> since it would require another closure on gmm.gaussians later in
> my code.
