Hi Tomasz,

This limitation will not change: all models in the new Spark ML package
hold a reference to the SparkContext. It keeps the Python API simple to
implement.

That does not mean you can only call this function on local data, though;
you can call it directly on an RDD, as in the following snippet:

gmmModel.predictSoft(rdd)

This returns a new RDD containing the soft-prediction results. All models
in the ML package follow this pattern.
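
For example, here is a minimal sketch of the driver-side pattern (the
variable names, and training on a separate features RDD, are illustrative
assumptions, not taken from your code):

    from pyspark.mllib.clustering import GaussianMixture

    # Train on an RDD of feature vectors; this runs on the driver.
    features = labeledPoints.map(lambda p: p.features)
    gmmModel = GaussianMixture.train(features, 2)

    # Call predictSoft on the RDD itself, still on the driver; the model
    # (and its SparkContext reference) never enters a worker closure.
    soft = gmmModel.predictSoft(features)

    # The component parameters are plain local values, safe to capture
    # in a closure if you need them per record:
    means = [g.mu for g in gmmModel.gaussians]
    paired = soft.map(lambda membership: (membership, means))

This also covers your gmm.gaussians use case: extract mu on the driver
and capture only that local list in your map(), not the model itself.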

Yanbo

2016-01-04 22:16 GMT+08:00 Tomasz Fruboes <tomasz.frub...@ncbj.gov.pl>:

> Hi Yanbo,
>
>  thanks for the info. Is it likely to change in the (near :) ) future? The
> ability to call this function only on local data (i.e. not on an RDD)
> seems to be a rather serious limitation.
>
>  cheers,
>   Tomasz
>
> On 02.01.2016 09:45, Yanbo Liang wrote:
>
>> Hi Tomasz,
>>
>> The GMM is bound to its peer Java GMM object, so it needs a reference to
>> the SparkContext.
>> Some MLlib (not ML) models are simple objects, such as KMeansModel or
>> LinearRegressionModel, but others hold a reference to the SparkContext.
>> The latter ones and their member functions should not be called inside
>> map().
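>>
>> For illustration (a sketch only; the variable names are made up), the
>> simple KMeansModel is plain local data and can safely be captured in a
>> map(), while a GaussianMixtureModel cannot:
>>
>>     from pyspark.mllib.clustering import KMeans
>>
>>     # KMeansModel just holds its cluster centers in Python, so it is
>>     # safe to ship inside a closure:
>>     kmeansModel = KMeans.train(features, 2)
>>     labels = features.map(lambda v: kmeansModel.predict(v))  # OK
>>
>>     # A GaussianMixtureModel wraps a JVM object via the SparkContext,
>>     # so the same pattern raises the SPARK-5063 error:
>>     # features.map(lambda v: gmmModel.predictSoft(v))  # fails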
>>
>> Cheers
>> Yanbo
>>
>> 2016-01-01 4:12 GMT+08:00 Tomasz Fruboes <tomasz.frub...@ncbj.gov.pl>:
>>
>>     Dear All,
>>
>>       I'm trying to implement a procedure that iteratively updates an RDD
>>     using results from GaussianMixtureModel.predictSoft. To avoid
>>     problems with the local variable (the obtained GMM) being
>>     overwritten in each pass of the loop, I'm doing the following:
>>
>>     #######################################################
>>     for i in xrange(10):
>>          gmm = GaussianMixture.train(rdd, 2)
>>
>>          def getSafePredictor(unsafeGMM):
>>              return lambda x: \
>>                  (unsafeGMM.predictSoft(x.features),
>>                   unsafeGMM.gaussians.mu)
>>
>>
>>          safePredictor = getSafePredictor(gmm)
>>          predictionsRDD = (labelledpointrddselectedfeatsNansPatched
>>                .map(safePredictor)
>>          )
>>          print predictionsRDD.take(1)
>>          # ... rest of code: update rdd with results from predictionsRDD
>>     #######################################################
>>
>>     Unfortunately this ends with:
>>
>>     #######################################################
>>     Exception: It appears that you are attempting to reference
>>     SparkContext from a broadcast variable, action, or transformation.
>>     SparkContext can only be used on the driver, not in code that it run
>>     on workers. For more information, see SPARK-5063.
>>     #######################################################
>>
>>     Any idea why I'm getting this behaviour? My expectation would be
>>     that the GMM should be a "simple" object without a SparkContext in
>>     it. I'm using Spark 1.5.2.
>>
>>       Thanks,
>>         Tomasz
>>
>>
>>     PS As a workaround, I'm currently doing:
>>
>>     ########################
>>          def getSafeGMM(unsafeGMM):
>>              return lambda x: unsafeGMM.predictSoft(x)
>>
>>          safeGMM = getSafeGMM(gmm)
>>          predictionsRDD = \
>>              safeGMM(labelledpointrddselectedfeatsNansPatched
>>                  .map(lambda x: x.features))
>>     ########################
>>       which works fine. If possible, I would like to avoid this approach,
>>     since it would require performing another closure over gmm.gaussians
>>     later in my code.
>>
