Hi Nick,

Please see my inline replies below.
Thanks
Yanbo

2016-06-12 3:08 GMT-07:00 XapaJIaMnu <nhe...@gmail.com>:

> Hey,
>
> I have some additional Spark ML algorithms implemented in Scala that I
> would like to make available in PySpark. For reference, I am looking at
> the available logistic regression implementation here:
>
> https://spark.apache.org/docs/1.6.0/api/python/_modules/pyspark/ml/classification.html
>
> I have a couple of questions:
>
> 1) The constructor for the class LogisticRegression, as far as I
> understand, just accepts the arguments and then constructs the underlying
> Scala object via py4j and passes its arguments along. This is done via
> the line
>
>     self._java_obj = self._new_java_obj(
>         "org.apache.spark.ml.classification.LogisticRegression", self.uid)
>
> Is this correct?
> What does the line super(LogisticRegression, self).__init__() do?

super(LogisticRegression, self).__init__() is used to initialize the Params
object on the Python side, since we store all params on the Python side and
transfer them to the Scala side when calling fit.

> Does this mean that any Python data structures used with it will be
> converted to Java structures once the object is instantiated?
>
> 2) The corresponding model, class LogisticRegressionModel(JavaModel),
> again just instantiates the Java object and nothing else? Is it enough
> for me to forward the arguments and instantiate the Scala objects?
> Does this mean that when the pipeline is created, even if the pipeline is
> Python, it expects objects whose underlying Scala code is instantiated by
> py4j? Can one use pure Python elements inside the pipeline (dealing with
> RDDs)? What would be the performance implications?

class LogisticRegressionModel(JavaModel) is only a wrapper around the peer
Scala model object.
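To make the "params live on the Python side and are only transferred at fit
time" flow concrete, here is a minimal self-contained sketch in plain Python.
It does not require Spark, and the names Params, MockJVMEstimator, etc. are
made up for illustration; they are not the real pyspark.ml API, which you can
read in pyspark/ml/wrapper.py.

```python
# Illustrative mock of the pyspark.ml wrapper pattern (NOT the real API).
# Params are stored on the Python side and pushed to the "JVM" peer only
# when fit() is called, mirroring the behavior described above.

class Params:
    """Stand-in for Python-side param storage (cf. pyspark.ml.param.Params)."""
    def __init__(self):
        self._paramMap = {}

    def _set(self, **kwargs):
        self._paramMap.update(kwargs)
        return self


class MockJVMEstimator:
    """Stand-in for the py4j proxy of the underlying Scala estimator."""
    def __init__(self, class_name):
        self.class_name = class_name
        self.received_params = None  # filled in only at fit time

    def fit(self, dataset):
        # The real Scala object would train here; we just echo what it saw.
        return {"model_of": self.class_name, "params": dict(self.received_params)}


class LogisticRegression(Params):
    def __init__(self, maxIter=100, regParam=0.0):
        # Analogue of super(LogisticRegression, self).__init__():
        # initializes the Python-side Params storage.
        super(LogisticRegression, self).__init__()
        # Analogue of self._new_java_obj("org.apache.spark.ml...."):
        self._java_obj = MockJVMEstimator(
            "org.apache.spark.ml.classification.LogisticRegression")
        self._set(maxIter=maxIter, regParam=regParam)

    def fit(self, dataset):
        # Params cross to the "Scala side" only now, not at construction.
        self._java_obj.received_params = self._paramMap
        return self._java_obj.fit(dataset)


lr = LogisticRegression(maxIter=10, regParam=0.01)
model = lr.fit(dataset=[])
print(model["params"])  # {'maxIter': 10, 'regParam': 0.01}
```

The point of the sketch: constructing the Python estimator does not convert
your arguments to Java structures; the params sit in a Python dict until fit
is called, at which point they are handed to the JVM peer in one step.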
> Cheers,
>
> Nick
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Several-questions-about-how-pyspark-ml-works-tp27141.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.