Hey, I have some additional Spark ML algorithms implemented in Scala that I would like to make available in PySpark. For reference, I am looking at the existing logistic regression implementation here:
https://spark.apache.org/docs/1.6.0/api/python/_modules/pyspark/ml/classification.html

I have a couple of questions:

1) The constructor of *class LogisticRegression*, as far as I understand, just accepts the arguments, constructs the underlying Scala object via /py4j/, and passes the arguments on to it. This is done via the line *self._java_obj = self._new_java_obj("org.apache.spark.ml.classification.LogisticRegression", self.uid)*. Is this correct? What does the line *super(LogisticRegression, self).__init__()* do? Does this mean that any Python data structures used with it will be converted to Java structures once the object is instantiated?

2) The corresponding model, *class LogisticRegressionModel(JavaModel)*, again just wraps the Java object and nothing else? Is it enough for me to just forward the arguments and instantiate the Scala objects? Does this mean that when a pipeline is created, even though the pipeline is Python, it expects objects that are backed by underlying Scala code instantiated via /py4j/? Can one use pure Python elements inside the pipeline (dealing with RDDs)? What would be the performance implications?

Cheers,
Nick
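P.S. To make question 1 a bit more concrete, below is roughly the pattern I think I would need to follow, based on my reading of the 1.6 wrapper classes. The class org.example.ml.MyScalaAlgorithm is made up and stands in for one of my Scala estimators, and I have left out all Params for brevity, so please treat it as a sketch rather than working code:

from pyspark.ml.wrapper import JavaEstimator, JavaModel


class MyScalaAlgorithm(JavaEstimator):
    """Python-side wrapper for a (hypothetical) Scala estimator."""

    def __init__(self):
        # Initialises the Python-side Params/uid machinery; no data is converted here.
        super(MyScalaAlgorithm, self).__init__()
        # Creates the Scala estimator instance on the JVM through the py4j gateway.
        self._java_obj = self._new_java_obj("org.example.ml.MyScalaAlgorithm", self.uid)

    def _create_model(self, java_model):
        # Called by JavaEstimator after fit() has run on the JVM; we only wrap
        # the Scala model object that comes back.
        return MyScalaAlgorithmModel(java_model)


class MyScalaAlgorithmModel(JavaModel):
    """Thin wrapper around the fitted Scala model; transform() is delegated to it."""
    pass

If I read wrapper.py correctly, fit() then just calls the Scala estimator's fit() through the gateway on the DataFrame's underlying Java object and hands the returned Java model to _create_model().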
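P.P.S. For the last part of question 2, this is the kind of pure Python stage I had in mind: an illustrative Transformer that is backed by a Python UDF instead of a Scala object (the class and column names are made up):

from pyspark.ml import Transformer
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType


class DoubleColumn(Transformer):
    """Pure Python stage: adds a column holding twice the value of another column."""

    def __init__(self, inputCol="value", outputCol="doubled"):
        super(DoubleColumn, self).__init__()
        self.inputCol = inputCol
        self.outputCol = outputCol

    def _transform(self, dataset):
        # Runs as a Python UDF, so rows are shipped to the Python workers and back.
        double_it = udf(lambda v: None if v is None else 2.0 * float(v), DoubleType())
        return dataset.withColumn(self.outputCol, double_it(dataset[self.inputCol]))

My understanding is that such a stage can sit in a Pipeline next to the Java-backed ones, but since the UDF moves rows out to the Python workers, I would expect it to be noticeably slower than a stage that stays on the JVM. Please correct me if I have that wrong.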