Re: Persisting PySpark ML Pipelines that include custom Transformers

2016-08-19 Thread Nicholas Chammas
My pipeline (i.e. a 2.0 Pipeline) is mostly made of the built-in transformers and estimators that come with Spark. One transformer, however, is custom (i.e. I subclassed Transformer), and all it does is use a UDF to append a VectorUDT column to a DataFrame. To speak in more concrete terms, my

Re: Persisting PySpark ML Pipelines that include custom Transformers

2016-08-19 Thread Holden Karau
I don't think we've given a lot of thought to model persistence for custom Python models yet - if the Python model is wrapping a JVM model, using JavaMLWritable along with '_to_java' should work, provided your Java model is already saveable. On the other hand - if your model isn't wrapping a

Persisting PySpark ML Pipelines that include custom Transformers

2016-08-19 Thread Nicholas Chammas
I understand persistence for PySpark ML pipelines is already present in 2.0, and further improvements are being made for 2.1 (e.g. SPARK-13786). I’m having trouble, though, persisting a pipeline that includes a custom Transformer (see