Hi Martin,

In the short term: Would you be able to work with a type other than
Vector?  If so, then you can override the Predictor class's "protected
def featuresDataType: DataType" to return a DataFrame-supported DataType
which fits your purpose.  If you need Vector, then you might have to use
a hack like the one Peter suggested.
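The override pattern can be sketched with plain-Scala stand-ins (the
names below mimic the Spark 1.3 API but are hypothetical simplifications,
not the real Spark classes):

```scala
// Hypothetical stand-ins for Spark's DataType hierarchy.
sealed trait DataType
case object VectorType extends DataType      // plays the role of VectorUDT
case object DoubleArrayType extends DataType // e.g. ArrayType(DoubleType)

// Sketch of the template method in Predictor: featuresDataType has a
// default, and subclasses may override it to accept a different
// features column type.
abstract class Predictor {
  protected def featuresDataType: DataType = VectorType
  // Schema validation would compare the actual column type against this.
  def expectedFeaturesType: DataType = featuresDataType
}

// A predictor whose features column is an array of doubles instead of
// the (package-private) VectorUDT.
class ArrayFeaturesPredictor extends Predictor {
  override protected def featuresDataType: DataType = DoubleArrayType
}
```

In actual Spark code the override would return a public type such as
ArrayType(DoubleType, containsNull = false); the sketch only shows the
shape of the override.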

In the long term: VectorUDT should indeed be made public, but that will
have to wait until the next release.

Thanks for the feedback,
Joseph

On Fri, Mar 27, 2015 at 11:12 AM, Xiangrui Meng <men...@gmail.com> wrote:

> Hi Martin,
>
> Could you attach the code snippet and the stack trace? The default
> implementation of some methods uses reflection, which may be the
> cause.
>
> Best,
> Xiangrui
>
> On Wed, Mar 25, 2015 at 3:18 PM,  <zapletal-mar...@email.cz> wrote:
> > Thanks Peter,
> >
> > I ended up doing something similar. However, I consider both of the
> > approaches you mentioned bad practice, which is why I was looking for
> > a solution directly supported by the current code.
> >
> > I can work with that now, but it does not seem to be the proper solution.
> >
> > Regards,
> > Martin
> >
> > ---------- Original message ----------
> > From: Peter Rudenko <petro.rude...@gmail.com>
> > To: zapletal-mar...@email.cz, Sean Owen <so...@cloudera.com>
> > Date: 25. 3. 2015 13:28:38
> >
> > Subject: Re: Spark ML Pipeline inaccessible types
> >
> >
> > Hi Martin, here are two possible ways to overcome this:
> >
> > 1) Put your logic into the org.apache.spark package in your project -
> > then the package-private members become accessible.
> > 2) Dirty trick:
> >
> >  object SparkVector extends HashingTF {
> >   val VectorUDT: DataType = outputDataType
> > }
> >
> > then you can do something like this:
> >
> >  StructType(Seq(StructField("vectorTypeColumn", SparkVector.VectorUDT, false)))
> >
> > Thanks,
> > Peter Rudenko
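> > The trick works because a subclass can see protected members and
> > re-expose them publicly. A stand-alone sketch of the same pattern,
> > using hypothetical stand-in classes rather than the real Spark ones:

```scala
// Hypothetical stand-in for a library class (e.g. HashingTF) whose
// useful member is protected and therefore hidden from user code.
class LibraryTransformer {
  protected def outputDataType: String = "vector-udt"
}

// The "dirty trick": extend the class solely to re-publish the
// protected member through a public val.
object SparkVectorSketch extends LibraryTransformer {
  val VectorUDT: String = outputDataType
}
```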
> >
> > On 2015-03-25 13:14, zapletal-mar...@email.cz wrote:
> >
> > Sean,
> >
> > thanks for your response. I am familiar with NoSuchMethodException in
> > general, but I think it is not the case this time. The code actually
> > attempts to get the parameter by name using val m =
> > this.getClass.getMethod(paramName).
> >
> > This may be a bug, but it is only a side effect caused by the real
> > problem I am facing. My issue is that VectorUDT is not accessible from
> > user code, and therefore it is not possible to use a custom ML pipeline
> > with the existing Predictors (see the last two paragraphs in my first
> > email).
> >
> > Best Regards,
> > Martin
> >
> > ---------- Original message ----------
> > From: Sean Owen <so...@cloudera.com>
> > To: zapletal-mar...@email.cz
> > Date: 25. 3. 2015 11:05:54
> > Subject: Re: Spark ML Pipeline inaccessible types
> >
> >
> > NoSuchMethodError in general means that your runtime and compile-time
> > environments are different. I think you need to first make sure you
> > don't have mismatching versions of Spark.
> >
> > On Wed, Mar 25, 2015 at 11:00 AM, <zapletal-mar...@email.cz> wrote:
> >> Hi,
> >>
> >> I have started implementing a machine learning pipeline using Spark
> 1.3.0
> >> and the new pipelining API and DataFrames. I got to a point where I have
> >> my
> >> training data set prepared using a sequence of Transformers, but I am
> >> struggling to actually train a model and use it for predictions.
> >>
> >> I am getting a java.lang.NoSuchMethodException:
> >> org.apache.spark.ml.regression.LinearRegression.myFeaturesColumnName()
> >> thrown at the checkInputColumn method in the Params trait when using a
> >> Predictor (LinearRegression in my case, but that should not matter).
> >> This looks like a bug - the exception is thrown when executing
> >> getParam(colName) after the require(actualDataType.equals(datatype), ...)
> >> requirement is not met, so the expected "requirement failed" exception
> >> is hidden by the unexpected NoSuchMethodException instead. I can raise
> >> a bug if this really is an issue and I am not using something
> >> incorrectly.
> >>
> >> The problem I am facing, however, is that the Predictor expects
> >> features to have the VectorUDT type, as defined in the Predictor class
> >> (protected def featuresDataType: DataType = new VectorUDT). But since
> >> this type is private[spark], my Transformer cannot prepare features
> >> with this type, which then correctly results in the exception above
> >> when I use a different type.
> >>
> >> Is there a way to define a custom Pipeline that would be able to use
> >> the existing Predictors without having to bypass the access modifiers
> >> or reimplement something, or is the pipelining API not yet expected to
> >> be used in this way?
> >>
> >> Thanks,
> >> Martin
> >>
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
>
>
>
