Hi Martin,

In the short term: Would you be able to work with a type other than Vector? If so, you can override the Predictor class's "protected def featuresDataType: DataType" with a DataFrame type that fits your purpose. If you need Vector, then you might have to use a hack like the one Peter suggested.

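To make the override suggestion concrete, here is a minimal sketch of the pattern. It does not use Spark at all: `DataType`, `VectorType`, `DoubleArrayType`, `BasePredictor` and `ArrayFeaturesPredictor` are hypothetical stand-ins invented for illustration, and the real Predictor class has more members and type parameters than shown here.

```scala
import scala.util.Try

// Hypothetical stand-ins for illustration only; these are NOT Spark's real types.
sealed trait DataType
case object VectorType extends DataType      // plays the role of the inaccessible VectorUDT
case object DoubleArrayType extends DataType // plays the role of a type user code can build

// Simplified stand-in for a Predictor-style base class.
abstract class BasePredictor {
  // Subclasses may override this to accept a different features column type.
  protected def featuresDataType: DataType = VectorType

  // Rejects a features column whose type differs from featuresDataType.
  def validate(actual: DataType): Unit =
    require(actual == featuresDataType,
      s"Features must be $featuresDataType but were $actual")
}

// A predictor that declares a features type user code is able to produce.
class ArrayFeaturesPredictor extends BasePredictor {
  override protected def featuresDataType: DataType = DoubleArrayType
}

object FeaturesTypeDemo {
  // True when the predictor accepts a features column of the given type.
  def accepts(p: BasePredictor, t: DataType): Boolean = Try(p.validate(t)).isSuccess

  def main(args: Array[String]): Unit = {
    val p = new ArrayFeaturesPredictor
    println(accepts(p, DoubleArrayType)) // true
    println(accepts(p, VectorType))      // false
  }
}
```

Whether this route is usable in practice depends on being able to subclass the real Predictor from user code, which this sketch does not verify.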
In the long term: VectorUDT should indeed be made public, but that will have to wait until the next release.

Thanks for the feedback,
Joseph

On Fri, Mar 27, 2015 at 11:12 AM, Xiangrui Meng <men...@gmail.com> wrote:
> Hi Martin,
>
> Could you attach the code snippet and the stack trace? The default
> implementation of some methods uses reflection, which may be the
> cause.
>
> Best,
> Xiangrui
>
> On Wed, Mar 25, 2015 at 3:18 PM, <zapletal-mar...@email.cz> wrote:
> > Thanks Peter,
> >
> > I ended up doing something similar. However, I consider both of the
> > approaches you mentioned bad practice, which is why I was looking for
> > a solution directly supported by the current code.
> >
> > I can work with that now, but it does not seem to be the proper solution.
> >
> > Regards,
> > Martin
> >
> > ---------- Original message ----------
> > From: Peter Rudenko <petro.rude...@gmail.com>
> > To: zapletal-mar...@email.cz, Sean Owen <so...@cloudera.com>
> > Date: 25. 3. 2015 13:28:38
> > Subject: Re: Spark ML Pipeline inaccessible types
> >
> > Hi Martin, here are 2 possibilities to overcome this:
> >
> > 1) Put your logic into the org.apache.spark package in your project - then
> > everything would be accessible.
> > 2) Dirty trick:
> >
> > object SparkVector extends HashingTF {
> >   val VectorUDT: DataType = outputDataType
> > }
> >
> > then you can do this:
> >
> > StructType(Seq(StructField("vectorTypeColumn", SparkVector.VectorUDT, false)))
> >
> > Thanks,
> > Peter Rudenko
> >
> > On 2015-03-25 13:14, zapletal-mar...@email.cz wrote:
> >
> > Sean,
> >
> > thanks for your response. I am familiar with NoSuchMethodException in
> > general, but I think it is not the case this time. The code actually
> > attempts to get the parameter by name using val m =
> > this.getClass.getMethodName(paramName).
> >
> > This may be a bug, but it is only a side effect caused by the real
> > problem I am facing. My issue is that VectorUDT is not accessible by
> > user code and therefore it is not possible to use a custom ML pipeline
> > with the existing Predictors (see the last two paragraphs in my first
> > email).
> >
> > Best Regards,
> > Martin
> >
> > ---------- Original message ----------
> > From: Sean Owen <so...@cloudera.com>
> > To: zapletal-mar...@email.cz
> > Date: 25. 3. 2015 11:05:54
> > Subject: Re: Spark ML Pipeline inaccessible types
> >
> > NoSuchMethodError in general means that your runtime and compile-time
> > environments are different. I think you need to first make sure you
> > don't have mismatching versions of Spark.
> >
> > On Wed, Mar 25, 2015 at 11:00 AM, <zapletal-mar...@email.cz> wrote:
> >> Hi,
> >>
> >> I have started implementing a machine learning pipeline using Spark
> >> 1.3.0 and the new pipelining API and DataFrames. I got to a point
> >> where I have my training data set prepared using a sequence of
> >> Transformers, but I am struggling to actually train a model and use
> >> it for predictions.
> >>
> >> I am getting a java.lang.NoSuchMethodException:
> >> org.apache.spark.ml.regression.LinearRegression.myFeaturesColumnName()
> >> exception thrown at the checkInputColumn method in the Params trait
> >> when using a Predictor (LinearRegression in my case, but that should
> >> not matter). This looks like a bug - the exception is thrown when
> >> executing getParam(colName) when the
> >> require(actualDataType.equals(datatype), ...) requirement is not met,
> >> so the expected "requirement failed" exception is not thrown and is
> >> hidden by the unexpected NoSuchMethodException instead. I can raise a
> >> bug if this really is an issue and I am not using something
> >> incorrectly.
> >>
> >> The problem I am facing, however, is that the Predictor expects
> >> features to have the VectorUDT type as defined in the Predictor class
> >> (protected def featuresDataType: DataType = new VectorUDT). But since
> >> this type is private[spark], my Transformer cannot prepare features
> >> with this type, which then correctly results in the exception above
> >> when I use a different type.
> >>
> >> Is there a way to define a custom Pipeline that would be able to use
> >> the existing Predictors without having to bypass the access modifiers
> >> or reimplement something, or is the pipelining API not yet expected to
> >> be used in this way?
> >>
> >> Thanks,
> >> Martin
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
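Martin's original report describes the expected "requirement failed" error being hidden by a NoSuchMethodException. The sketch below shows one way that can happen, assuming the failure message passed to `require` performs the reflective lookup with the column name rather than the parameter name. `ParamHolder`, `featuresCol` and `MaskedExceptionDemo` are hypothetical names used only for illustration; this is not Spark's Params trait.

```scala
// Hypothetical stand-in for a class with named parameters; NOT Spark's Params trait.
class ParamHolder {
  // An accessor named after the parameter, so reflection can find it by name.
  def featuresCol: String = "features column name"

  // Looks a parameter up by name via reflection.
  def getParam(paramName: String): String =
    this.getClass.getMethod(paramName).invoke(this).asInstanceOf[String]

  // The message argument of `require` is by-name, so it is evaluated only
  // when the requirement fails. If building the message throws, that
  // exception escapes instead of the IllegalArgumentException.
  def checkInputColumn(colName: String, actual: String, expected: String): Unit =
    require(actual == expected,
      s"Column $colName must be $expected but was $actual. Param: ${getParam(colName)}")
}

object MaskedExceptionDemo {
  // Returns the simple class name of the exception raised by a failing check.
  def failureKind(colName: String): String =
    try {
      new ParamHolder().checkInputColumn(colName, actual = "double", expected = "vector")
      "none"
    } catch {
      case e: Exception => e.getClass.getSimpleName
    }

  def main(args: Array[String]): Unit = {
    // The name matches an accessor, so the message builds and require fails normally.
    println(failureKind("featuresCol"))          // IllegalArgumentException
    // The name is a user-chosen column name with no accessor, so the lookup fails first.
    println(failureKind("myFeaturesColumnName")) // NoSuchMethodException
  }
}
```

In this sketch the type mismatch is the real problem in both calls; only the second call reports it misleadingly, which is consistent with the behaviour described in the thread.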