Hi Martin, here are two possibilities to overcome this:
1) Put your logic into the org.apache.spark package in your project - then
everything that is private[spark] would be accessible (a minimal sketch
follows after the second option below).
2) Dirty trick:
  import org.apache.spark.ml.feature.HashingTF
  import org.apache.spark.sql.types.DataType

  object SparkVector extends HashingTF { val VectorUDT: DataType = outputDataType }

(This works because outputDataType is a protected member, so a subclass can
read it and re-expose it.)
then you can do it like this:

  StructField("vectorTypeColumn", SparkVector.VectorUDT, nullable = false)
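For option 1, here is a minimal sketch of what I mean (assuming Spark 1.3.0,
where VectorUDT lives in org.apache.spark.mllib.linalg; the package and object
names below are only examples):

  // Compiled in your own project, but declared inside an org.apache.spark
  // subpackage so that private[spark] members such as VectorUDT are visible.
  package org.apache.spark.ml.myextensions

  import org.apache.spark.mllib.linalg.VectorUDT
  import org.apache.spark.sql.types.DataType

  object VectorTypeAccess {
    // Re-export the otherwise inaccessible type for use from regular user code.
    val vectorType: DataType = new VectorUDT
  }

Anywhere else in your code you can then refer to
org.apache.spark.ml.myextensions.VectorTypeAccess.vectorType when building a schema.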
Thanks,
Peter Rudenko
On 2015-03-25 13:14, zapletal-mar...@email.cz wrote:
Sean,
thanks for your response. I am familiar with NoSuchMethodException in
general, but I think that is not the case this time. The code actually
attempts to get the parameter by name using
val m = this.getClass.getMethod(paramName).
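(For completeness, a tiny self-contained sketch - not Spark's actual source -
of how such a reflective lookup by name produces this exception:)

  // Minimal sketch, not Spark source: Class.getMethod throws
  // NoSuchMethodException when no method with that exact name exists.
  class Example { def featuresCol: String = "features" }

  object ReflectionDemo extends App {
    val e = new Example
    println(e.getClass.getMethod("featuresCol").invoke(e)) // prints "features"
    e.getClass.getMethod("myFeaturesColumnName")           // throws NoSuchMethodException
  }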
This may be a bug, but it is only a side effect of the real problem I am
facing. My issue is that VectorUDT is not accessible from user code, and
therefore it is not possible to use a custom ML pipeline with the existing
Predictors (see the last two paragraphs in my first email).
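(To make the constraint concrete, a hedged sketch of the schema a
Predictor-compatible features column needs; it assumes VectorUDT has been
exposed somehow, e.g. via the SparkVector object from the reply above, and
the column names are only examples:)

  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

  // "SparkVector.VectorUDT" refers to the workaround from the reply above;
  // "label" and "features" are just example column names.
  val schema = StructType(Seq(
    StructField("label", DoubleType, nullable = false),
    StructField("features", SparkVector.VectorUDT, nullable = false)))

  val rows = Seq(Row(1.0, Vectors.dense(0.1, 0.2, 0.3)))
  // sqlContext.createDataFrame(sc.parallelize(rows), schema) could then be
  // fed to a Predictor such as LinearRegression.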
Best Regards,
Martin
---------- Original message ----------
From: Sean Owen <so...@cloudera.com>
To: zapletal-mar...@email.cz
Date: 25. 3. 2015 11:05:54
Subject: Re: Spark ML Pipeline inaccessible types
NoSuchMethodError in general means that your runtime and compile-time
environments are different. I think you need to first make sure you
don't have mismatching versions of Spark.
On Wed, Mar 25, 2015 at 11:00 AM, <zapletal-mar...@email.cz> wrote:
> Hi,
>
> I have started implementing a machine learning pipeline using Spark 1.3.0
> and the new pipelining API and DataFrames. I got to a point where I have
> my training data set prepared using a sequence of Transformers, but I am
> struggling to actually train a model and use it for predictions.
>
> I am getting a java.lang.NoSuchMethodException:
> org.apache.spark.ml.regression.LinearRegression.myFeaturesColumnName()
> exception thrown at the checkInputColumn method in the Params trait when
> using a Predictor (LinearRegression in my case, but that should not
> matter). This looks like a bug - the exception is thrown when executing
> getParam(colName) when the require(actualDataType.equals(datatype), ...)
> requirement is not met, so the expected requirement-failed exception is
> not thrown and is hidden by the unexpected NoSuchMethodException instead.
> I can raise a bug if this really is an issue and I am not using something
> incorrectly.
>
> The problem I am facing, however, is that the Predictor expects features
> to have the VectorUDT type as defined in the Predictor class (protected
> def featuresDataType: DataType = new VectorUDT). But since this type is
> private[spark], my Transformer cannot prepare features with this type,
> which then correctly results in the exception above when I use a
> different type.
>
> Is there a way to define a custom Pipeline that would be able to use the
> existing Predictors without having to bypass the access modifiers or
> reimplement something, or is the pipelining API not yet expected to be
> used in this way?
>
> Thanks,
> Martin
>
>