Yes Sujit, I have tried that option as well.
I also tried sbt assembly, but I am hitting the issue below:

http://stackoverflow.com/questions/35197120/java-outofmemoryerror-on-sbt-assembly

Just wondering: is there any clean approach to including the StanfordCoreNLP
classes in Spark ML?
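
A possible workaround for the sbt assembly OutOfMemoryError (I have not
verified this yet, and the 4 GB figure is just a guess) is to give the sbt
JVM a bigger heap, either through an .sbtopts file in the project root:

    -J-Xmx4g

or by passing the launcher's -mem flag directly:

    sbt -mem 4096 assembly

The models JAR is quite large, so the assembly merge step seems to need more
memory than the default.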


On Mon, Sep 19, 2016 at 1:41 PM, Sujit Pal <sujitatgt...@gmail.com> wrote:

> Hi Janardhan,
>
> You need the classifier "models" attribute on the second stanford-corenlp
> entry to indicate that you want the models JAR, as shown below. Right now
> you are pulling in two copies of the same stanford-corenlp JAR.
>
> libraryDependencies ++= {
>   val sparkVersion = "2.0.0"
>   Seq(
>     "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
>     "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
>     "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
>     "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
>     "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>     "com.google.protobuf" % "protobuf-java" % "2.6.1",
>     "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",
>     "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>   )
> }
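>
> Once both entries resolve, a quick sanity check that the classes and the
> models are actually on the classpath could look something like this (just a
> minimal sketch; the object name "LemmaCheck" is made up for illustration):
>
> import java.util.Properties
> import edu.stanford.nlp.pipeline.StanfordCoreNLP
>
> object LemmaCheck {
>   def main(args: Array[String]): Unit = {
>     val props = new Properties()
>     // The lemma annotator needs the POS model, which lives in the models JAR.
>     props.setProperty("annotators", "tokenize, ssplit, pos, lemma")
>     // Construction fails with a missing-model error if the models JAR is absent,
>     // or NoClassDefFoundError if the corenlp JAR itself is not on the classpath.
>     val pipeline = new StanfordCoreNLP(props)
>     println("CoreNLP pipeline created: " + pipeline)
>   }
> }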
>
> -sujit
>
>
> On Sun, Sep 18, 2016 at 5:12 PM, janardhan shetty <janardhan...@gmail.com>
> wrote:
>
>> Hi Sujit,
>>
>> Tried that option, but I get the same error:
>>
>> java version "1.8.0_51"
>>
>>
>> libraryDependencies ++= {
>>   val sparkVersion = "2.0.0"
>>   Seq(
>>     "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
>>     "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
>>     "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
>>     "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided",
>>     "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>>     "com.google.protobuf" % "protobuf-java" % "2.6.1",
>>     "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>>     "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>>   )
>> }
>>
>> Error:
>>
>> Exception in thread "main" java.lang.NoClassDefFoundError: edu/stanford/nlp/pipeline/StanfordCoreNLP
>>     at transformers.ml.Lemmatizer$$anonfun$createTransformFunc$1.apply(Lemmatizer.scala:37)
>>     at transformers.ml.Lemmatizer$$anonfun$createTransformFunc$1.apply(Lemmatizer.scala:33)
>>     at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:88)
>>     at org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$2.apply(ScalaUDF.scala:87)
>>     at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1060)
>>     at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:142)
>>     at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:45)
>>     at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:29)
>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>     at scala.collection.immutable.List.foreach(List.scala:381)
>>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>>
>>
>>
>> On Sun, Sep 18, 2016 at 2:21 PM, Sujit Pal <sujitatgt...@gmail.com>
>> wrote:
>>
>>> Hi Janardhan,
>>>
>>> Maybe try removing the string "test" from this line in your build.sbt?
>>> IIRC, the "test" scope restricts the models JAR to the test classpath, so
>>> it is not available when you run your main code.
>>>
>>>     "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier "models",
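>>>
>>> i.e., keep the classifier but drop the test scope, so the line becomes:
>>>
>>>     "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",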
>>>
>>> -sujit
>>>
>>>
>>> On Sun, Sep 18, 2016 at 11:01 AM, janardhan shetty <
>>> janardhan...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to use lemmatization as a transformer and added the below to
>>>> my build.sbt:
>>>>
>>>>     "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
>>>>     "com.google.protobuf" % "protobuf-java" % "2.6.1",
>>>>     "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier "models",
>>>>     "org.scalatest" %% "scalatest" % "2.2.6" % "test"
>>>>
>>>>
>>>> Error:
>>>> *Exception in thread "main" java.lang.NoClassDefFoundError: edu/stanford/nlp/pipeline/StanfordCoreNLP*
>>>>
>>>> I have also tried other versions of this Spark package.
>>>>
>>>> Any help is appreciated.
>>>>
>>>
>>>
>>
>
