Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
Hi,

I am trying to use lemmatization as a transformer and added the below to the build.sbt:

  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
  "com.google.protobuf" % "protobuf-java" % "2.6.1",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier "models",
  "org.scalatest"
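
Spelled out, the dependency lines quoted above amount to something like the following sketch (the scalatest entry is cut off in the original message and is left as a comment here):

  libraryDependencies ++= Seq(
    "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
    "com.google.protobuf" % "protobuf-java" % "2.6.1",
    // the "models" classifier pulls the large JAR with the trained model files;
    // the "test" scope on this line is what later messages in the thread revise
    "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier "models"
    // the "org.scalatest" entry is truncated in the original message
  )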

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread Jacek Laskowski
Hi Janardhan,

Can you share the code that you execute? What's the command? Mind sharing the complete project on GitHub?

Pozdrawiam,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskow

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
Hi Jacek,

Thanks for your response. This is the code I am trying to execute:

  import org.apache.spark.sql.functions._
  import com.databricks.spark.corenlp.functions._

  val inputd = Seq(
    (1, "Stanford University is located in California. ")
  ).toDF("id", "text")

  val output = inputd.select(cleanxml(
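
Based on the spark-corenlp 0.2.0 README, the truncated snippet most likely continues along these lines (a sketch; `lemma` is the lemmatization helper in that package, and the Symbol-to-Column syntax assumes the implicits that spark-shell imports automatically):

  // clean XML tags, split into sentences, then lemmatize each sentence
  val output = inputd
    .select(cleanxml('text).as('doc))
    .select(explode(ssplit('doc)).as('sen))
    .select('sen, lemma('sen).as('lemmas))

  output.show(truncate = false)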

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
Using:

  spark-shell --packages databricks:spark-corenlp:0.2.0-s_2.11

On Sun, Sep 18, 2016 at 12:26 PM, janardhan shetty wrote:
> Hi Jacek,
> Thanks for your response. This is the code I am trying to execute
> import org.apache.spark.sql.functions._
> import com.databricks.spark.corenlp.functi

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
Also sometimes hitting this error when spark-shell is used:

  Caused by: edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:770)
    at edu.stanford.nlp.tagger.maxe
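
The usual cause of that RuntimeIOException is that the CoreNLP models JAR is not on the classpath: --packages resolves spark-corenlp and its compile dependencies, but not the separately-classified models artifact. One possible workaround (assuming the models JAR has been downloaded locally, e.g. from Maven Central) is to add it explicitly:

  spark-shell --packages databricks:spark-corenlp:0.2.0-s_2.11 \
    --jars stanford-corenlp-3.6.0-models.jar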

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread Sujit Pal
Hi Janardhan,

Maybe try removing the string "test" from this line in your build.sbt? IIRC, this restricts the models JAR to the test classpath, so it isn't visible when the code runs normally.

  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" % "test" classifier "models",

-sujit

On Sun, Sep 18, 2016 at 11:01 AM, janardhan shetty wrot
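
That is, with the "test" scope dropped, the models dependency line would become:

  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",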

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-18 Thread janardhan shetty
Hi Sujit,

Tried that option but same error:

  java version "1.8.0_51"

  libraryDependencies ++= {
    val sparkVersion = "2.0.0"
    Seq(
      "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
      "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
      "org.apache.spark" %% "spa

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-19 Thread Jacek Laskowski
Hi Janardhan,

What's the command to build the project (sbt package or sbt assembly)? What's the command you execute to run the application?

Pozdrawiam,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twi

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-19 Thread Sujit Pal
Hi Janardhan,

You need the classifier "models" attribute on the second stanford-corenlp entry to indicate that you want the models JAR, as shown below. Right now you are importing two copies of the same stanford-corenlp JAR.

  libraryDependencies ++= {
    val sparkVersion = "2.0.0"
    Seq(
      "org.ap
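
Filling in the truncated block from the fragments quoted earlier in the thread, the corrected libraryDependencies being described would presumably read (a sketch, not a verified build):

  libraryDependencies ++= {
    val sparkVersion = "2.0.0"
    Seq(
      "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
      "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
      "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
      // same coordinates again, but with the "models" classifier and no "test" scope
      "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models",
      "com.google.protobuf" % "protobuf-java" % "2.6.1"
    )
  }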

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-19 Thread janardhan shetty
Yes Sujit, I have tried that option as well. Also tried sbt assembly but hitting the issue below:

http://stackoverflow.com/questions/35197120/java-outofmemoryerror-on-sbt-assembly

Just wondering if there is any clean approach to include StanfordCoreNLP classes in Spark ML?

On Mon, Sep 19, 2016 at 1:4
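
An OutOfMemoryError during sbt assembly is typically addressed by giving the sbt JVM more heap, which needs no change to the build itself; for example, using standard sbt launcher options:

  # one-off: -mem sets -Xms/-Xmx for the sbt JVM (value in MB)
  sbt -mem 4096 assembly

  # or via the environment honored by the sbt runner script
  SBT_OPTS="-Xmx4G" sbt assembly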

Re: Lemmatization using StanfordNLP in ML 2.0

2016-09-24 Thread Timur Shenkao
Hello, everybody!

Maybe it's not the cause of your problem, but I've noticed this line in your comments:

  java version "1.8.0_51"

It's strongly advised to use Java 1.8.0_66+. I use even Java 1.8.0_101.

On Tue, Sep 20, 2016 at 1:09 AM, janardhan shetty wrote:
> Yes Sujit I have tried that op
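
If it helps, a quick way to check which runtime spark-shell will pick up before launching it:

  $ java -version
  $ echo $JAVA_HOME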