Re: TF-IDF from spark-1.1.0 not working on cluster mode

2015-01-09 Thread Xiangrui Meng
This is the worker log, not the executor log. The executor log can be found in folders like /newdisk2/rta/rtauser/workerdir/app-20150109182514-0001/0/ . -Xiangrui On Fri, Jan 9, 2015 at 5:03 AM, Priya Ch learnings.chitt...@gmail.com wrote: Please find the attached worker log. I could see stream closed

Re: TF-IDF from spark-1.1.0 not working on cluster mode

2015-01-09 Thread Priya Ch
Please find the attached worker log. I could see a stream closed exception. On Wed, Jan 7, 2015 at 10:51 AM, Xiangrui Meng men...@gmail.com wrote: Could you attach the executor log? That may help identify the root cause. -Xiangrui On Mon, Jan 5, 2015 at 11:12 PM, Priya Ch

Re: TF-IDF from spark-1.1.0 not working on cluster mode

2015-01-06 Thread Xiangrui Meng
Could you attach the executor log? That may help identify the root cause. -Xiangrui On Mon, Jan 5, 2015 at 11:12 PM, Priya Ch learnings.chitt...@gmail.com wrote: Hi All, Word2Vec and TF-IDF algorithms in spark mllib-1.1.0 are working only in local mode and not on distributed mode. Null

TF-IDF from spark-1.1.0 not working on cluster mode

2015-01-05 Thread Priya Ch
Hi All, the Word2Vec and TF-IDF algorithms in spark mllib-1.1.0 work only in local mode, not in distributed mode: a null pointer exception is thrown. Is this a bug in spark-1.1.0? *Following is the code:* def main(args:Array[String]) { val conf=new SparkConf val sc=new
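For reference, a plain-Scala sketch of what the TF-IDF computation in the truncated code above produces: a collection-based stand-in for the RDD pipeline (not the MLlib API itself), using MLlib's smoothed IDF formula log((m+1)/(df(t)+1)); all names here are hypothetical.

```scala
// Term frequency of one tokenized document.
def termFreq(doc: Seq[String]): Map[String, Double] =
  doc.groupBy(identity).view.mapValues(_.size.toDouble).toMap

// Smoothed inverse document frequency over a corpus: log((m+1)/(df+1)).
def idf(docs: Seq[Seq[String]]): Map[String, Double] = {
  val m = docs.size.toDouble
  docs.flatMap(_.distinct).groupBy(identity).view
    .mapValues(ds => math.log((m + 1.0) / (ds.size + 1.0))).toMap
}

// TF-IDF: scale each document's term frequencies by the corpus IDF weights.
def tfidf(docs: Seq[Seq[String]]): Seq[Map[String, Double]] = {
  val weights = idf(docs)
  docs.map(d => termFreq(d).map { case (t, f) => t -> f * weights(t) })
}
```

In the real pipeline the same two passes happen over RDDs: one distributed pass to count document frequencies, then a map over the TF vectors to apply the weights.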

Re: TF-IDF in Spark 1.1.0

2014-12-28 Thread Yao
Can you show how to do IDF transform on tfWithId? Thanks.
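The usual answer to this question is to transform only the vector half of each (id, vector) pair, leaving the ID attached throughout. A minimal collection-based sketch of that pattern (the names `tfWithId` and `idfWeights` are assumptions standing in for the poster's RDD and fitted IDF model):

```scala
// Apply IDF-style weights to (id, tf-vector) pairs without losing the IDs.
def transformWithIds[K](
    tfWithId: Seq[(K, Map[String, Double])],
    idfWeights: Map[String, Double]
): Seq[(K, Map[String, Double])] =
  tfWithId.map { case (id, tf) =>
    id -> tf.map { case (t, f) => t -> f * idfWeights.getOrElse(t, 0.0) }
  }
```

On an RDD the shape is the same: map over the pairs and transform only the value, so the document ID rides along untouched.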

Re: TF-IDF in Spark 1.1.0

2014-10-16 Thread Burke Webster
Thanks for the response. Appreciate the help! Burke On Tue, Oct 14, 2014 at 3:00 PM, Xiangrui Meng men...@gmail.com wrote: You cannot recover the document from the TF-IDF vector, because HashingTF is not reversible. You can assign each document a unique ID, and join back the result after

TF-IDF in Spark 1.1.0

2014-10-14 Thread Burke Webster
I'm following the MLlib example for TF-IDF and ran into a problem due to my lack of knowledge of Scala and Spark. Any help would be greatly appreciated. Following the MLlib example I could do something like this: import org.apache.spark.rdd.RDD import org.apache.spark.SparkContext import

Re: TF-IDF in Spark 1.1.0

2014-10-14 Thread Xiangrui Meng
You cannot recover the document from the TF-IDF vector, because HashingTF is not reversible. You can assign each document a unique ID, and join back the result after training. HashingTF can transform individual records: val docs: RDD[(String, Seq[String])] = ... val tf = new HashingTF() val
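The irreversibility and the ID trick above can both be sketched in plain Scala. This is a simplified stand-in for what HashingTF does, not its actual implementation: each term is mapped to a non-negative bucket index from its hash, so distinct terms can collide and the original terms cannot be recovered from the vector; the `withIds` helper (a hypothetical name) shows the ID assignment used to join results back later.

```scala
// Map a term to a bucket in [0, numFeatures) via the hashing trick.
def termIndex(term: String, numFeatures: Int): Int = {
  val raw = term.hashCode % numFeatures
  if (raw < 0) raw + numFeatures else raw
}

// Hashed term-frequency vector: bucket index -> count. Collisions merge
// different terms into one bucket, which is why this is not reversible.
def hashingTf(doc: Seq[String], numFeatures: Int): Map[Int, Double] =
  doc.groupBy(termIndex(_, numFeatures)).view
    .mapValues(_.size.toDouble).toMap

// Tag each document with a unique ID before hashing, so results can be
// joined back to the original documents after training.
def withIds(docs: Seq[Seq[String]]): Seq[(Long, Seq[String])] =
  docs.zipWithIndex.map { case (d, i) => (i.toLong, d) }
```

With an RDD the same idea applies: pair each document with an ID first, run HashingTF and IDF over the values, then join on the ID to recover which vector belongs to which document.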