Re: TF-IDF in Spark 1.1.0

2014-12-28 Thread Yao
Can you show how to do IDF transform on tfWithId? Thanks.
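A minimal sketch of one way to do that in Spark 1.1.0, assuming tfWithId is the RDD[(String, Vector)] of TF vectors keyed by document ID from Xiangrui's reply further down the thread (IDFModel.transform in 1.1.0 only accepts an RDD[Vector], so the keys are split off and zipped back on; names here are illustrative):

  import org.apache.spark.SparkContext._             // pair RDD implicits (keys/values) before Spark 1.3
  import org.apache.spark.mllib.feature.IDF
  import org.apache.spark.mllib.linalg.Vector
  import org.apache.spark.rdd.RDD

  // tfWithId: RDD[(String, Vector)] = document ID -> term-frequency vector
  tfWithId.cache()                                    // reused twice below, so keep it materialized
  val idfModel = new IDF().fit(tfWithId.values)       // learn IDF weights from the TF vectors
  val tfidf: RDD[Vector] = idfModel.transform(tfWithId.values)
  // transform() maps each vector without a shuffle, so zipping the keys back on preserves the pairing
  val tfidfWithId: RDD[(String, Vector)] = tfWithId.keys.zip(tfidf)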

Re: TF-IDF in Spark 1.1.0

2014-10-16 Thread Burke Webster
Thanks for the response. Appreciate the help!

Burke

On Tue, Oct 14, 2014 at 3:00 PM, Xiangrui Meng wrote:
> You cannot recover the document from the TF-IDF vector, because
> HashingTF is not reversible. You can assign each document a unique ID,
> and join back the result after training.

Re: TF-IDF in Spark 1.1.0

2014-10-14 Thread Xiangrui Meng
You cannot recover the document from the TF-IDF vector, because HashingTF is not reversible. You can assign each document a unique ID, and join back the result after training. HashingTF can transform individual records:

val docs: RDD[(String, Seq[String])] = ...
val tf = new HashingTF()
val tfWithId = ...
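A minimal sketch of the per-record TF step described above (the example documents, String IDs, and variable names are illustrative, not from the original message):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.SparkContext._              // pair RDD implicits (mapValues) before Spark 1.3
  import org.apache.spark.mllib.feature.HashingTF
  import org.apache.spark.mllib.linalg.Vector
  import org.apache.spark.rdd.RDD

  val sc = new SparkContext(new SparkConf().setAppName("tf-with-ids"))

  // (document ID, tokenized words) pairs -- placeholder data for illustration
  val docs: RDD[(String, Seq[String])] = sc.parallelize(Seq(
    ("doc-1", Seq("spark", "tf", "idf")),
    ("doc-2", Seq("spark", "mllib"))))

  val tf = new HashingTF()
  // transform each document's tokens individually so its ID stays attached to the TF vector
  val tfWithId: RDD[(String, Vector)] = docs.mapValues(words => tf.transform(words))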

TF-IDF in Spark 1.1.0

2014-10-14 Thread Burke Webster
I'm following the MLlib example for TF-IDF and ran into a problem due to my lack of knowledge of Scala and Spark. Any help would be greatly appreciated. Following the MLlib example I could do something like this:

import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apa
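For reference, the TF-IDF example in the MLlib 1.1.0 feature extraction guide looks roughly like this (input path elided as in the guide); note that the resulting tf and tfidf are plain RDD[Vector]s carrying no document IDs, which is the gap the rest of this thread addresses:

  import org.apache.spark.rdd.RDD
  import org.apache.spark.SparkContext
  import org.apache.spark.mllib.feature.{HashingTF, IDF}
  import org.apache.spark.mllib.linalg.Vector

  val sc: SparkContext = ...
  // one whitespace-tokenized document per input line
  val documents: RDD[Seq[String]] = sc.textFile("...").map(_.split(" ").toSeq)

  val hashingTF = new HashingTF()
  val tf: RDD[Vector] = hashingTF.transform(documents)
  tf.cache()
  val idf = new IDF().fit(tf)
  val tfidf: RDD[Vector] = idf.transform(tf)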