Hi!

I was reading the ML with Spark book and was very interested in
chapter 9 (text mining), so I tried the code examples.

Everything was fine, but in this line:

val testLabels = testRDD.map {
  case (file, text) => val topic = file.split("/").takeRight(2).head
  newsgroupsMap(topic) }

I got an error: "value newsgroupsMap is not a member of String"

Other relevant parts of the code:
val path = "/PATH/20news-bydate-train/*"
val rdd = sc.wholeTextFiles(path)
val newsgroups = rdd.map { case (file, text) =>
  file.split("/").takeRight(2).head }

val tf = hashingTF.transform(tokens)
val idf = new IDF().fit(tf)
val tfidf = idf.transform(tf)

val newsgroupsMap = newsgroups.distinct.collect().zipWithIndex.toMap
val zipped = newsgroups.zip(tfidf)
val train = zipped.map { case (topic, vector) =>
  LabeledPoint(newsgroupsMap(topic), vector) }
train.cache

val model = NaiveBayes.train(train, lambda = 0.1)

val testPath = "/PATH//20news-bydate-test/*"
val testRDD = sc.wholeTextFiles(testPath)
val testLabels = testRDD.map { case (file, text) => val topic =
file.split("/").takeRight(2).head newsgroupsMap(topic) }
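
One possible explanation, sketched here without Spark: when `head` and
`newsgroupsMap(topic)` end up on the same line, Scala reads
`x.head newsgroupsMap(topic)` as the infix call `x.head.newsgroupsMap(topic)`,
which would match the "not a member of String" message. Putting the lookup on
its own line (or separating the two statements with a semicolon) avoids that.
The object name `TopicLabelSketch`, the helper `labelOf`, the sample path, and
the map entries below are hypothetical stand-ins, not values from the book or
the dataset.

```scala
object TopicLabelSketch {
  // Hypothetical stand-in for the newsgroupsMap built from the training set.
  val newsgroupsMap: Map[String, Int] = Map("sci.space" -> 0, "rec.autos" -> 1)

  // Takes the parent directory name as the topic, then looks up its index.
  def labelOf(file: String): Int = {
    val topic = file.split("/").takeRight(2).head
    // The lookup is a separate statement on its own line, so it is not
    // parsed as a method call on the String returned by `head`.
    newsgroupsMap(topic)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical file path in the same directory layout as the post.
    val file = "/data/20news-bydate-test/sci.space/60001"
    println(labelOf(file))
  }
}
```

In the Spark version, the same body would go inside the
`testRDD.map { case (file, text) => ... }` block, with the two statements on
separate lines.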

I have attached the complete program.
Can anyone help me figure out what the problem is?

Regards,
Zsombor

Attachment: scala-shell-code_09.scala
Description: Binary data
