Hi! I was reading the ML with Spark book and was very interested in chapter 9 (text mining), so I tried the code examples.
Everything was fine until this line:

val testLabels = testRDD.map { case (file, text) => val topic = file.split("/").takeRight(2).head newsgroupsMap(topic) }

which gives the error:

"value newsgroupsMap is not a member of String"

The other relevant parts of the code:

val path = "/PATH/20news-bydate-train/*"
val rdd = sc.wholeTextFiles(path)
val newsgroups = rdd.map { case (file, text) => file.split("/").takeRight(2).head }

val tf = hashingTF.transform(tokens)
val idf = new IDF().fit(tf)
val tfidf = idf.transform(tf)

val newsgroupsMap = newsgroups.distinct.collect().zipWithIndex.toMap
val zipped = newsgroups.zip(tfidf)
val train = zipped.map { case (topic, vector) => LabeledPoint(newsgroupsMap(topic), vector) }
train.cache
val model = NaiveBayes.train(train, lambda = 0.1)

val testPath = "/PATH//20news-bydate-test/*"
val testRDD = sc.wholeTextFiles(testPath)
val testLabels = testRDD.map { case (file, text) => val topic = file.split("/").takeRight(2).head newsgroupsMap(topic) }

I have attached the whole program. Can anyone help me find the problem?

Regards,
Zsombor
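One possible cause, stated as an assumption and not checked against the attached file: when the two statements inside the block end up on a single line, Scala reads `head newsgroupsMap(topic)` as an infix method call on the String returned by `head`, which would match the error text. The sketch below uses plain Scala collections instead of Spark. The object name, the map contents, and the sample path are made up for illustration only.

```scala
// Hypothetical stand-in for the Spark code in the post; no Spark required.
object LabelLookupSketch {
  // Made-up topic-to-index map standing in for newsgroupsMap.
  val newsgroupsMap: Map[String, Int] = Map("sci.space" -> 0, "rec.autos" -> 1)

  // Two statements in one block need a newline or a semicolon between them.
  // Without the separator, `... .head newsgroupsMap(topic)` is parsed as
  // `(... .head).newsgroupsMap(topic)`, i.e. a method call on a String.
  def labelOf(file: String): Int = {
    val topic = file.split("/").takeRight(2).head; newsgroupsMap(topic)
  }

  def main(args: Array[String]): Unit = {
    // Made-up path in the same directory layout as the 20 Newsgroups data.
    val file = "/PATH/20news-bydate-test/sci.space/12345"
    println(labelOf(file))
  }
}
```

If this is indeed the cause, putting `newsgroupsMap(topic)` on its own line inside the braces, or adding the semicolon as above, should let the original `testRDD.map { ... }` compile.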
scala-shell-code_09.scala