I don't see anything wrong with your code. Could you run points.count() to see how many training examples you have? Also, make sure none of the feature values are negative. The error message you sent does not say that NaiveBayes failed; it only shows that the Spark shell process was killed.

-Xiangrui
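A minimal sketch of the negative-value check, assuming the same "label<TAB>index:value index:value ..." line format that parsePoint reads. FeatureCheck, negativeFeatures and the sample lines are hypothetical names and data used only for illustration; they are not part of the original code or of MLlib.

```scala
// Hypothetical helper: lists the (index, value) pairs on one input line whose
// value is negative. It applies the same splitting rules as parsePoint.
object FeatureCheck {
  def negativeFeatures(line: String): Seq[(Int, Double)] = {
    val fields = line.split('\t')
    if (fields.length < 2) Seq.empty // no feature column on this line
    else fields(1).split(' ').toSeq
      .map(_.split(':'))
      .filter(_.length == 2) // skip malformed tokens, as parsePoint does
      .map(q => (q(0).toInt, q(1).toDouble))
      .filter { case (_, value) => value < 0.0 }
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample lines, for illustration only.
    val sample = Seq("1\t3:1.0 7:2.5", "0\t2:-1.0 9:4.0")
    sample.foreach(l => println(s"$l -> ${negativeFeatures(l)}"))
  }
}
```

In the Spark shell, something like data.filter(isEmpty).filter(l => FeatureCheck.negativeFeatures(l).nonEmpty).count() should then report how many lines contain a negative value, next to points.count() for the total number of training examples.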
On Thu, Apr 24, 2014 at 4:05 PM, John King <usedforprinting...@gmail.com> wrote:
> In the other thread I had an issue with Python. In this issue, I tried
> switching to Scala. The code is:
>
> import org.apache.spark.mllib.regression.LabeledPoint;
> import org.apache.spark.mllib.linalg.SparseVector;
> import org.apache.spark.mllib.classification.NaiveBayes;
> import scala.collection.mutable.ArrayBuffer
>
> def isEmpty(a: String): Boolean = a != null && !a.replaceAll("""(?m)\s+$""", "").isEmpty()
>
> def parsePoint(a: String): LabeledPoint = {
>   val values = a.split('\t')
>   val feat = values(1).split(' ')
>   val indices = ArrayBuffer.empty[Int]
>   val featValues = ArrayBuffer.empty[Double]
>   for (f <- feat) {
>     val q = f.split(':')
>     if (q.length == 2) {
>       indices += (q(0).toInt)
>       featValues += (q(1).toDouble)
>     }
>   }
>   val vector = new SparseVector(2357815, indices.toArray, featValues.toArray)
>   return LabeledPoint(values(0).toDouble, vector)
> }
>
> val data = sc.textFile("data.txt")
> val empty = data.filter(isEmpty)
> val points = empty.map(parsePoint)
> points.cache()
> val model = new NaiveBayes().run(points)
>
> On Thu, Apr 24, 2014 at 6:57 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>> Do you mind sharing more code and error messages? The information you
>> provided is too little to identify the problem. -Xiangrui
>>
>> On Thu, Apr 24, 2014 at 1:55 PM, John King <usedforprinting...@gmail.com>
>> wrote:
>> > Last command was:
>> >
>> > val model = new NaiveBayes().run(points)
>> >
>> > On Thu, Apr 24, 2014 at 4:27 PM, Xiangrui Meng <men...@gmail.com> wrote:
>> >>
>> >> Could you share the command you used and more of the error message?
>> >> Also, is it an MLlib specific problem? -Xiangrui
>> >>
>> >> On Thu, Apr 24, 2014 at 11:49 AM, John King
>> >> <usedforprinting...@gmail.com> wrote:
>> >> > ./spark-shell: line 153: 17654 Killed
>> >> > $FWDIR/bin/spark-class org.apache.spark.repl.Main "$@"
>> >> >
>> >> > Any ideas?