I only see one risk: if your feature indices are not sorted, the behavior is undefined. Other than that, I don't see anything suspicious. -Xiangrui
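[Editor's note: Xiangrui's point about unsorted indices can be handled by sorting the (index, value) pairs together before constructing the `SparseVector`. A minimal sketch of such a helper follows; `sortFeatures` is a hypothetical name, not part of MLlib, and the code is plain Scala with no Spark dependency.]

```scala
// SparseVector expects its index array in increasing order; unsorted
// indices lead to the undefined behavior mentioned above. Sorting the
// (index, value) pairs jointly keeps the two arrays aligned.
def sortFeatures(indices: Array[Int],
                 values: Array[Double]): (Array[Int], Array[Double]) = {
  indices.zip(values)   // pair each index with its value
         .sortBy(_._1)  // order by feature index
         .unzip         // back to two parallel arrays
}
```

In `parsePoint`, the constructor call would then become something like `val (si, sv) = sortFeatures(indices.toArray, featValues.toArray)` followed by `new SparseVector(2357815, si, sv)`.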
On Thu, Apr 24, 2014 at 4:56 PM, John King <usedforprinting...@gmail.com> wrote:
> It just displayed this error and stopped on its own. Do the lines of code
> mentioned in the error have anything to do with it?
>
> On Thu, Apr 24, 2014 at 7:54 PM, Xiangrui Meng <men...@gmail.com> wrote:
>> I don't see anything wrong with your code. Could you do points.count()
>> to see how many training examples you have? Also, make sure you don't
>> have negative feature values. The error message you sent did not say
>> NaiveBayes went wrong, but that the Spark shell was killed. -Xiangrui
>>
>> On Thu, Apr 24, 2014 at 4:05 PM, John King <usedforprinting...@gmail.com> wrote:
>> > In the other thread I had an issue with Python. In this issue, I tried
>> > switching to Scala. The code is:
>> >
>> > import org.apache.spark.mllib.regression.LabeledPoint
>> > import org.apache.spark.mllib.linalg.SparseVector
>> > import org.apache.spark.mllib.classification.NaiveBayes
>> > import scala.collection.mutable.ArrayBuffer
>> >
>> > // Returns true for non-empty lines, despite the name.
>> > def isEmpty(a: String): Boolean =
>> >   a != null && !a.replaceAll("""(?m)\s+$""", "").isEmpty()
>> >
>> > def parsePoint(a: String): LabeledPoint = {
>> >   val values = a.split('\t')
>> >   val feat = values(1).split(' ')
>> >   val indices = ArrayBuffer.empty[Int]
>> >   val featValues = ArrayBuffer.empty[Double]
>> >   for (f <- feat) {
>> >     val q = f.split(':')
>> >     if (q.length == 2) {
>> >       indices += q(0).toInt
>> >       featValues += q(1).toDouble
>> >     }
>> >   }
>> >   val vector = new SparseVector(2357815, indices.toArray, featValues.toArray)
>> >   LabeledPoint(values(0).toDouble, vector)
>> > }
>> >
>> > val data = sc.textFile("data.txt")
>> > val empty = data.filter(isEmpty)  // keeps the non-empty lines
>> > val points = empty.map(parsePoint)
>> > points.cache()
>> >
>> > val model = new NaiveBayes().run(points)
>> >
>> > On Thu, Apr 24, 2014 at 6:57 PM, Xiangrui Meng <men...@gmail.com> wrote:
>> >> Do you mind sharing more code and error messages? The information you
>> >> provided is too little to identify the problem. -Xiangrui
>> >>
>> >> On Thu, Apr 24, 2014 at 1:55 PM, John King <usedforprinting...@gmail.com> wrote:
>> >> > The last command was:
>> >> >
>> >> > val model = new NaiveBayes().run(points)
>> >> >
>> >> > On Thu, Apr 24, 2014 at 4:27 PM, Xiangrui Meng <men...@gmail.com> wrote:
>> >> >> Could you share the command you used and more of the error message?
>> >> >> Also, is it an MLlib-specific problem? -Xiangrui
>> >> >>
>> >> >> On Thu, Apr 24, 2014 at 11:49 AM, John King <usedforprinting...@gmail.com> wrote:
>> >> >> > ./spark-shell: line 153: 17654 Killed
>> >> >> > $FWDIR/bin/spark-class org.apache.spark.repl.Main "$@"
>> >> >> >
>> >> >> > Any ideas?
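[Editor's note: Xiangrui's suggestion to check for negative feature values can be done before training. A minimal sketch in plain Scala, independent of Spark; `hasNegativeFeature` is a hypothetical helper that mirrors the "index:value" parsing in parsePoint above.]

```scala
// Multinomial Naive Bayes requires nonnegative feature values.
// Scan one space-separated "index:value" feature field, parse each
// value the same way parsePoint does, and report any negative value.
def hasNegativeFeature(featureField: String): Boolean =
  featureField.split(' ')
    .map(_.split(':'))
    .collect { case Array(idx, value) => value.toDouble }
    .exists(_ < 0.0)
```

With the RDD from the thread, this could be applied to the feature field of each line (values(1) after the tab split) to count offending records before calling run.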