I only see one risk: if your feature indices are not sorted, the
behavior is undefined. Other than that, I don't see anything
suspicious.
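
For example, here is an untested sketch of a sorted version, reusing
the imports and names from your parsePoint below (2357815 is your
vocabulary size):

    def parsePointSorted(a: String): LabeledPoint = {
      val values = a.split('\t')
      // Parse "index:value" features into (index, value) pairs, then
      // sort by index, since SparseVector assumes ascending order.
      val pairs = values(1).split(' ')
        .map(_.split(':'))
        .filter(_.length == 2)
        .map(q => (q(0).toInt, q(1).toDouble))
        .sortBy(_._1)
      val vector = new SparseVector(2357815, pairs.map(_._1), pairs.map(_._2))
      LabeledPoint(values(0).toDouble, vector)
    }

-Xiangrui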

On Thu, Apr 24, 2014 at 4:56 PM, John King <usedforprinting...@gmail.com> wrote:
> It just displayed this error and stopped on its own. Do the lines of code
> mentioned in the error have anything to do with it?
>
>
> On Thu, Apr 24, 2014 at 7:54 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>> I don't see anything wrong with your code. Could you run points.count()
>> to see how many training examples you have? Also, make sure you don't
>> have negative feature values. The error message you sent doesn't say
>> that NaiveBayes itself failed; it shows that the Spark shell process
>> was killed.
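>>
>> For example, something like this (untested) would count the examples
>> that contain a negative feature value:
>>
>>     points.filter(p => p.features.toArray.exists(_ < 0)).count()
>>
>> -Xiangrui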
>>
>> On Thu, Apr 24, 2014 at 4:05 PM, John King <usedforprinting...@gmail.com>
>> wrote:
>> > In the other thread I had an issue with Python, so here I tried
>> > switching to Scala. The code is:
>> >
>> > import org.apache.spark.mllib.regression.LabeledPoint
>> > import org.apache.spark.mllib.linalg.SparseVector
>> > import org.apache.spark.mllib.classification.NaiveBayes
>> > import scala.collection.mutable.ArrayBuffer
>> >
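>> > // Note: despite its name, this returns true for non-blank lines.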
>> > def isEmpty(a: String): Boolean =
>> >   a != null && !a.replaceAll("""(?m)\s+$""", "").isEmpty()
>> >
>> > def parsePoint(a: String): LabeledPoint = {
>> >   val values = a.split('\t')
>> >   val feat = values(1).split(' ')
>> >   val indices = ArrayBuffer.empty[Int]
>> >   val featValues = ArrayBuffer.empty[Double]
>> >   for (f <- feat) {
>> >     val q = f.split(':')
>> >     if (q.length == 2) {
>> >       indices += q(0).toInt
>> >       featValues += q(1).toDouble
>> >     }
>> >   }
>> >
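>> >   // MLlib assumes these indices are sorted in ascending order.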
>> >   val vector = new SparseVector(2357815, indices.toArray, featValues.toArray)
>> >   LabeledPoint(values(0).toDouble, vector)
>> > }
>> >
>> > val data = sc.textFile("data.txt")
>> > val empty = data.filter(isEmpty)
>> > val points = empty.map(parsePoint)
>> > points.cache()
>> > val model = new NaiveBayes().run(points)
>> >
>> > On Thu, Apr 24, 2014 at 6:57 PM, Xiangrui Meng <men...@gmail.com> wrote:
>> >>
>> >> Do you mind sharing more code and error messages? The information you
>> >> provided is too little to identify the problem. -Xiangrui
>> >>
>> >> On Thu, Apr 24, 2014 at 1:55 PM, John King
>> >> <usedforprinting...@gmail.com>
>> >> wrote:
>> >> > Last command was:
>> >> >
>> >> > val model = new NaiveBayes().run(points)
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Apr 24, 2014 at 4:27 PM, Xiangrui Meng <men...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Could you share the command you used and more of the error message?
>> >> >> Also, is it an MLlib specific problem? -Xiangrui
>> >> >>
>> >> >> On Thu, Apr 24, 2014 at 11:49 AM, John King
>> >> >> <usedforprinting...@gmail.com> wrote:
>> >> >> > ./spark-shell: line 153: 17654 Killed  $FWDIR/bin/spark-class org.apache.spark.repl.Main "$@"
>> >> >> >
>> >> >> >
>> >> >> > Any ideas?
