Re: Help Troubleshooting Naive Bayes

2014-10-02 Thread Sandy Ryza
Those logs you included are from the Spark executor processes, as opposed to the YARN NodeManager processes. If you don't think you have access to the NodeManager logs, I would try setting spark.yarn.executor.memoryOverhead to something like 1024 or 2048 and seeing if that helps. If it does, it's
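[Editor's note] The overhead setting Sandy mentions is passed in megabytes and, in Spark 1.1.0, applies only to YARN deployments. A hedged sketch of how it might be supplied to spark-submit — the class name and jar below are placeholders, not anything from this thread:

```shell
# Reserve extra off-heap headroom per executor so the YARN container
# limit (executor heap + overhead) is not exceeded.
# The value is in megabytes; try 2048 if executors are still killed.
spark-submit \
  --master yarn-cluster \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --class com.example.TrainNaiveBayes \
  my-app.jar
```

If this makes the failures stop, it confirms the containers were being killed for exceeding their physical memory limit rather than for an application error.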

Re: Help Troubleshooting Naive Bayes

2014-10-02 Thread Mike Bernico
Hello Xiangrui and Sandy, Thanks for jumping in to help. So, first thing... After my email last night I reran my code using 10 executors, 2G each, and everything ran okay. So, that's good, but I'm still curious as to what I was doing wrong. For Xiangrui's questions: My training set is 49174
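[Editor's note] The executor layout Mike describes (10 executors, 2 GB each) is requested at submit time. A hedged sketch with placeholder application names, assuming a YARN cluster deployment:

```shell
# 10 executors with a 2 GB heap each; class and jar names are placeholders.
spark-submit \
  --master yarn-cluster \
  --num-executors 10 \
  --executor-memory 2g \
  --class com.example.TrainNaiveBayes \
  my-app.jar
```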

Re: Help Troubleshooting Naive Bayes

2014-10-02 Thread Sandy Ryza
Hi Mike, Do you have access to your YARN NodeManager logs? When executors die randomly on YARN, it's often because they use more memory than allowed for their YARN container. You would see messages to the effect of "container killed because physical memory limits exceeded". -Sandy On Wed, Oct

Re: Help Troubleshooting Naive Bayes

2014-10-01 Thread Xiangrui Meng
The cost depends on the feature dimension, number of instances, number of classes, and number of partitions. Do you mind sharing those numbers? -Xiangrui On Wed, Oct 1, 2014 at 6:31 PM, Mike Bernico wrote: > Hi Everyone, > > I'm working on training mllib's Naive Bayes to classify TF/IDF vectorized
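[Editor's note] As a rough illustration of why those numbers matter: MLlib's Naive Bayes builds a dense numClasses × numFeatures matrix of doubles (plus per-class priors), and per-class aggregation buffers of the same width are combined across partitions. A back-of-envelope sketch — the dimensions used here are made-up placeholders, not Mike's actual numbers:

```python
def naive_bayes_model_bytes(num_classes, num_features):
    """Approximate size of the dense matrix of per-class feature
    statistics: num_classes x num_features doubles at 8 bytes each."""
    return num_classes * num_features * 8

# Example: 2 classes over 1,000,000 TF/IDF features -> ~16 MB for the
# matrix alone, before JVM object overhead and shuffle buffers.
print(naive_bayes_model_bytes(2, 1_000_000))  # 16000000
```

With a high-dimensional TF/IDF vocabulary and many classes, these buffers can dominate executor memory even when the input RDD itself is small.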

Help Troubleshooting Naive Bayes

2014-10-01 Thread Mike Bernico
Hi Everyone, I'm working on training mllib's Naive Bayes to classify TF/IDF vectorized docs using Spark 1.1.0. I've gotten this to work fine on a smaller set of data, but when I increase the number of vectorized documents I get hung up on training. The only messages I'm seeing are below. I'm pr
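[Editor's note] For readers following the thread, the computation being distributed here is multinomial Naive Bayes: sum the term-frequency vectors per class, then take smoothed, log-normalized ratios. A minimal pure-Python sketch of that math on toy data (not Spark code; MLlib performs the same per-class aggregation across partitions):

```python
import math

def train_multinomial_nb(data, num_features, smoothing=1.0):
    """data: list of (label, term-frequency vector) pairs.
    Returns (log-priors, per-class log-likelihood vectors)."""
    sums, counts = {}, {}
    for label, vec in data:
        counts[label] = counts.get(label, 0) + 1
        acc = sums.setdefault(label, [0.0] * num_features)
        for j, v in enumerate(vec):
            acc[j] += v
    n = len(data)
    # Log-priors from class frequencies.
    pi = {c: math.log(counts[c] / n) for c in counts}
    # Laplace-smoothed log-likelihoods per class.
    theta = {}
    for c, acc in sums.items():
        total = sum(acc) + smoothing * num_features
        theta[c] = [math.log((v + smoothing) / total) for v in acc]
    return pi, theta

def predict(pi, theta, vec):
    # argmax over classes of log-prior + dot(vec, log-likelihoods)
    return max(pi, key=lambda c: pi[c] + sum(v * t for v, t in zip(vec, theta[c])))

data = [(0, [2, 0]), (0, [3, 1]), (1, [0, 2]), (1, [1, 3])]
pi, theta = train_multinomial_nb(data, num_features=2)
print(predict(pi, theta, [4, 0]))  # 0
print(predict(pi, theta, [0, 4]))  # 1
```

The per-class accumulator vectors are as wide as the feature space, which is why a large TF/IDF vocabulary inflates memory use during training even when each document is sparse.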