Xiangrui, 

Yes, the total number of terms is 43839. I have also tried running it with
different values of parallelism, ranging from 1 to 10 tasks per core. I also
tried multiple configurations, such as setting spark.storage.memoryFraction
and spark.shuffle.memoryFraction to their default values. One point to note
is that I am not using caching or persisting the RDDs, so I set the storage
fraction to 0.
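
For reference, here is roughly how I am setting these options (a sketch
only; the app name and the parallelism value below are illustrative
placeholders, not my exact settings):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the configurations tried; the values are placeholders.
val conf = new SparkConf()
  .setAppName("NaiveBayesTraining")            // placeholder app name
  .set("spark.default.parallelism", "80")      // tried 1x to 10x the core count
  .set("spark.storage.memoryFraction", "0")    // no RDD caching/persisting, so 0
  .set("spark.shuffle.memoryFraction", "0.2")  // also tried the default value
val sc = new SparkContext(conf)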

The driver data available under the Executors tab is as follows for 3 GB of
allocated memory:

Memory: 0.0 B Used (1781.8 MB Total)
Disk: 0.0 B Used

Executor ID:    <driver>
Address:        ephesoft29:59494
RDD Blocks:     0
Memory Used:    0.0 B / 1781.8 MB
Disk Used:      0.0 B
Active Tasks:   1
Failed Tasks:   0
Complete Tasks: 4
Total Tasks:    5
Task Time:      19.3 s
Shuffle Read:   0.0 B
Shuffle Write:  27.5 MB


The Memory Used value is always 0 for the driver. Is there something fishy here?

The out-of-memory exception occurs in NaiveBayes.scala at combineByKey (line
91) or at collect (line 96), depending on the heap size allocated. In the
memory profiler, the program runs fine until TF-IDF creation, but once
training starts, memory usage climbs steadily until the point of failure.
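
For context, my (possibly wrong) reading of the source is that training
aggregates per-label document counts and feature sums with combineByKey and
then collects them on the driver. A simplified paraphrase, not the exact
MLlib code (plain arrays here; the real code uses Breeze dense vectors):

import org.apache.spark.SparkContext._ // pair RDD functions such as combineByKey
import org.apache.spark.rdd.RDD

// Per label, count the documents and sum their dense feature vectors,
// then bring the per-label results back to the driver.
def aggregatePerLabel(
    data: RDD[(Double, Array[Double])]): Array[(Double, (Long, Array[Double]))] =
  data.combineByKey[(Long, Array[Double])](
    (v: Array[Double]) => (1L, v.clone()),                   // combineByKey (~line 91)
    (c: (Long, Array[Double]), v: Array[Double]) => {
      for (i <- v.indices) c._2(i) += v(i)
      (c._1 + 1L, c._2)
    },
    (c1: (Long, Array[Double]), c2: (Long, Array[Double])) => {
      for (i <- c2._2.indices) c1._2(i) += c2._2(i)
      (c1._1 + c2._1, c1._2)
    }
  ).collect()                                                // collect (~line 96)

If my reading is right, every combiner is a dense vector as long as the
vocabulary, and the map-side combine keeps one alive per label per partition.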

I want to understand whether the OOM exception occurs on the driver or on a
worker node. It should not be a worker node because, as I understand it,
Spark automatically spills data from memory to disk when the available
memory is not adequate. Then why do I get these errors at all? If it is the
driver, how do I calculate the total memory requirement? Needing 3-4 GB of
RAM to train on approximately 13 MB of training data with 43839 terms seems
preposterous.
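
My rough arithmetic, assuming the aggregation works the way I sketched above
(L is my placeholder for the number of classes): one dense sum vector is
43839 doubles, i.e. about 43839 x 8 bytes ≈ 0.35 MB, so the collected result
should occupy only around L x 0.35 MB on the driver. That is nowhere near
3-4 GB unless L or the number of partitions holding live combiners is very
large, which is why the figure looks preposterous to me.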

My expectation with Spark was that if enough memory is available it would be
much faster than Mahout, and if enough memory is not available it would
merely be slower, not throw exceptions. Mahout ran fine with much larger
data, and it too had to collect a lot of data on a single node during
training.

Maybe I am missing the point here due to my limited knowledge of Spark.
Please help me out with this and point me in the right direction.

Thanks,
Jatin




-----
Novice Big Data Programmer