Xiangrui,

Yes, the total number of terms is 43839. I have also tried running it with different levels of parallelism, ranging from 1 to 10 tasks per core. I also tried multiple configurations, such as setting spark.storage.memoryFraction and spark.shuffle.memoryFraction to their default values. The point to note here is that I am not caching or persisting any RDDs, which is why I set the storage fraction to 0.
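For reference, this is roughly how I am applying those settings (a sketch only; the app name and the parallelism value shown are placeholders, and the 0.2 shuffle fraction is the 1.x documented default):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("NaiveBayesTraining")            // placeholder name
      .set("spark.storage.memoryFraction", "0")    // no RDDs cached/persisted
      .set("spark.shuffle.memoryFraction", "0.2")  // the documented default
      .set("spark.default.parallelism", "8")       // swept from 1x to 10x cores
    val sc = new SparkContext(conf)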
The driver data available under the Executors tab, with 3 GB of allocated memory, is as follows:

Memory: 0.0 B Used (1781.8 MB Total)
Disk: 0.0 B Used

    Executor ID:    <driver>
    Address:        ephesoft29:59494
    RDD Blocks:     0
    Memory Used:    0.0 B / 1781.8 MB
    Disk Used:      0.0 B
    Active Tasks:   1
    Failed Tasks:   0
    Complete Tasks: 4
    Total Tasks:    5
    Task Time:      19.3 s
    Shuffle Read:   0.0 B
    Shuffle Write:  27.5 MB

The memory used value is always 0 for the driver. Is there something fishy here?

The out-of-memory exception occurs in NaiveBayes.scala at combineByKey (line 91) or collect (line 96), depending on the heap size allocated. In the memory profiler, the program runs fine through TF-IDF creation, but once training starts, memory usage climbs until the point of failure.

I want to understand whether the OOM exception occurs on the driver or on a worker node. It should not be a worker node, because, as I understand it, Spark automatically spills data from memory to disk when the available memory is not adequate. So why do I get these errors at all? And if it is the driver, how do I calculate the total memory requirements? Needing 3-4 GB of RAM to train on approximately 13 MB of training data with 43839 terms seems preposterous.

My expectation with Spark was that if enough memory is available, it would be much faster than Mahout, and if enough memory is not available, it would only be slower, not throw exceptions. Mahout ran fine with much larger data, and it too had to collect a lot of data on a single node during training. Maybe I am not getting the point here due to my limited knowledge of Spark. Please help me out with this and point me in the right direction.

Thanks,
Jatin

-----
Novice Big Data Programmer
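P.S. For reference, here is my rough mental model of the aggregation that fails. This is a simplified sketch based on my reading of the MLlib source, not the actual code; the names aggregateByLabel and add are my own:

    import org.apache.spark.SparkContext._   // PairRDDFunctions in 1.x
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    // Simplified stand-in for the per-label aggregation in NaiveBayes.run
    // (the combineByKey at line 91 and the collect at line 96).
    def aggregateByLabel(data: RDD[LabeledPoint], numFeatures: Int)
        : Array[(Double, (Long, Array[Double]))] = {

      // Add a (possibly sparse) vector into a dense running sum.
      def add(sum: Array[Double], v: Vector): Array[Double] = {
        val arr = v.toArray  // densifies: 43839 doubles ~= 350 KB per vector
        var i = 0
        while (i < numFeatures) { sum(i) += arr(i); i += 1 }
        sum
      }

      data.map(p => (p.label, p.features))
        .combineByKey[(Long, Array[Double])](
          (v: Vector) => (1L, add(new Array[Double](numFeatures), v)),
          (c: (Long, Array[Double]), v: Vector) => (c._1 + 1L, add(c._2, v)),
          (c1: (Long, Array[Double]), c2: (Long, Array[Double])) => {
            var i = 0
            while (i < numFeatures) { c1._2(i) += c2._2(i); i += 1 }
            (c1._1 + c2._1, c1._2)
          })
        .collect()  // one dense (count, sum) pair per label lands on the driver
    }

If this picture is roughly right, each combiner holds a dense sum of length 43839 (about 350 KB at 8 bytes per double), and the collect pulls one such sum per label onto the driver. Is that where the memory is going in my case?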