As of today, the way Pig uses Tez, it specifies disk as the destination of 
intermediate data. So it's not kept in memory (though it may be in the OS buffer 
cache).
There are other artifacts (some of which are derived from intermediate inputs, 
e.g. one side of a broadcast join) that are kept in memory in the container. 
Each of these has a lifecycle, set by Pig, and is de-referenced (and GC'd) when 
that lifecycle ends.
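A minimal Python sketch of the lifecycle idea above, assuming a simple in-process registry: objects such as the build side of a broadcast join are registered under a scope and dropped when that scope ends, which makes them eligible for garbage collection. `LifecycleRegistry`, `Lifecycle`, and the VERTEX/DAG/SESSION scopes are hypothetical names chosen for illustration, not Pig's or Tez's actual API, and the rule that ending a scope also clears the narrower ones is an assumption of this sketch.

```python
from enum import Enum


class Lifecycle(Enum):
    """Hypothetical scopes, ordered from narrowest to widest."""
    VERTEX = 1
    DAG = 2
    SESSION = 3


class LifecycleRegistry:
    """Hypothetical in-container cache whose entries are tied to a lifecycle.

    Entries are plain references; when a lifecycle ends, the registry drops
    them so the garbage collector can reclaim the objects (assuming nothing
    else still references them).
    """

    def __init__(self):
        self._entries = {}  # key -> (lifecycle, value)

    def cache(self, key, value, lifecycle):
        self._entries[key] = (lifecycle, value)

    def get(self, key):
        entry = self._entries.get(key)
        return None if entry is None else entry[1]

    def end_lifecycle(self, lifecycle):
        """Drop every entry scoped to `lifecycle` or a narrower scope.

        Returns the number of entries de-referenced.
        """
        stale = [key for key in self._entries
                 if self._entries[key][0].value <= lifecycle.value]
        for key in stale:
            del self._entries[key]
        return len(stale)


if __name__ == "__main__":
    registry = LifecycleRegistry()
    # e.g. the small side of a broadcast join, reused by later tasks in the
    # same container until the DAG finishes
    registry.cache("join-build-side", {"k1": ["a"], "k2": ["b"]}, Lifecycle.DAG)
    registry.cache("vertex-scratch", [1, 2, 3], Lifecycle.VERTEX)

    registry.end_lifecycle(Lifecycle.VERTEX)  # vertex done: scratch dropped
    print(registry.get("vertex-scratch"))     # None
    registry.end_lifecycle(Lifecycle.DAG)     # DAG done: build side dropped
```

The point of the sketch is that the memory held by these artifacts is bounded by explicit release points; the intermediate data itself, per the reply, goes to disk and never enters such a registry.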



From: Sachin Sabbarwal [mailto:[email protected]]
Sent: Tuesday, July 07, 2015 10:56 AM
To: [email protected]
Subject: TEZ: Why there is no OOME in TEZ when intermediate files are kept in memory?

Hi Guys
One quick question. I've read that when Pig is running in Tez mode, 
intermediate files are not stored in HDFS but are kept in memory instead, so that 
they are available to the next task that reuses our container. My question 
is: how does Tez prevent an OOME, since not writing to disk and keeping 
everything in memory might cause one? Is there a threshold after which files 
are written to HDFS? Am I missing something here?

Thanks
--
Sachin Sabbarwal
