There are two ways Pig controls the memory used by large bags:
1. Triggers set on GC, similar to the mechanism described by Julien here:
https://techblug.wordpress.com/2011/07/21/detecting-low-memory-in-java-part-2/
When Pig gets notified about high memory usage, it goes through the
list of Spi
I have a load that is causing a schema error.
images = LOAD 'imagelistsbysite' USING BinStorage() AS (site:chararray,
top_image_by_site:bag{image:chararray, cnt:long,
top_url_by_image:bag{url:chararray,title:chararray,cnt:long}});
Yields this error:
Caused by: org.apache.pig.impl.logicalLayer.sch
Daniel,
IIRC, spill requests are triggered by a GC, and spill_count is incremented
by an actual spill, so the former number may be a bit misleading (if GC is
effective, lots of GCs might be fine).
D
On Wed, Aug 3, 2011 at 10:12 AM, Daniel Dai wrote:
> Spill means Pig needs to dump memory to disk.
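
To make the difference between a spill request and an actual spill concrete, here is a minimal Java sketch. It is not Pig's code: the class SpillSketch, the Spillable interface, FakeBag, and the minSpillBytes cutoff are all hypothetical names made up for illustration. The only real APIs used are the standard java.lang.management ones (MemoryPoolMXBean collection usage thresholds and the MemoryMXBean notification emitter). The idea is that every post-GC low-memory notification counts as a request, but only bags large enough to be worth writing out are spilled.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.management.NotificationEmitter;

public class SpillSketch {
    /** Hypothetical stand-in for a bag that can dump its contents to disk. */
    public interface Spillable {
        long memorySize(); // estimated bytes currently held in memory
        void spill();      // write contents to disk and release memory
    }

    /** Toy bag used only to exercise the sketch; it does no real I/O. */
    public static final class FakeBag implements Spillable {
        public long size;
        public boolean spilled = false;
        public FakeBag(long size) { this.size = size; }
        public long memorySize() { return size; }
        public void spill() { spilled = true; size = 0; }
    }

    private final List<Spillable> bags = new CopyOnWriteArrayList<>();
    private final long minSpillBytes; // assumed cutoff below which spilling is skipped
    public long spillRequests = 0;    // one per low-memory notification
    public long spillCount = 0;       // one per bag actually spilled

    public SpillSketch(long minSpillBytes) { this.minSpillBytes = minSpillBytes; }

    public void register(Spillable bag) { bags.add(bag); }

    /** Called on each notification; spills only bags at or above the cutoff. */
    public synchronized void onLowMemory() {
        spillRequests++;
        for (Spillable bag : bags) {
            if (bag.memorySize() >= minSpillBytes) {
                bag.spill();
                spillCount++;
            }
        }
    }

    /** Arms post-GC thresholds on heap pools and listens for notifications. */
    public void install(double fraction) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP
                    || !pool.isCollectionUsageThresholdSupported()) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage != null && usage.getMax() > 0) {
                pool.setCollectionUsageThreshold((long) (usage.getMax() * fraction));
            }
        }
        NotificationEmitter emitter =
                (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener((notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED
                    .equals(notification.getType())) {
                onLowMemory();
            }
        }, null, null);
    }
}
```

In this sketch a job whose GCs reclaim enough memory would show spillRequests climbing while spillCount stays flat, which is why the request number alone can look worse than it is.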
Spill means Pig needs to dump memory to disk. It happens when Pig
deals with a large key and runs short of memory. A high number
indicates that Pig needs to write to disk frequently and performance may
degrade; you may want to explore other approaches, such as using a skewed join.
Daniel
On Tue, Aug 2, 2011