Re: Proactive Spill Count Recs

2011-08-03 Thread Thejas Nair
There are two ways Pig controls the memory used by large bags: 1. Triggers set on GC, similar to the mechanism described by Julien here - https://techblug.wordpress.com/2011/07/21/detecting-low-memory-in-java-part-2/ . When Pig gets notified about high memory usage, it goes through the list of Spi

Weird error moving from pig 0.6 to pig 0.8x, but fixed in 0.9x

2011-08-03 Thread Corbin Hoenes
I have a load that is causing a schema error. images = LOAD 'imagelistsbysite' USING BinStorage() AS (site:chararray, top_image_by_site:bag{image:chararray, cnt:long, top_url_by_image:bag{url:chararray,title:chararray,cnt:long}}); Yields this error: Caused by: org.apache.pig.impl.logicalLayer.sch

Re: Proactive Spill Count Recs

2011-08-03 Thread Dmitriy Ryaboy
Daniel, iirc spill requests are triggered by a GC, and spill_count is triggered by an actual spill, so the former number may be a bit misleading (if GC is effective, lots of GCs might be fine). D On Wed, Aug 3, 2011 at 10:12 AM, Daniel Dai wrote: > Spill means Pig needs to dump memory to disk.

Re: Proactive Spill Count Recs

2011-08-03 Thread Daniel Dai
Spill means Pig needs to dump memory to disk. It happens when Pig deals with a large key and runs short of memory. A high number indicates that Pig needs to write to disk frequently, so performance may degrade; you may want to explore approaches such as a skewed join. Daniel On Tue, Aug 2, 2011