Hi, Aniket,
What does your Pig script look like? Does the UDF run on the map side or the reduce side?

Daniel

Dmitriy Ryaboy wrote:
That's a max of 3.3K single-character strings. Even with the Java overhead,
that shouldn't be more than a meg, right?
None of these should make it out of young gen, assuming the list "cats"
doesn't stick around outside the UDF.

On Thu, Feb 24, 2011 at 3:49 PM, Aniket Mokashi <amoka...@andrew.cmu.edu> wrote:

Hi Jai,

Thanks for your email. I suspect the cause is Strings being created in a
tight loop, as you suggested. I have a loop in my UDF that does the following.

while((startInd = someLog.indexOf('[', startInd)) > 0) {
    endInd = someLog.indexOf(']', startInd);
    if(endInd > 0) {
        category = someLog.substring(startInd, endInd + 1);
        cats.add(category);
    }
    startInd = endInd;
}
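
One thing worth checking in the loop above: when a '[' has no matching ']'
after it, endInd is -1, so startInd becomes -1. String.indexOf treats a
negative fromIndex as 0, so the scan would restart from the beginning of the
string and keep appending the same categories to "cats" without ever
terminating, which could exhaust the heap regardless of input size. The "> 0"
test also skips a '[' at index 0. The following is only a sketch of a
terminating variant, assuming startInd starts at 0 (the initialization is not
shown in the message); the class name BracketCategories and the method extract
are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from the thread.
final class BracketCategories {

    // Collects every "[...]" span of someLog, in order of appearance.
    static List<String> extract(String someLog) {
        List<String> cats = new ArrayList<>();
        int startInd = 0; // assumption: the original loop also starts at 0
        // ">= 0" also accepts a '[' at index 0, which "> 0" would skip.
        while ((startInd = someLog.indexOf('[', startInd)) >= 0) {
            int endInd = someLog.indexOf(']', startInd);
            if (endInd < 0) {
                // Unmatched '[': stop instead of carrying -1 into the next
                // indexOf call, which would restart the scan from index 0.
                break;
            }
            cats.add(someLog.substring(startInd, endInd + 1));
            startInd = endInd + 1; // continue just past the closing bracket
        }
        return cats;
    }
}
```

With this variant, an input such as "x[a]y[b" yields a single entry, "[a]",
and the loop exits at the unmatched '['. Whether the real logs contain
unmatched brackets is something only the input data can confirm.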

My jobs are failing in both local and MapReduce mode. The UDF works fine for
a smaller input (a few lines). Also, I checked that the length of someLog
doesn't exceed 10000.

Thanks,
Aniket


On Thu, February 24, 2011 3:58 am, Jai Krishna wrote:
Sharing the code would be useful, as mentioned. The heap settings the JVM
was running with would also help.

However, off the top of my head, one common situation (esp. in text
processing/tokenizing) is instantiating Strings in a tight loop.

Besides, you could also exercise your UDF in a local JVM and take a heap
dump / profile it. If your heap is less than 512M, you could use basic
profiling via hprof/hat (see
http://java.sun.com/developer/technicalArticles/Programming/HPROF.html).


Thanks,
Jai



On 2/24/11 9:26 AM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote:


Aniket, share the code?
It really depends on how you create them.


-D


On Wed, Feb 23, 2011 at 7:49 PM, Aniket Mokashi
<amoka...@andrew.cmu.edu> wrote:


I've written a simple UDF that parses a chararray (which looks like
...[a].....[b]...[a]...) to capture the text inside brackets and return
the counts as a String such as a=2;b=1; and so on. The input chararrays are
rarely more than 1000 characters and never more than 100000 (I've added
log.warn in my UDF to ensure this). But I still see a Java heap error while
running this UDF (even in local mode, the job simply fails). My
assumption is that the maps and lists I use locally will be reclaimed by
the GC. Am I missing something?

Thanks,
Aniket




