Hi Jai,

Thanks for your email. I suspect it's the Strings-in-a-tight-loop issue, as you suggested. I have a loop in my UDF that does the following:

    while ((startInd = someLog.indexOf('[', startInd)) > 0) {
        endInd = someLog.indexOf(']', startInd);
        if (endInd > 0) {
            category = someLog.substring(startInd, endInd + 1);
            cats.add(category);
        }
        startInd = endInd;
    }

My jobs are failing in both local and MR mode. The UDF works fine for a smaller input (a few lines). Also, I checked that the length of someLog doesn't exceed 10000.

Thanks,
Aniket

On Thu, February 24, 2011 3:58 am, Jai Krishna wrote:
> Sharing the code would be useful, as mentioned. Also of help would be the
> heap settings that the JVM had.
>
> However, off the top of my head, one common situation (esp. in text
> processing/tokenizing) is instantiating Strings in a tight loop.
>
> Besides, you could also exercise your UDF in a local JVM and take a heap
> dump / profile it. If your heap is less than 512M, you could use basic
> profiling via hprof/hat (see
> http://java.sun.com/developer/technicalArticles/Programming/HPROF.html ).
>
> Thanks,
> Jai
>
> On 2/24/11 9:26 AM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote:
>
> Aniket, share the code?
> It really depends on how you create them.
>
> -D
>
> On Wed, Feb 23, 2011 at 7:49 PM, Aniket Mokashi
> <amoka...@andrew.cmu.edu> wrote:
>
>> I've written a simple UDF that parses a chararray (which looks like
>> ...[a].....[b]...[a]...) to capture stuff inside brackets and return
>> them as String a=2;b=1; and so on. The input chararrays are rarely more
>> than 1000 characters and are not more than 100000 (I've added log.warn
>> in my UDF to ensure this). But I still see a Java heap error while
>> running this UDF (even in local mode, the job simply fails). My
>> assumption is that the maps and lists I use locally will be reclaimed by
>> the GC. Am I missing something?
>>
>> Thanks,
>> Aniket
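
A note on the loop posted at the top of this message: one possible cause of the heap error, assuming some input lines contain a '[' with no ']' after it, is that the loop never terminates. In that case someLog.indexOf(']', startInd) returns -1, startInd is set to -1, and String.indexOf(int, int) treats a negative fromIndex as 0. The scan then restarts from the beginning of the string and cats keeps growing until the heap is exhausted. The "> 0" test also skips a '[' at index 0. The sketch below is only an illustration of a loop that avoids both problems. The class and method names (BracketCategories, extract) are hypothetical and not taken from the original UDF, and the rest of the UDF (counting, output formatting) is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class BracketCategories {

    // Hypothetical helper: collects every "[...]" token in the input.
    // Stops at the first '[' that has no closing ']' instead of
    // restarting the scan from the beginning of the string.
    public static List<String> extract(String someLog) {
        List<String> cats = new ArrayList<>();
        int startInd = 0;
        // ">= 0" so that a '[' at index 0 is not skipped.
        while ((startInd = someLog.indexOf('[', startInd)) >= 0) {
            int endInd = someLog.indexOf(']', startInd);
            if (endInd < 0) {
                // Unmatched '[': no more complete tokens can follow.
                break;
            }
            cats.add(someLog.substring(startInd, endInd + 1));
            // Continue after the closing bracket, so the index always advances.
            startInd = endInd + 1;
        }
        return cats;
    }

    public static void main(String[] args) {
        // The last '[' is deliberately left unclosed.
        System.out.println(extract("..[a]...[b]..[a]..[c"));
    }
}
```

If unmatched brackets turn out not to occur in the data, this is not the cause, and a heap dump as Jai suggested would be the next thing to check.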