Re: Stackoverflow

2008-06-04 Thread Chris Douglas
The pivot selection is the median of the first, middle, and last elements; it should be the best choice for sorted data. It's still possible to pick bad pivots, but data that forces hundreds of consecutive bad pivot selections should be exceedingly rare.
-C
On Jun 4, 2008, at 9:24 AM, Doug
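For readers following along, here is a minimal sketch of median-of-three pivot selection as described in the message above. It is an illustration only, not the code in org.apache.hadoop.util.QuickSort; the class name `MedianOfThree` and the method `medianOfThreeIndex` are hypothetical names made up for this example.

```java
public class MedianOfThree {
    // Hypothetical helper: returns whichever of lo, mid, hi holds the median
    // of the three sampled values a[lo], a[mid], a[hi].
    public static int medianOfThreeIndex(int[] a, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;   // avoids overflow of (lo + hi) / 2
        int x = a[lo], y = a[mid], z = a[hi];
        if ((x <= y && y <= z) || (z <= y && y <= x)) {
            return mid;                 // middle sample is the median
        }
        if ((y <= x && x <= z) || (z <= x && x <= y)) {
            return lo;                  // first sample is the median
        }
        return hi;                      // otherwise the last sample is
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4, 5, 6, 7};
        int p = medianOfThreeIndex(sorted, 0, sorted.length - 1);
        // On already-sorted input the middle element is chosen, so the
        // partition splits the range evenly.
        System.out.println("pivot index = " + p + ", value = " + sorted[p]);
    }
}
```

On sorted input the middle element is always the median of the three samples, which is why this choice behaves well for sorted data. Adversarial inputs can still defeat it, as the message notes.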

Re: Stackoverflow

2008-06-04 Thread Doug Cutting
Andreas Kostyrka wrote:
> java.lang.StackOverflowError
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:494)
>     at org.apache.hadoop.util.QuickSort.fix(QuickSort.java:29)
>     at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:58)
>     at org.apache.

RE: Stackoverflow

2008-06-04 Thread Devaraj Das
lot!
Devaraj
> -----Original Message-----
> From: Andreas Kostyrka [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, June 04, 2008 4:56 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Stackoverflow
>
> Ok, I've tried it out, the example sort bombs exactly like

Re: Stackoverflow

2008-06-04 Thread Steve Loughran
Andreas Kostyrka wrote:
> Ok, a new dead job: ;( This time after 2.4GB/11.3M lines ;( Any idea what I could do to debug this? (No idea how to go about debugging a Java process that is distributed and does GBs of data.

It's one of the big problems of distributed computing: distributed debugging.

> How

Re: Stackoverflow

2008-06-03 Thread Andreas Kostyrka
Ok, I've tried it out, the example sort bombs exactly like streaming => http://heaven.kostyrka.org/test.log Any recommendations? Thanks, Andreas

Re: Stackoverflow

2008-06-03 Thread Andreas Kostyrka
On Tuesday 03 June 2008 22:16:05 Andreas Kostyrka wrote:
> On Tuesday 03 June 2008 21:00:49 Runping Qi wrote:
> > ${hadoop} jar hadoop-0.17-examples.jar sort -m \
> >
> > >    -r 88 \
> > >    -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
> > >    -outFormat org.apache.hadoop.mapred

Re: Stackoverflow

2008-06-03 Thread Andreas Kostyrka
On Tuesday 03 June 2008 21:00:49 Runping Qi wrote:
> ${hadoop} jar hadoop-0.17-examples.jar sort -m \
>
> >    -r 88 \
> >    -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
> >    -outFormat org.apache.hadoop.mapred.lib.NullOutputFormat \
> >    -outKey org.apache.hadoop.io.Text \
>

Re: Stackoverflow

2008-06-03 Thread Chris Douglas
ECTED]
Sent: Tuesday, June 03, 2008 11:35 AM
To: core-user@hadoop.apache.org
Subject: Re: Stackoverflow

By "not exactly small," do you mean each line is long or that there are many records?

Well, not small in the sense that even if I could get my boss to allow me to give you the data, transferin

RE: Stackoverflow

2008-06-03 Thread Runping Qi
outFormat org.apache.hadoop.mapred.lib.NullOutputFormat \
>-outKey org.apache.hadoop.io.Text \
>-outValue org.apache.hadoop.io.Text \
> instead.
Runping
> -----Original Message-----
> From: Chris Douglas [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, June 03, 2008 11:35 AM
> To: core-user@h

Re: Stackoverflow

2008-06-03 Thread Andreas Kostyrka
On Tuesday 03 June 2008 20:35:03 Chris Douglas wrote:
> >> By "not exactly small," do you mean each line is long or that there
> >> are many records?
> >
> > Well, not small in the sense that even if I could get my boss to
> > allow me to give you the data, transferring it might be painful. (E.g.

Re: Stackoverflow

2008-06-03 Thread Chris Douglas
By "not exactly small," do you mean each line is long or that there are many records?

Well, not small in the sense that even if I could get my boss to allow me to give you the data, transferring it might be painful. (E.g. the job that aborted had about 12M lines with ~2.6GB data => the lin

Re: Stackoverflow

2008-06-03 Thread Andreas Kostyrka
Ok, a new dead job: ;( This time after 2.4GB/11.3M lines ;( Any idea what I could do to debug this? (No idea how to go about debugging a Java process that is distributed and does GBs of data. How does one stabilize that kind of stuff to generate a reproducible situation?) Andreas

Re: Stackoverflow

2008-06-03 Thread Andreas Kostyrka
On Tuesday 03 June 2008 08:35:10 Chris Douglas wrote:
> > I have no Java implementation of my job, sorry.
>
> Since it's all in the map side, IdentityMapper/IdentityReducer is
> fine, as long as both the splits and the number of reduce tasks are
> the same.
>
> > The data is a representation for lo

Re: Stackoverflow

2008-06-02 Thread Chris Douglas
I have no Java implementation of my job, sorry.

Since it's all in the map side, IdentityMapper/IdentityReducer is fine, as long as both the splits and the number of reduce tasks are the same.

The data is a representation for loglines, and not exactly small, e.g. the stuff has already be

Re: Stackoverflow

2008-06-02 Thread Andreas Kostyrka
On Tuesday 03 June 2008 04:53:22 Chris Douglas wrote:
> Is anyone observing this outside of streaming?
>
> We've been able to reproduce this trace with a bad comparator that
> only returns negative values, but haven't found any uncontrived
> patterns in data that produce this, nor any comparators i
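To illustrate why a comparator that only returns negative values can blow the stack, here is a small self-contained sketch. It assumes a plain recursive quicksort with a last-element pivot, which is a simplification and not Hadoop's QuickSort; the class `BadComparatorDemo` and the method `sortDepth` are hypothetical names for this example. The method sorts in place and returns the deepest recursion level it reached.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Random;

public class BadComparatorDemo {
    // Simplified recursive quicksort (last element as pivot, assumption for
    // illustration). Returns the maximum recursion depth reached.
    public static int sortDepth(Integer[] a, int lo, int hi, Comparator<Integer> cmp) {
        if (lo >= hi) {
            return 0;                   // zero or one element: nothing to do
        }
        Integer pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (cmp.compare(a[j], pivot) < 0) {   // "smaller" goes left
                swap(a, i, j);
                i++;
            }
        }
        swap(a, i, hi);                 // move pivot between the two parts
        int left = sortDepth(a, lo, i - 1, cmp);
        int right = sortDepth(a, i + 1, hi, cmp);
        return 1 + Math.max(left, right);
    }

    private static void swap(Integer[] a, int i, int j) {
        Integer t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        Integer[] data = new Integer[1000];
        for (int k = 0; k < data.length; k++) {
            data[k] = k;
        }
        Collections.shuffle(Arrays.asList(data), new Random(42));
        Integer[] copy = data.clone();

        // A well-behaved comparator on shuffled data keeps the recursion shallow.
        int good = sortDepth(data, 0, data.length - 1, Comparator.<Integer>naturalOrder());
        // A comparator that always answers "less than" puts every element on
        // one side of the pivot, so each call shrinks the range by only one.
        int bad = sortDepth(copy, 0, copy.length - 1, (x, y) -> -1);

        System.out.println("depth with natural order:    " + good);
        System.out.println("depth with always-negative:  " + bad);   // 999 here
    }
}
```

With the always-negative comparator the depth grows linearly with the number of records (n - 1 levels in this sketch), so inputs with millions of records would exhaust the thread stack long before the sort finished.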