Map reduce streaming unable to partition

2011-02-10 Thread Kelly Burkhart
Hi, I'm trying to get partitioning working from a streaming map/reduce job. I'm using Hadoop r0.20.2. Consider the following files, both in the same HDFS directory:

f1:
01:01:01a,a,a,a,a,1
01:01:02a,a,a,a,a,2
01:02:01a,a,a,a,a,3
01:02:02a,a,a,a,a,4
02:01:01a,a,a,a,a,5
02:01:02a,a,a,a,a,6
02:02:
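For keys like these, streaming jobs typically partition with KeyFieldBasedPartitioner. A minimal sketch, assuming the goal is to group rows by the first ':'-separated field of the key (the jar path, input/output paths, and field choices are illustrative, not from the thread):

    # Partition on the first ':'-separated key field; sorting still uses the whole key.
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.20.2-streaming.jar \
      -D map.output.key.field.separator=: \
      -D mapred.text.key.partitioner.options=-k1,1 \
      -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
      -input /path/to/input -output /path/to/output \
      -mapper cat -reducer cat

With this, all records sharing the same first key field land in the same reducer, regardless of how the file data is laid out across mappers.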

Re: Map reduce streaming unable to partition

2011-02-10 Thread Kelly Burkhart
11:45 AM, Kelly Burkhart wrote:
> Hi,
>
> I'm trying to get partitioning working from a streaming map/reduce
> job.  I'm using hadoop r0.20.2.
>
> Consider the following files, both in the same hdfs directory:
>
> f1:
> 01:01:01a,a,a,a,a,1
> 01:01:02a,a,a,a,a

Reduce java.lang.OutOfMemoryError

2011-02-16 Thread Kelly Burkhart
Hello, I'm seeing frequent fails in reduce jobs with errors similar to this:

2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201102081823_0175_m_002153_0, compressed len: 172492, decompressed len: 172488
2011-02-15 15:21:10,163 FATAL org.apache.hadoop.mapred.Task

Re: Reduce java.lang.OutOfMemoryError

2011-02-16 Thread Kelly Burkhart
e know if it helps.
>
> On Wed, Feb 16, 2011 at 8:30 PM, Kelly Burkhart
> wrote:
>
>> Hello, I'm seeing frequent fails in reduce jobs with errors similar to
>> this:
>>
>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
>> hea

Re: Reduce java.lang.OutOfMemoryError

2011-02-16 Thread Kelly Burkhart
> job. You can set it to something like -Xmx512M to increase the amount
> of memory used by the JVM spawned for the reducer task.
>
> -----Original Message-----
> From: Kelly Burkhart [mailto:kelly.burkh...@gmail.com]
> Sent: Wednesday, February 16, 2011 9:12 AM
> To: common-user@ha
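The same heap setting can be applied per job instead of cluster-wide via mapred.child.java.opts, the 0.20-era property this thread is discussing. A sketch (the 512m value and the paths are illustrative):

    # Per-job override of the child JVM heap for map and reduce tasks.
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.20.2-streaming.jar \
      -D mapred.child.java.opts=-Xmx512m \
      -input /path/to/input -output /path/to/output \
      -mapper cat -reducer cat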

Re: Reduce java.lang.OutOfMemoryError

2011-02-16 Thread Kelly Burkhart
> Sent from my mobile. Please excuse the typos.
>
> On 2011-02-16, at 9:10 AM, Kelly Burkhart wrote:
>
>> Our cluster admin (who's out of town today) has mapred.child.java.opts
>> set to -Xmx1280 in mapred-site.xml.  However, if I go to the job
>> configuratio

Re: Reduce java.lang.OutOfMemoryError

2011-02-16 Thread Kelly Burkhart
> in the logs
>
> Cheers
> James
>
> Sent from my mobile. Please excuse the typos.
>
> On 2011-02-16, at 9:21 AM, Kelly Burkhart wrote:
>
>> I should have mentioned this in my last email: I thought of that so I
>> logged into every machine in the cluster; each machi

Re: Reduce java.lang.OutOfMemoryError

2011-02-16 Thread Kelly Burkhart
Thank you for the hint. I'm fairly new to this so nothing is well known to me at this time ;-)

-K

On Wed, Feb 16, 2011 at 1:58 PM, Rahul Jain wrote:
> If you google for such memory failures, you'll find the mapreduce tunable
> that'll help you:
>
> mapred.job.shuffle.input.buffer.percent ; it i
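In 0.20, mapred.job.shuffle.input.buffer.percent is the fraction of the reducer heap used to buffer map outputs during the shuffle (default 0.70); lowering it trades some shuffle speed for heap headroom. A sketch, with 0.50 as an illustrative value:

    # Give the shuffle a smaller slice of the reducer heap to avoid OOM.
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.20.2-streaming.jar \
      -D mapred.job.shuffle.input.buffer.percent=0.50 \
      -input /path/to/input -output /path/to/output \
      -mapper cat -reducer cat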

HDFS file content restrictions

2011-03-04 Thread Kelly Burkhart
Hello, are there restrictions on the size or "width" of text files placed in HDFS? I have a file structure like this: It would be helpful if in some circumstances I could make text data really large (large meaning many KB to one/few MB). I may have some rows that have a very small payload and so

Re: HDFS file content restrictions

2011-03-04 Thread Kelly Burkhart
On Fri, Mar 4, 2011 at 1:42 PM, Harsh J wrote:
> HDFS does not operate with records in mind.

So does that mean that HDFS will break a file at exactly bytes? Map/Reduce *does* operate with records in mind, so what happens to the split record? Does HDFS put the fragments back together and delive
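HDFS does cut files at exact byte offsets (the configured block size), but TextInputFormat's record reader repairs this at read time: a reader whose split starts mid-file discards everything up to the first newline, and every reader continues past the end of its split to finish its final line, so each record is read exactly once. One way to see this end to end, assuming `hadoop fs` accepts generic -D options as in 0.20 (file names and the 1 MB block size are illustrative):

    # Force a small block size so lines straddle block boundaries.
    hadoop fs -D dfs.block.size=1048576 -put big.txt /tmp/big.txt
    # Count records through MapReduce; the result matches 'wc -l big.txt' locally.
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.20.2-streaming.jar \
      -D mapred.reduce.tasks=1 \
      -input /tmp/big.txt -output /tmp/linecount \
      -mapper cat -reducer 'wc -l'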