RE: HDP 2.0 GA?

2013-11-05 Thread Jim Falgout
HDP 2.0.6 is the GA version that matches Apache Hadoop 2.2. From: John Lilley john.lil...@redpoint.net Sent: Tuesday, November 05, 2013 12:34 PM To: user@hadoop.apache.org Subject: HDP 2.0 GA? I noticed that HDP 2.0 is available for download here:

RE: One file per mapper

2011-07-05 Thread Jim Falgout
I've done this before by placing the name of each file to process into a single file (newline separated) and using the NLineInputFormat class as the input format. Run your job with that single file, listing all of the file names to process, as the input. Each mapper will then be handed one line (this
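
A minimal driver sketch of that approach, assuming the Hadoop 2 mapreduce API; the class, mapper, and argument names here are illustrative, not from the original post:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class OneFilePerMapper {

      // Each map() call receives one line of the list file, i.e. one file name to process.
      public static class FileNameMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text fileName, Context context)
            throws java.io.IOException, InterruptedException {
          // Open fileName with FileSystem.open(...) and process it here.
          context.write(fileName, new Text("processed"));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "one file per mapper");
        job.setJarByClass(OneFilePerMapper.class);
        job.setMapperClass(FileNameMapper.class);
        job.setNumReduceTasks(0);

        job.setInputFormatClass(NLineInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0])); // the list of file names
        NLineInputFormat.setNumLinesPerSplit(job, 1);         // one line -> one mapper

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }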

RE: Force single map task execution per node for a job

2011-04-15 Thread Jim Falgout
I'm not sure that is possible. You can use NLineInputFormat with a control file that has one line per node in the cluster. I've used that technique for a data generation program and it works well. This will run a pre-determined number of mappers. However, it's up to the scheduler to decide when
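
A short sketch of the control-file idea, assuming the Hadoop 2 API; the control-file path and class names are illustrative, and the scheduler still decides where (and when) each map task actually runs:

    import java.io.OutputStreamWriter;
    import java.io.PrintWriter;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

    public class ControlFileJob {
      public static Job buildJob(Configuration conf, String[] nodeNames) throws Exception {
        Path controlFile = new Path("/tmp/control-file.txt"); // illustrative location
        FileSystem fs = FileSystem.get(conf);
        // One line per node: NLineInputFormat turns each line into its own map task.
        try (PrintWriter out =
                 new PrintWriter(new OutputStreamWriter(fs.create(controlFile, true)))) {
          for (String node : nodeNames) {
            out.println(node);
          }
        }
        Job job = Job.getInstance(conf, "data generation");
        job.setInputFormatClass(NLineInputFormat.class);
        FileInputFormat.addInputPath(job, controlFile);
        NLineInputFormat.setNumLinesPerSplit(job, 1);
        job.setNumReduceTasks(0);
        return job;
      }
    }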

implementing Reducer.run()

2011-04-15 Thread Jim Falgout
I'm looking at implementing an advanced Reducer by overriding the Reducer.run() method. It seems straightforward enough, but looking at the default implementation, it invokes: ((ReduceContext.ValueIterator) (context.getValues().iterator())).resetBackupStore(); The problem is that
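
For context, a hedged sketch of what such an override might look like against the Hadoop 2 mapreduce API, modeled on the default run() the post refers to; the key/value types and class name are placeholders:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.ReduceContext;
    import org.apache.hadoop.mapreduce.Reducer;

    public class AdvancedReducer extends Reducer<Text, Text, Text, Text> {
      @Override
      public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        try {
          while (context.nextKey()) {
            // Custom per-key logic could go here instead of delegating to reduce().
            reduce(context.getCurrentKey(), context.getValues(), context);
            // Reset the backup store so mark/reset state does not leak across keys,
            // mirroring what the default run() implementation does.
            Iterator<Text> iter = context.getValues().iterator();
            if (iter instanceof ReduceContext.ValueIterator) {
              ((ReduceContext.ValueIterator<Text>) iter).resetBackupStore();
            }
          }
        } finally {
          cleanup(context);
        }
      }
    }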

RE: Streaming mappers frequently time out

2011-03-23 Thread Jim Falgout
I've run into that before. Try setting mapreduce.task.timeout. I seem to remember that setting it to zero may turn off the timeout, but that can of course be dangerous if you have a runaway task. The default is 600 seconds ;-) Check out
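
A minimal sketch of setting that property from a Java driver; the property name follows the post (older releases spell it mapred.task.timeout), and the chosen values are only examples:

    import org.apache.hadoop.conf.Configuration;

    public class TimeoutConfig {
      public static Configuration withLongerTimeout() {
        Configuration conf = new Configuration();
        // The timeout is in milliseconds; 600000 ms (600 seconds) is the usual default.
        conf.setLong("mapreduce.task.timeout", 1800000L); // raise to 30 minutes
        // conf.setLong("mapreduce.task.timeout", 0L);    // 0 disables the timeout (risky)
        return conf;
      }
    }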

RE: Quick question

2011-02-18 Thread Jim Falgout
That's right. The TextInputFormat handles situations where records cross split boundaries. Your mapper will only ever see whole records. -Original Message- From: maha [mailto:m...@umail.ucsb.edu] Sent: Friday, February 18, 2011 1:14 PM To: common-user Subject: Quick question Hi all,
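
As a small illustration of that guarantee (class and types are only an example): with TextInputFormat, each map() call receives one complete line as its value, even when the line physically straddles an HDFS split boundary:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WholeLineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
      @Override
      protected void map(LongWritable byteOffset, Text line, Context context)
          throws IOException, InterruptedException {
        // 'line' is always a whole record; no manual handling of split boundaries is needed.
        context.write(line, byteOffset);
      }
    }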

RE: HDFS block size v.s. mapred.min.split.size

2011-02-17 Thread Jim Falgout
Generally, if you have large files, setting the block size to 128 MB or larger is helpful. You can do that on a per-file basis or set the block size for the whole filesystem. The larger block size cuts down on the number of map tasks required to handle the overall data size. I've experimented
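
A hedged sketch of both options, assuming the Hadoop 2 API; the property name dfs.blocksize is the Hadoop 2 spelling (older releases use dfs.block.size), and the file path is illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeExample {
      public static void main(String[] args) throws Exception {
        long blockSize = 128L * 1024 * 1024; // 128 MB

        // Option 1: default block size for files written with this configuration.
        Configuration conf = new Configuration();
        conf.setLong("dfs.blocksize", blockSize);

        // Option 2: per-file block size chosen at create time.
        FileSystem fs = FileSystem.get(conf);
        short replication = fs.getDefaultReplication(new Path("/"));
        try (FSDataOutputStream out =
                 fs.create(new Path("/data/large-file.bin"), true, 4096, replication, blockSize)) {
          out.writeBytes("example payload\n");
        }
      }
    }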