Hi All,
I had an MR job that processed 2000 small (<3MB each) files, and it took 40
minutes on 8 nodes. Since the files are small, it triggered 2000 map tasks.
I packed my 2000 files into a single 445MB sequence file
(K,V == Text,Text, i.e. file name, file contents). The new MR job
triggers 7 map tasks (approx 64MB each) bu
To get around the small-file problem (I have thousands of 2MB log files) I wrote
a class to convert all my log files into a single SequenceFile in
(Text key, BytesWritable value) format. That works fine. I can run this:
hadoop fs -text /my.seq | grep peemt114.log | head -1
10/07/08 15:02:
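For what it's worth, a converter like the one described above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual class: the class name, the local input directory argument, and the use of the old createWriter signature are my own assumptions.

```java
// Sketch: pack many small local log files into one SequenceFile on HDFS,
// keyed by file name (Text) with the raw file bytes as the value
// (BytesWritable). Names and argument layout are illustrative.
import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class LogsToSequenceFile {
    public static void main(String[] args) throws Exception {
        // args[0] = local directory of log files, args[1] = HDFS output, e.g. /my.seq
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, new Path(args[1]), Text.class, BytesWritable.class);
        try {
            for (File f : new File(args[0]).listFiles()) {
                byte[] bytes = Files.readAllBytes(f.toPath());
                // One record per log file: (file name, file contents)
                writer.append(new Text(f.getName()), new BytesWritable(bytes));
            }
        } finally {
            writer.close();
        }
    }
}
```

Because the key is the file name, `hadoop fs -text /my.seq | grep somefile.log` finds the record for a single original file, as shown above.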
Hi,
I'm using Cloudera's 0.20.2+228 release.
How do I create a custom Counter using the NEW API?
In my Mapper class I tried this:
public class MyMapper extends Mapper {
static enum recordTypes { GOOD, BAD, IGNORED };
public void map(Object key, Text value, Context context)
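With the new (org.apache.hadoop.mapreduce) API, the enum approach is right; the counter is incremented through the Context. A minimal sketch, in which the Mapper type parameters and the map body are illustrative assumptions rather than the poster's code:

```java
// Sketch: custom Counters via a static enum in a new-API Mapper.
// Hadoop groups the counters under the enum's class name; increment
// them with context.getCounter(enumValue).increment(n).
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    static enum recordTypes { GOOD, BAD, IGNORED };

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (value.getLength() == 0) {
            // Count empty records as IGNORED (illustrative rule)
            context.getCounter(recordTypes.IGNORED).increment(1);
        } else {
            context.getCounter(recordTypes.GOOD).increment(1);
            context.write(value, new LongWritable(1));
        }
    }
}
```

The totals show up in the job's counter listing alongside the built-in counters, and can be read back from the Job object after completion via getCounters().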
specify the number of reducers as the
number of files you want, which is not the best option if some days have more
data than others. You also don't have control over the file name. See Tom
White's Hadoop: The Definitive Guide for an excellent example and usage.
Thanks and Regards,
Sonal
Hi,
I'm trying to understand how to generate multiple outputs in my reducer (using
0.20.2+228).
Do I need MultipleOutputs, or should I partition my output in the mapper?
My reducer currently gets key/value input pairs like this, which all end up in
my single part-r-* file:
hostA_VarX_2010-05-01_mor
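One possible approach is the new-API MultipleOutputs, assuming your distribution ships org.apache.hadoop.mapreduce.lib.output.MultipleOutputs (it was backported into some 0.20-based releases). A hedged sketch; the reducer name, the key-parsing rule, and the output path scheme are my own illustrative assumptions:

```java
// Sketch: route reducer output into per-host files using MultipleOutputs.
// The write(key, value, baseOutputPath) variant reuses the job's configured
// OutputFormat, so no named outputs need to be declared in the driver.
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class SplitReducer extends Reducer<Text, Text, Text, Text> {
    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Keys look like "hostA_VarX_2010-05-01_..."; take the host prefix
        // and write each record under <output dir>/hostA/part-r-*.
        String host = key.toString().split("_")[0];
        for (Text v : values) {
            mos.write(key, v, host + "/part");
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close();  // flush all side outputs
    }
}
```

Partitioning in the mapper only controls which reducer a key reaches, not which file it lands in, so MultipleOutputs (or one job per output) is the usual route when you want separate files per key group.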