Re: how to observe spill operation during shuffle in mapreduce?

2013-09-05 Thread Ravi Kiran
Hi, You can look at the job metrics in your JobTracker web UI. The "Spilled Records" counter under the "Map-Reduce Framework" group displays the number of records spilled in both map and reduce tasks. Regards Ravi Magham. On Thu, Sep 5, 2013 at 12:23 PM, ch huang wrote: > hi,all: >

Re: how to benchmark the whole hadoop cluster performance?

2013-09-02 Thread Ravi Kiran
You can also look at a) https://github.com/intel-hadoop/HiBench Regards Ravi Magham On Mon, Sep 2, 2013 at 12:26 PM, ch huang wrote: > hi, all: > I want to evaluate my Hadoop cluster performance, what tool can I > use? (TestDFSIO, nnbench?) >

Re: secondary sort - number of reducers

2013-08-31 Thread Ravi Kiran
Adeel, To add to Yong's points:
a) Consider tuning the number of threads in the reduce tasks and the task tracker process (mapred.reduce.parallel.copies).
b) See if the map output can be compressed to ensure there is less I/O.
c) Increase io.sort.factor to ensure the framework merges a larg

Re: sqoop oracle connection error

2013-08-31 Thread Ravi Kiran
Hi, Can you check whether you are able to ping or telnet to the IP address and port of the Oracle database from your machine? I have a hunch that the Oracle Listener is stopped. If so, start it. The commands to check the status, and to start the listener if it isn't running, are:
    $ lsnrctl status
    $ lsnrctl start
R

Re: WritableComparable.compareTo vs RawComparator.compareTo

2013-08-31 Thread Ravi Kiran
Also, if both are defined, the framework will use the RawComparator. I hope you have registered the comparator in a static block as follows:
    static {
        WritableComparator.define(PairOfInts.class, new Comparator());
    }
Regards Ravi Magham On Sat, Aug 31, 2013 at 1:23 PM, Ravi Kiran wrote: > Hi Ad

Re: WritableComparable.compareTo vs RawComparator.compareTo

2013-08-31 Thread Ravi Kiran
Hi Adeel, The RawComparator is the faster of the two, as you avoid the need to convert the byte stream to Writable objects for comparison. Regards Ravi Magham On Fri, Aug 30, 2013 at 11:16 PM, Adeel Qureshi wrote: > For secondary sort I am implementing a RawComparator and providing t
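
To illustrate the point about avoiding deserialization, here is a minimal sketch in plain Java. It is not Hadoop code: `RawPairCompare`, `serialize`, and `readInt` are hypothetical names, and the 8-byte layout (two big-endian ints) is an assumption about how a pair of ints might be written. Only the shape of `compare(b1, s1, l1, b2, s2, l2)` follows the `RawComparator` interface.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Hadoop: orders two serialized (int, int)
// pairs by reading their bytes in place, without building any pair object.
final class RawPairCompare {

    // Assumed layout: two big-endian 4-byte ints, left element first.
    static byte[] serialize(int left, int right) {
        return ByteBuffer.allocate(8).putInt(left).putInt(right).array();
    }

    // Decode a big-endian int starting at the given offset.
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24)
             | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8)
             | (b[off + 3] & 0xff);
    }

    // Same parameter shape as RawComparator.compare: buffer, start, length for
    // each side. The lengths are unused here because the layout is fixed.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int byLeft = Integer.compare(readInt(b1, s1), readInt(b2, s2));
        if (byLeft != 0) {
            return byLeft;
        }
        // Left elements are equal, so the right elements decide the order.
        return Integer.compare(readInt(b1, s1 + 4), readInt(b2, s2 + 4));
    }
}
```

The comparison touches only the serialized bytes, which is the saving described above; an object-level compareTo would first have to deserialize both records.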

Re: MapReduce Tutorial tweak

2013-08-27 Thread Ravi Kiran
Also to add, the default serialization libraries supported are specified in core-default.xml under the io.serializations property: org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization

Re: Writing multiple tables from reducer

2013-08-27 Thread Ravi Kiran
I wrote a blog post on this a while ago, in which I was writing to multiple tables from my mapper class. You can look into it at http://bigdatabuzz.wordpress.com/2012/04/24/how-to-write-to-multiple-hbase-tables-in-a-mapreduce-job/ The key things are: a) job.setOutputFormatClass(MultiTableOutputFormat.c

Re: is it possible to run a executable jar with ClientAPI?

2013-08-22 Thread Ravi Kiran
Hi, You can definitely run the Driver (ClassWithMain) against a remote Hadoop cluster from, say, Eclipse by following the steps below: a) Have the jar (Some.jar) in the classpath of your project in Eclipse. b) Ensure you have set both the Namenode and Job Tracker information either in core-site.xml and

Re: Make job output be a comma separated file

2013-07-18 Thread Ravi Kiran
> Configuration conf = new Configuration(); > > conf.set("mapreduce.output.textoutputformat.separator", ","); > > Am I changing the field right? > > Thanks, > > Andrew > > *From:* Ravi

Re: Make job output be a comma separated file

2013-07-18 Thread Ravi Kiran
Hi Andrew, You can change the default key/value separator of the output format from "\t" to "," by setting the property *mapred.textoutputformat.separator* in the Configuration of the job. You will face difficulties if this output is an input to another job as you wouldn't kno
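
As a rough model of what the separator property controls, the plain-Java sketch below formats one output record as key, separator, value. It is not Hadoop code: `SeparatorSketch` and `formatRecord` are hypothetical names, and `java.util.Properties` stands in for the job Configuration. Only the property name and the tab default are taken from the message above.

```java
import java.util.Properties;

// Hypothetical stand-in for the job Configuration and the text output writer.
final class SeparatorSketch {

    // Property name as given in the message above.
    static final String SEPARATOR_KEY = "mapred.textoutputformat.separator";

    // Join key and value with the configured separator, falling back to a tab
    // when the property has not been set.
    static String formatRecord(Properties conf, String key, String value) {
        String separator = conf.getProperty(SEPARATOR_KEY, "\t");
        return key + separator + value;
    }
}
```

With no property set, a record comes out tab-separated; after setting the property to ",", the same record comes out comma-separated.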