Lots of Datanode SocketTimeoutException

2012-02-25 Thread Clay Chiang
Hi All, We have an HDFS cluster with ~200 nodes, and for some reason, it's divided into 4 MR clusters which share the same HDFS. Recently, we saw lots of SocketTimeoutException in the datanode log, such as: 2012-02-24 11:57:51,882 WARN datanode.DataNode (DataXceiver.java:readBlock(236))

Re: MapReduce tuning

2012-02-25 Thread Jie Li
Hello Mohit, "I am looking at some Hadoop tuning parameters like io.sort.mb, mapred.child.java.opts, etc. My question was where to look for the current settings." The default settings as well as the documentation can be found in the Hadoop directory: src/mapred/mapred-default.xml
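As a rough illustration of the reply above: Hadoop configuration files such as mapred-default.xml are XML documents made of property entries, each with a name and a value, so the defaults can be listed with a few lines of plain JDK code. The sketch below is only an assumption-laden example, not part of the original message: the class name HadoopConfigReader and the inline sample XML are hypothetical, and the sample values are illustrative. Values set in a site file (for example mapred-site.xml) or on the job override these defaults, so the file shows the defaults rather than the settings in effect.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not a Hadoop class): reads Hadoop-style configuration XML.
class HadoopConfigReader {

    /** Parses <property><name>/<value> entries into an ordered name -> value map. */
    static Map<String, String> parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Map<String, String> settings = new LinkedHashMap<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            String name = childText(property, "name");
            if (name != null) {
                // The value is null when the property has no <value> element.
                settings.put(name, childText(property, "value"));
            }
        }
        return settings;
    }

    /** Returns the trimmed text of the first child element with this tag, or null. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Illustrative sample in the mapred-default.xml layout; values are assumptions.
        String sample =
                "<configuration>"
              + "<property><name>io.sort.mb</name><value>100</value></property>"
              + "<property><name>mapred.child.java.opts</name><value>-Xmx200m</value></property>"
              + "</configuration>";
        parse(sample).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

To inspect a real file, the same parse method could be given the contents of src/mapred/mapred-default.xml read from disk.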

Re: Experience with Hadoop in production

2012-02-25 Thread Jie Li
Hi Pavel, It seems your team has spent some time on performance and tuning issues. I wonder whether an automatic Hadoop tuning tool like Starfish would be interesting to you. We'd like to exchange tuning experience with you. Thanks, Jie, Starfish Group, Duke

Re: How to estimate hadoop?

2012-02-25 Thread Jie Li
Hi Jinyan, I'd like to introduce you to our system Starfish, which can be used to analyze and estimate Hadoop performance and memory usage. With Starfish, you can analyze the performance of your Hadoop job at fine-grained levels, e.g. the time for map processing, spilling, merging, shuffling,

Re: MapReduce tuning

2012-02-25 Thread sriramsrao
Use a search engine to find the Hadoop best practices blog by Arun Murthy. Sriram On Feb 24, 2012, at 10:36 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am looking at some Hadoop tuning parameters like io.sort.mb, mapred.child.java.opts, etc. - My question was where to look at for

dfs.block.size

2012-02-25 Thread Mohit Anchlia
If I want to change the block size, can I use Configuration in the MapReduce job and set it when writing to the sequence file, or does it need to be a cluster-wide setting in the .xml files? Also, is there a way to check the block size of a given file?

Re: LZO with sequenceFile

2012-02-25 Thread Shi Yu
Yes, it is supported by Hadoop sequence files. It is splittable by default. If you have installed and specified LZO correctly, use these: org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.setCompressOutput(job, true);

Re: LZO with sequenceFile

2012-02-25 Thread Mohit Anchlia
Thanks. Does it mean LZO is not installed by default? How can I install LZO? On Sat, Feb 25, 2012 at 6:27 PM, Shi Yu sh...@uchicago.edu wrote: Yes, it is supported by Hadoop sequence file. It is splittable by default. If you have installed and specified LZO correctly, use these: