Calculating the slot

2010-05-18 Thread Ferdinand Neman
Hi All, I'm new to Hadoop and have successfully run MapRed tasks many times on my small cluster (6 machines). Now I realize that by default only 1 reducer is assigned to the job, and with only 1 reducer things go slowly. I've read some documents and am about to increase the number of reducers. Hadoop Definit

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Todd Lipcon
On Tue, May 18, 2010 at 2:50 PM, Jones, Nick wrote: > I'm not familiar with how to use/create them, but shouldn't a HAR (Hadoop > Archive) work well in this situation? I thought it was designed to collect > several small files together through another level of indirection to avoid the > NN load and

RE: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Jones, Nick
I'm not familiar with how to use/create them, but shouldn't a HAR (Hadoop Archive) work well in this situation? I thought it was designed to collect several small files together through another level of indirection to avoid the NN load without decreasing the HDFS block size. Nick Jones -Original

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Patrick Angeles
That wasn't sarcasm. This is what you do: - Run your mapreduce job on 30k small files. - Consolidate your 30k small files into larger files. - Run mapreduce on the larger files. - Compare the running time The difference in runtime is made up by your task startup and seek overhead. If you want to
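
To make the consolidation step concrete, here is a minimal sketch that packs a directory of small files into a single SequenceFile keyed by file name; the paths, and the choice of SequenceFile as the container, are illustrative assumptions rather than anything prescribed in the thread:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // Packs every file under inputDir into one SequenceFile:
    // key = original file name, value = raw bytes. Paths are examples only.
    public class SmallFilePacker {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path inputDir = new Path("/user/demo/small-files"); // hypothetical
        Path packed = new Path("/user/demo/packed.seq");    // hypothetical

        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, packed, Text.class, BytesWritable.class);
        try {
          for (FileStatus stat : fs.listStatus(inputDir)) {
            if (stat.isDir()) continue;
            byte[] buf = new byte[(int) stat.getLen()]; // fine for small files
            FSDataInputStream in = fs.open(stat.getPath());
            try {
              in.readFully(0, buf);
            } finally {
              in.close();
            }
            writer.append(new Text(stat.getPath().getName()),
                          new BytesWritable(buf));
          }
        } finally {
          writer.close();
        }
      }
    }

A mapreduce job can then read the packed file with SequenceFileInputFormat and be compared, wall clock to wall clock, against the run over the original small files.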

Re: JAVA_HOME not set

2010-05-18 Thread Erik Test
Hm. I actually just changed to this version Erik On 18 May 2010 15:59, David Howell wrote: > Are you using Cloudera's hadoop 0.20.2? > > There's some logic in bin/hadoop-config.sh that seems to be failing if > JAVA_HOME isn't set, and it runs before hadoop-env.sh. > > If you think it might

Re: JAVA_HOME not set

2010-05-18 Thread David Howell
Are you using Cloudera's hadoop 0.20.2? There's some logic in bin/hadoop-config.sh that seems to be failing if JAVA_HOME isn't set, and it runs before hadoop-env.sh. If you think it might be the same problem, please weigh in: http://getsatisfaction.com/cloudera/topics/java_home_setting_in_hadoop

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Pierre ANCELOT
Thanks for the sarcasm, but with 3 small files and so 3 Mapper instantiations, even though it's not (and never did I say it was) the only metric that matters, it seems to me like something very interesting to check out... I have a hierarchy above me and they will be happy to understand my choices

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Brian Bockelman
Hey Konstantin, Interesting paper :) One thing which I've been kicking around lately is "at what scale does the file/directory paradigm break down?" At some point, I think the human mind can no longer comprehend so many files (certainly, I can barely organize the few thousand files on my lapt

Re: dfs.name.dir capacity for namenode backup?

2010-05-18 Thread Todd Lipcon
Yes, we recommend at least one local directory and one NFS directory for dfs.name.dir in production environments. This allows an up-to-date recovery of NN metadata if the NN should fail. In future versions the BackupNode functionality will move us one step closer to not needing NFS for production d

Re: dfs.name.dir capacity for namenode backup?

2010-05-18 Thread Andrew Nguyen
Sorry to hijack, but after following this thread I had a related question about the secondary location of dfs.name.dir. Is the approach outlined below the preferred/suggested way to do this? Is this what people mean when they say, "stick it on NFS"? Thanks! On May 17, 2010, at 11:14 PM, Todd Lipco

Re: Do we need to install both 32 and 64 bit lzo2 to enable lzo compression and how can we use gzip compression codec in hadoop

2010-05-18 Thread Hong Tang
Stan, See my comments inline. Thanks, Hong On May 18, 2010, at 8:44 AM, stan lee wrote: Hi Guys, I am trying to use compression to reduce the IO workload when trying to run a job but failed. I have several questions which need your help. For lzo compression, I found a guide http://code.

Re: Data node decommission doesn't seem to be working correctly

2010-05-18 Thread Brian Bockelman
Hey Scott, If the node shows up in the dead nodes and the live nodes as you say, it's definitely not even attempting to be decommissioned. If HDFS was attempting decommissioning and you restart the namenode, then it would only show up in the dead nodes list. Another option is to just turn off

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Konstantin Boudnik
I ran an experiment with a block size of 10 bytes (sic!). This was _very_ slow on the NN side. Writing 5 MB took around 25 minutes or so :( No fun to say the least... On Tue, May 18, 2010 at 10:56 AM, Konstantin Shvachko wrote: > You can also get some performance numbers and answers to the blo

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Konstantin Shvachko
You can also get some performance numbers and answers to the block size dilemma problem here: http://developer.yahoo.net/blogs/hadoop/2010/05/scalability_of_the_hadoop_dist.html I remember some people were using Hadoop for storing or streaming videos. Don't know how well that worked. It would b

Re: preserve JobTracker information

2010-05-18 Thread Harsh J
Preserved JobTracker history is already available at /jobhistory.jsp There is a link at the end of the /jobtracker.jsp page that leads to this. There's also free analysis to go with that! :) On Tue, May 18, 2010 at 11:00 PM, Alan Miller wrote: > Hi, > > Is there a way to preserve previous job in

Re: Do we need to install both 32 and 64 bit lzo2 to enable lzo compression and how can we use gzip compression codec in hadoop

2010-05-18 Thread Harsh J
Hi stan, You can do something of this sort if you use FileOutputFormat, from within your Job Driver: FileOutputFormat.setCompressOutput(job, true); FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class); // GzipCodec from org.apache.hadoop.io.compress. // and where 'job'
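
Completing the quoted calls into a runnable driver sketch: the class name, the identity mapper/reducer, and the argument handling are placeholders, and only the two FileOutputFormat lines come from the reply above.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class GzipOutputJob {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "gzip-output-example"); // hypothetical job name
        job.setJarByClass(GzipOutputJob.class);

        // Identity map and reduce, just to keep the sketch self-contained.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // The part quoted above: gzip-compress the job output.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The reduce output files then come out with a .gz extension, readable by anything that understands gzip.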

JAVA_HOME not set

2010-05-18 Thread Erik Test
Hi All, I continually get this error when trying to run start-all.sh for hadoop 0.20.2 on ubuntu. What confuses me is I DO have JAVA_HOME set in hadoop-env.sh to /usr/lib/jvm/jdk1.6.0_17. I've double checked to see that JAVA_HOME is set to this by echoing the path before running the start script b

Re: Data node decommission doesn't seem to be working correctly

2010-05-18 Thread Scott White
dfsadmin -report reports the hostname for that machine and not the IP. That machine happens to be the master node, which is why I am trying to decommission the data node there, since I only want the data node running on the slave nodes. dfsadmin -report reports all the IPs for the slave nodes. One

Re: Data node decommission doesn't seem to be working correctly

2010-05-18 Thread Koji Noguchi
Hi Scott, You might be hitting two different issues. 1) Decommission not finishing. https://issues.apache.org/jira/browse/HDFS-694 explains decommission never finishing due to open files in 0.20 2) Nodes showing up both in live and dead nodes. I remember Suresh taking a look at this.

Re: Do we need to install both 32 and 64 bit lzo2 to enable lzo compression and how can we use gzip compression codec in hadoop

2010-05-18 Thread Ted Yu
32-bit liblzo2 isn't needed on 64-bit systems. On Tue, May 18, 2010 at 8:44 AM, stan lee wrote: > Hi Guys, > > I am trying to use compression to reduce the IO workload when trying to run > a job but failed. I have several questions which need your help. > > For lzo compression, I found a guide >

Do we need to install both 32 and 64 bit lzo2 to enable lzo compression and how can we use gzip compression codec in hadoop

2010-05-18 Thread stan lee
Hi Guys, I am trying to use compression to reduce the IO workload when trying to run a job but failed. I have several questions which need your help. For lzo compression, I found a guide http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ. Why does it say "Note that you must have both 32-bit an

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Brian Bockelman
Hey Hassan, 1) The overhead is pretty small, measured in a small number of milliseconds on average 2) HDFS is not designed for "online latency". Even though the average is small, if something "bad happens", your clients might experience a lot of delays while going through the retry stack. The

Re: what's the mechnism to determine the reducer number and reduce progress

2010-05-18 Thread stan lee
Thanks PanFeng, do you have a more detailed explanation of this? Is it calculated by how many reduce tasks have completed each phase? Also, what's the answer to my second question? Thanks! On Mon, May 17, 2010 at 12:44 PM, 原攀峰 wrote: > For a reduce task, the execution is divided into three phases,
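
As a rough illustration of the mechanism under discussion (a simplification for the sake of the example, not the actual framework code): a reduce task's progress is usually described as the average of its copy, sort, and reduce phases, and job-level reduce progress as the mean over all reduce tasks.

    // Simplified illustration of reduce-progress accounting; the real
    // bookkeeping lives inside the Hadoop framework.
    public class ReduceProgressSketch {
      // Each phase (copy, sort, reduce) contributes one third of a task's progress.
      static double taskProgress(double copyFrac, double sortFrac, double reduceFrac) {
        return (copyFrac + sortFrac + reduceFrac) / 3.0;
      }

      // Job-level reduce progress as the mean over all reduce tasks.
      static double jobReduceProgress(double[][] phaseFractions) {
        if (phaseFractions.length == 0) {
          return 0.0;
        }
        double sum = 0.0;
        for (double[] p : phaseFractions) {
          sum += taskProgress(p[0], p[1], p[2]);
        }
        return sum / phaseFractions.length;
      }

      public static void main(String[] args) {
        // Task 1 has finished copying and sorting and is halfway through reducing;
        // task 2 is still halfway through its copy phase.
        double[][] tasks = { {1.0, 1.0, 0.5}, {0.5, 0.0, 0.0} };
        System.out.println(jobReduceProgress(tasks)); // roughly 0.5
      }
    }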

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Nyamul Hassan
This is a very interesting thread to us, as we are thinking about deploying HDFS as massive online storage for an online university, and then serving the video files to students who want to view them. We cannot control the size of the videos (and some class work files), as they will mostly be

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread He Chen
If you know how to use AspectJ to do aspect-oriented programming, you can write an aspect class and let it just monitor the whole process of MapReduce. On Tue, May 18, 2010 at 10:00 AM, Patrick Angeles wrote: > Should be evident in the total job running time... that's the only metric > that really ma

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Patrick Angeles
Should be evident in the total job running time... that's the only metric that really matters :) On Tue, May 18, 2010 at 10:39 AM, Pierre ANCELOT wrote: > Thank you, > Any way I can measure the startup overhead in terms of time? > > > On Tue, May 18, 2010 at 4:27 PM, Patrick Angeles >wrote: > >

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Pierre ANCELOT
Thank you, Any way I can measure the startup overhead in terms of time? On Tue, May 18, 2010 at 4:27 PM, Patrick Angeles wrote: > Pierre, > > Adding to what Brian has said (some things are not explicitly mentioned in > the HDFS design doc)... > > - If you have small files that take up < 64MB you

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Patrick Angeles
Pierre, Adding to what Brian has said (some things are not explicitly mentioned in the HDFS design doc)... - If you have small files that take up < 64MB you do not actually use the entire 64MB block on disk. - You *do* use up RAM on the NameNode, as each block represents meta-data that needs to b
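
A back-of-the-envelope sketch of the NameNode memory point, assuming the commonly quoted figure of roughly 150 bytes of heap per file object and per block object; the file counts and block counts below are made-up examples, not numbers from this thread.

    // Rough NameNode heap estimate: one in-memory object per file plus one
    // per block, at an assumed ~150 bytes each. Treat the result as an order
    // of magnitude, not a precise figure.
    public class NameNodeMemorySketch {
      static long estimateBytes(long files, long totalBlocks) {
        final long BYTES_PER_OBJECT = 150; // assumption
        return (files + totalBlocks) * BYTES_PER_OBJECT;
      }

      public static void main(String[] args) {
        // 30,000 small files, each well under one 64MB block, so one block apiece.
        System.out.println(estimateBytes(30000, 30000) + " bytes"); // about 9 MB of heap
        // The same data repacked into 30 large files of about 32 blocks each.
        System.out.println(estimateBytes(30, 30 * 32) + " bytes");  // about 150 KB of heap
      }
    }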

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Pierre ANCELOT
Okay, thank you :) On Tue, May 18, 2010 at 2:48 PM, Brian Bockelman wrote: > > On May 18, 2010, at 7:38 AM, Pierre ANCELOT wrote: > > > Hi, thanks for this fast answer :) > > If so, what do you mean by blocks? If a file has to be splitted, it will > be > > splitted when larger than 64MB? > > > >

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Brian Bockelman
On May 18, 2010, at 7:38 AM, Pierre ANCELOT wrote: > Hi, thanks for this fast answer :) > If so, what do you mean by blocks? If a file has to be splitted, it will be > splitted when larger than 64MB? > For every 64MB of the file, Hadoop will create a separate block. So, if you have a 32KB fil

Re: Data node decommission doesn't seem to be working correctly

2010-05-18 Thread Brian Bockelman
Hey Scott, Hadoop tends to get confused by nodes with multiple hostnames or multiple IP addresses. Is this your case? I can't remember precisely what our admin does, but I think he puts in the IP address which Hadoop listens on in the exclude-hosts file. Look in the output of hadoop dfsadmi

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Pierre ANCELOT
... and by slices of 64MB then I mean... ? On Tue, May 18, 2010 at 2:38 PM, Pierre ANCELOT wrote: > Hi, thanks for this fast answer :) > If so, what do you mean by blocks? If a file has to be splitted, it will be > splitted when larger than 64MB? > > > > > > On Tue, May 18, 2010 at 2:34 PM, Bria

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Pierre ANCELOT
Hi, thanks for this fast answer :) If so, what do you mean by blocks? If a file has to be splitted, it will be splitted when larger than 64MB? On Tue, May 18, 2010 at 2:34 PM, Brian Bockelman wrote: > Hey Pierre, > > These are not traditional filesystem blocks - if you save a file smaller > th

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Brian Bockelman
Hey Pierre, These are not traditional filesystem blocks - if you save a file smaller than 64MB, you don't lose 64MB of file space.. Hadoop will use 32KB to store a 32KB file (ok, plus a KB of metadata or so), not 64MB. Brian On May 18, 2010, at 7:06 AM, Pierre ANCELOT wrote: > Hi, > I'm port

Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Pierre ANCELOT
Hi, I'm porting a legacy application to hadoop and it uses a bunch of small files. I'm aware that having such small files ain't a good idea but I'm not the one making the technical decisions and the port has to be done for yesterday... Of course such small files are a problem, loading 64MB blocks for a few
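
For reference, a per-file block size can be requested when a file is created; a minimal sketch, with a hypothetical path and an arbitrary 4MB block size (the replies above explain why very small blocks mostly just add NameNode load rather than speed anything up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Writes one file with a non-default block size. The path and the 4MB
    // figure are examples only.
    public class SmallBlockWriter {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path out = new Path("/user/demo/small-block-file"); // hypothetical path
        long blockSize = 4L * 1024 * 1024;                  // 4MB instead of the 64MB default
        int bufferSize = conf.getInt("io.file.buffer.size", 4096);
        short replication = fs.getDefaultReplication();

        FSDataOutputStream outStream =
            fs.create(out, true, bufferSize, replication, blockSize);
        try {
          outStream.writeUTF("example payload");
        } finally {
          outStream.close();
        }
      }
    }

The same effect cluster-wide is usually done through the dfs.block.size setting, but the per-call overload keeps the experiment local to one file.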

Hadoop User Group UK Meetup - June 3rd

2010-05-18 Thread Klaas Bosteels
Hi all, I've picked up where Johan left off with the HUGUK meetups and the next one is planned for June 3rd. The main talks will be: “Introduction to Sqoop” by Aaron Kimball (Cloudera) “Hive at Last.fm” by Tim Sell (Last.fm) More details are available at: http://dumbotics.com/2010/05/18/huguk-4