Re: Reading from File

2011-04-27 Thread Harsh J
Hello Mark, On Wed, Apr 27, 2011 at 12:19 AM, Mark question markq2...@gmail.com wrote: Hi, My mapper opens a file and reads records using next(). However, I want to stop reading if there is no memory available. What confuses me here is that even though I'm reading record by record with
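
A minimal sketch of the kind of guard being described, assuming the side file is read with a SequenceFile.Reader and that "no memory available" means low free JVM heap; the reader, the process() call and the 64 MB headroom are illustrative, not from the thread:

    // Inside the mapper, after opening the side file.
    Runtime rt = Runtime.getRuntime();
    long headroom = 64L * 1024 * 1024;               // keep ~64 MB of heap free
    Text key = new Text();
    Text val = new Text();
    while (reader.next(key, val)) {
      process(key, val);                             // whatever the mapper does per record
      // Free heap = currently free + the part of -Xmx the JVM has not allocated yet.
      long free = rt.freeMemory() + (rt.maxMemory() - rt.totalMemory());
      if (free < headroom) {
        break;                                       // stop reading before memory runs out
      }
    }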

Unsplittable files on HDFS

2011-04-27 Thread Niels Basjes
Hi, In some scenarios you have gzipped files as input for your MapReduce job (Apache logfiles are a common example). Now some of those files are several hundred megabytes and as such will be split by HDFS into several blocks. When looking at a real 116MiB file on HDFS I see this (4 nodes,

Re: Unsplittable files on HDFS

2011-04-27 Thread Harsh J
Hey Niels, The block size is a per-file property. Would putting/creating these gzip files on the DFS with a very high block size (such that such files don't get split across blocks) be a valid solution to your problem here? On Wed, Apr 27, 2011 at 1:25 PM, Niels Basjes ni...@basjes.nl wrote: Hi,
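
The same effect is also available from the Java API rather than a -D flag; a minimal sketch, assuming the FileSystem.create overload that takes an explicit block size (paths and sizes below are only examples):

    import java.io.FileInputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class PutWithBigBlock {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Block size larger than the file, so the whole gzip stays in one block.
        long blockSize = 2147483648L;                // 2 GB
        FSDataOutputStream out = fs.create(new Path(args[1]), true,
            conf.getInt("io.file.buffer.size", 4096),
            fs.getDefaultReplication(), blockSize);
        // Copy the local gzip into HDFS; the final argument closes both streams.
        IOUtils.copyBytes(new FileInputStream(args[0]), out, conf, true);
      }
    }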

Re: Unsplittable files on HDFS

2011-04-27 Thread Niels Basjes
Hi, I did the following with a 1.6GB file: hadoop fs -Ddfs.block.size=2147483648 -put /home/nbasjes/access-2010-11-29.log.gz /user/nbasjes and I got: Total number of blocks: 1 4189183682512190568:10.10.138.61:50010 10.10.138.62:50010 Yes, that does the trick. Thank
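
For checking the block report from code rather than the web UI, a small sketch along the same lines, assuming getFileBlockLocations (the path argument is just an example):

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlocks {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus st = fs.getFileStatus(new Path(args[0]));
        BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
        System.out.println("Total number of blocks: " + blocks.length);
        for (BlockLocation b : blocks) {
          // Offset of the block within the file and the datanodes holding it.
          System.out.println(b.getOffset() + " : " + Arrays.toString(b.getHosts()));
        }
      }
    }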

Re: Execution time.

2011-04-27 Thread Steve Loughran
On 26/04/11 14:16, real great.. wrote: Thanks a lot. I have managed to do it. And my final year project is on power-aware Hadoop. I do realise it's against ethics to get the code that way.. :) Good. What do you mean by power aware - awareness of the topology of UPS sources inside a datacentre

Re: Cluster hardware question

2011-04-27 Thread Steve Loughran
On 26/04/11 14:55, Xiaobo Gu wrote: Hi, People say a balanced server configuration is as follows: 2 x 4-core CPUs, 24G RAM, 4 x 1TB SATA disks. But we are used to using storage servers with 24 x 1TB SATA disks, and we are wondering whether Hadoop will be CPU bound if this kind of server is used.

Re: Unsplittable files on HDFS

2011-04-27 Thread Steve Loughran
On 27/04/11 10:48, Niels Basjes wrote: Hi, I did the following with a 1.6GB file hadoop fs -Ddfs.block.size=2147483648 -put /home/nbasjes/access-2010-11-29.log.gz /user/nbasjes and I got Total number of blocks: 1 4189183682512190568:10.10.138.61:50010

Get the actual line number from inputformat in the mapper

2011-04-27 Thread Pei HE
Hi, I want to know how to get the actual line number of the input file in the mapper. The key, which TextInputFormat generates, is the byte offset in the file. So, how can I find the global line offset in the mapper? Thanks -- Pei

Running C hdfs Code in Hadoop

2011-04-27 Thread Adarsh Sharma
Dear all, Today I am trying to run a simple program by following the tutorial below: http://hadoop.apache.org/hdfs/docs/current/libhdfs.html I followed these steps: 1. Set LD_LIBRARY_PATH and CLASSPATH as: export

Re: Get the actual line number from inputformat in the mapper

2011-04-27 Thread Harsh J
Hello Pei, On Thu, Apr 28, 2011 at 6:58 AM, Pei HE pei...@gmail.com wrote: The key, which TextInputFormat generates, is the byte offset in the file. So, how can I find the global line offset in the mapper? This is not achievable unless you have fixed-length byte records (in which case you should be
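
To make the fixed-byte-record case concrete: if every line is exactly the same length, the byte-offset key converts straight into a line number. A sketch of that arithmetic, where the "record.length" property and the mapper name are made up for the example:

    // Sketch only: assumes every input line (including the trailing newline)
    // is exactly "record.length" bytes, so TextInputFormat's byte-offset key
    // divides evenly into a global line number.
    public static class LineNumberMapper
        extends Mapper<LongWritable, Text, LongWritable, Text> {
      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        long recordLen = context.getConfiguration().getLong("record.length", 1);
        long lineNumber = key.get() / recordLen;     // 0-based line number in the whole file
        context.write(new LongWritable(lineNumber), value);
      }
    }

With ordinary variable-length lines this breaks down, since a mapper only sees byte offsets and has no way of knowing how many newlines precede its own split.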

Namenode error :- FSNamesystem initialization failed

2011-04-27 Thread Adarsh Sharma
Dear all, I have a running 4-node Hadoop cluster and some data stored in HDFS. Today, by mistake, I started the Hadoop cluster as the root user: root# bin/start-all.sh After correcting my mistake, when I try to start it as the hadoop user, my Namenode fails with the exception below: STARTUP_MSG: