How can I add a new hard disk in an existing HDFS cluster?

2013-05-03 Thread Joarder KAMAL
Hi, I have a running HDFS cluster (Hadoop/HBase) consisting of 4 nodes, and the initial hard disk (/dev/vda1) is only 10GB. Now I have a second hard drive, /dev/vdb, of 60GB and want to add it to my existing HDFS cluster. How can I format the new hard disk (and in which format? XFS?) and

Parallel Load Data into Two partitions of a Hive Table

2013-05-03 Thread selva
Hi All, I need to load a month's worth of processed data into a Hive table. The table has 10 partitions. Each day has many files to load, each file takes about two seconds (consistently), and I have ~3000 files. So it will take days to complete for 30 days' worth of data. I planned to load every day

Which data sets were processed by each tasktracker?

2013-05-03 Thread Agarwal, Nikhil
Hi, I have a 3-node cluster, with the JobTracker running on one machine and TaskTrackers on the other two. Instead of using HDFS, I have written my own FileSystem implementation. I am able to run a MapReduce job on this cluster, but I am not able to make out from the logs or the TaskTracker UI which data

Re: How can I add a new hard disk in an existing HDFS cluster?

2013-05-03 Thread Geelong Yao
You can change the setting of dfs.data.dir in hdfs-site.xml if your version is 1.x: <property> <name>dfs.data.dir</name> <value>/usr/hadoop/tmp/dfs/data,/dev/vdb</value> </property> 2013/5/3 Joarder KAMAL joard...@gmail.com Hi, I have a running HDFS cluster (Hadoop/HBase) consisting
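[A minimal sketch of the 1.x hdfs-site.xml entry, assuming the new disk has already been formatted and mounted at /mnt/hdfs2 (a hypothetical mount point). dfs.data.dir takes a comma-separated list of local directories, not a raw device, and the existing directory should stay in the list so current block data is kept:

  <property>
    <name>dfs.data.dir</name>
    <!-- keep the old directory, append the directory on the new disk -->
    <value>/usr/hadoop/tmp/dfs/data,/mnt/hdfs2/dfs/data</value>
  </property>

After editing the file, restart the DataNode on that machine so the new directory is picked up.]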

Re: Which data sets were processed by each tasktracker?

2013-05-03 Thread Harsh J
You probably need to be using a release that has https://issues.apache.org/jira/browse/MAPREDUCE-3678 in it. It will print the input split onto the task logs, therefore letting you know what it processed at all times (so long as the input split type, such as file splits, have intelligible outputs
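[If that fix is present, each map task's log should name the input split it worked on. A rough sketch for collecting those lines across TaskTrackers follows; the hostnames, the log directory, and the exact "Processing split:" wording are assumptions and may differ between releases:

  # Hypothetical sketch: pull the input-split lines out of each TaskTracker's task logs.
  # $HADOOP_HOME/logs/userlogs is an assumed location for the per-task logs.
  for tt in tasktracker1 tasktracker2; do
    echo "== $tt =="
    ssh "$tt" 'grep -r "Processing split:" "$HADOOP_HOME/logs/userlogs" 2>/dev/null'
  done]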

Re: Parallel Load Data into Two partitions of a Hive Table

2013-05-03 Thread Yanbo Liang
Loading data into different partitions in parallel is OK, because it is equivalent to writing to different files on HDFS. 2013/5/3 selva selvai...@gmail.com Hi All, I need to load a month's worth of processed data into a Hive table. The table has 10 partitions. Each day has many files to load and each file is
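[As a concrete illustration of loading two partitions at the same time, a minimal sketch is below; the table name, partition column, and staging paths are made up for the example. Each LOAD DATA moves its input into its own partition, so the two statements touch different HDFS directories:

  # Hypothetical sketch: two LOAD DATA statements into different partitions, run in parallel.
  hive -e "LOAD DATA INPATH '/staging/2013-04-01' INTO TABLE processed_data PARTITION (dt='2013-04-01');" &
  hive -e "LOAD DATA INPATH '/staging/2013-04-02' INTO TABLE processed_data PARTITION (dt='2013-04-02');" &
  wait]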

Re: Parallel Load Data into Two partitions of a Hive Table

2013-05-03 Thread selva
Thanks Yanbo. My doubt is clarified now. On Fri, May 3, 2013 at 2:38 PM, Yanbo Liang yanboha...@gmail.com wrote: Loading data into different partitions in parallel is OK, because it is equivalent to writing to different files on HDFS. 2013/5/3 selva selvai...@gmail.com Hi All, I need to load a

Re: How can I add a new hard disk in an existing HDFS cluster?

2013-05-03 Thread Håvard Wahl Kongsgård
Go for ext3 or ext4. On Fri, May 3, 2013 at 8:32 AM, Joarder KAMAL joard...@gmail.com wrote: Hi, I have a running HDFS cluster (Hadoop/HBase) consisting of 4 nodes, and the initial hard disk (/dev/vda1) is only 10GB. Now I have a second hard drive, /dev/vdb, of 60GB and want to add it
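[The end-to-end steps on the DataNode might look roughly like the sketch below. The device name matches the original question, but the mount point, fstab options, and ownership (hdfs:hadoop) are assumptions that depend on how the cluster was installed:

  # Hypothetical sketch: format the new disk, mount it, and prepare a directory for HDFS.
  mkfs.ext4 /dev/vdb
  mkdir -p /mnt/hdfs2
  mount /dev/vdb /mnt/hdfs2
  echo '/dev/vdb /mnt/hdfs2 ext4 defaults,noatime 0 0' >> /etc/fstab   # survive reboots
  mkdir -p /mnt/hdfs2/dfs/data
  chown -R hdfs:hadoop /mnt/hdfs2/dfs/data   # match the user the DataNode runs as
  # finally, add /mnt/hdfs2/dfs/data to dfs.data.dir in hdfs-site.xml and restart the DataNode]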

Re: Where's downlink.data file?

2013-05-03 Thread Xun TANG
Does anyone have similar experience? Any suggestion is welcome. I have been stuck here for a week now... Thanks, Alice On Sun, Apr 28, 2013 at 5:38 PM, Xun TANG tangxun.al...@gmail.com wrote: According to this link http://wiki.apache.org/hadoop/HowToDebugMapReducePrograms I am trying to find out where the

pseudo distributed mode

2013-05-03 Thread mouna laroussi
Hi, I want to configure Hadoop in pseudo-distributed mode. When I arrive at the step to format the namenode, I find on the web page at port 50070 that there is no namenode in the cluster. What should I do? Is there any path to change? Thanks -- LAROUSSI Mouna Software engineering student - INSAT

Re: pseudo distributed mode

2013-05-03 Thread Nitin Pawar
Once you format the namenode, it will need to be started again for normal usage. On Fri, May 3, 2013 at 12:45 PM, mouna laroussi mouna.larou...@gmail.com wrote: Hi, I want to configure Hadoop in pseudo-distributed mode. When I arrive at the step to format the namenode, I find

Re: pseudo distributed mode

2013-05-03 Thread Mohammad Tariq
After formatting the NN, start the daemons using bin/start-dfs.sh and bin/start-mapred.sh. If it still doesn't work, show us the logs. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Fri, May 3, 2013 at 10:29 PM, Nitin Pawar nitinpawar...@gmail.com wrote: Once you format
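[On a Hadoop 1.x pseudo-distributed setup the usual sequence, run from HADOOP_HOME, is roughly the sketch below. The format step is done only once, since it wipes the NameNode metadata:

  # Sketch of the usual startup sequence (Hadoop 1.x, run from HADOOP_HOME):
  bin/hadoop namenode -format    # first time only: initialises (and wipes) NameNode metadata
  bin/start-dfs.sh               # starts NameNode, DataNode, SecondaryNameNode
  bin/start-mapred.sh            # starts JobTracker and TaskTracker
  jps                            # NameNode should be listed here
  # then browse to http://localhost:50070/ for the NameNode web UI]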

Re: pseudo distributed mode

2013-05-03 Thread Roman Shaposhnik
On Fri, May 3, 2013 at 12:15 AM, mouna laroussi mouna.larou...@gmail.com wrote: Hi, I want to configure Hadoop in pseudo-distributed mode. When I arrive at the step to format the namenode, I find on the web page at port 50070 that there is no namenode in the cluster. What should I do? Is there any