Re: Adding hard-disks to an existing HDFS cluster

2010-03-01 Thread Steve Loughran
Eli Collins wrote: I presume it makes no sense to try to spread the NameNode across multiple disks? Not quite sure what you mean here, but dfs.name.dir (where the NN stores its metadata) should have multiple directories on different disks to guard against the failure of any single disk. Many

Re: testing if replication working

2010-03-01 Thread Mag Gam
Thanks for your responses. On Mon, Mar 1, 2010 at 2:48 AM, Terrence Martin tmar...@physics.ucsd.edu wrote: Mag Gam wrote: I just set up my first hadoop cluster with 5 nodes. What is the best way to check if replication is really working? I assume the best way is to power down 2 nodes and

Re: Adding hard-disks to an existing HDFS cluster

2010-03-01 Thread Marc Farnum Rendino
On Sun, Feb 28, 2010 at 5:27 PM, Eli Collins e...@cloudera.com wrote: dfs.name.dir (where the NN stores its metadata) should have multiple directories on different disks to guard against the failure of any single disk. Many people also use RAIDed disks and include an NFS mount in dfs.name.dir

Re: Adding hard-disks to an existing HDFS cluster

2010-03-01 Thread Marc Farnum Rendino
On Mon, Mar 1, 2010 at 5:48 AM, Steve Loughran ste...@apache.org wrote: Best of all: a secondary namenode to get the streamed event log, as that will mean your cluster restarts faster. You do not want to lose your NN data. If the NN data is lost, all the HDFS data is functionally lost,

Re: testing if replication working

2010-03-01 Thread Allen Wittenauer
On 2/28/10 11:48 PM, Terrence Martin tmar...@physics.ucsd.edu wrote: Mag Gam wrote: I just set up my first hadoop cluster with 5 nodes. What is the best way to check if replication is really working? I assume the best way is to power down 2 nodes and see if I can still reach my data? Well

Re: Adding hard-disks to an existing HDFS cluster

2010-03-01 Thread Eli Collins
On Mon, Mar 1, 2010 at 8:19 AM, Marc Farnum Rendino mvg...@gmail.com wrote: On Sun, Feb 28, 2010 at 5:27 PM, Eli Collins e...@cloudera.com wrote: dfs.name.dir (where the NN stores its metadata) should have multiple directories on different disks to guard against the failure of any single

Re: Hive User Group Meeting 3/18/2010 7pm at Facebook

2010-03-01 Thread Zheng Shao
We also created a Meetup group in case you prefer to register on meetup.com http://www.meetup.com/Hive-User-Group-Meeting/calendar/12741356/ We are hosting a Hive User Group Meeting, open to all current and potential hadoop/hive users. Agenda: * Hive Tutorial (Carl Steinbach, cloudera): 20 min

Avro: 1.3.0 available

2010-03-01 Thread Doug Cutting
Avro 1.3.0 is now available. In this release: - the Avro file format has been revised and simplified; - the source tree and release artifacts have been restructured; - the Java port has been significantly optimized; - the Python port has been largely rewritten; - the C port is now

Hadoop 0.20.2 is released

2010-03-01 Thread Chris Douglas
Hadoop 0.20.2 is released and propagating to mirrors. The list of changes may be found in the release notes: http://bit.ly/dd6rwL And in JIRA: COMMON: http://bit.ly/9fDo2f MAPREDUCE: http://bit.ly/aD0BM5 HDFS: http://bit.ly/ckvMfO Thanks to everyone who worked on this release. -C

Re: Adding hard-disks to an existing HDFS cluster

2010-03-01 Thread Marc Farnum Rendino
On Mon, Mar 1, 2010 at 2:00 PM, Eli Collins e...@cloudera.com wrote: Yes, it's good to have multiple directories as well as make each or at least some of the directories reliable, e.g. below /data/N/dfs/namenode are local disks and /mnt/filer-hdfs is a reliable NFS filer. <name>dfs.name.dir</name>
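
The configuration quoted in the message above is cut off in this archive. As a rough sketch only, a dfs.name.dir entry in hdfs-site.xml that combines local disks with an NFS mount might look like the following. The two numbered local paths (/data/1, /data/2) are an assumed expansion of the /data/N pattern, not the values from the original message; /mnt/filer-hdfs is the NFS path named in the text.

```xml
<!-- Sketch only: the original message's actual value is truncated. -->
<!-- /data/1 and /data/2 are assumed local disks; /mnt/filer-hdfs is the NFS filer mount. -->
<property>
  <name>dfs.name.dir</name>
  <!-- Comma-separated list; the NameNode writes its metadata to every directory listed. -->
  <value>/data/1/dfs/namenode,/data/2/dfs/namenode,/mnt/filer-hdfs</value>
</property>
```

Because the metadata is replicated to each listed directory, losing any single disk (or the filer) should still leave a usable copy in the others.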

Re: Adding hard-disks to an existing HDFS cluster

2010-03-01 Thread Allen Wittenauer
Marc, You might find my preso I did on Hadoop at Apachecon EU last year handy: http://wiki.apache.org/hadoop/HadoopPresentations?action=AttachFile&do=view&target=aw-apachecon-eu-2009.pdf aka http://bit.ly/d3UU4A It talks a bit about the care and feeding of your Hadoop grid, including how to

Re: Adding hard-disks to an existing HDFS cluster

2010-03-01 Thread Marc Farnum Rendino
On Mon, Mar 1, 2010 at 7:20 PM, Allen Wittenauer awittena...@linkedin.com wrote: You might find my preso I did on Hadoop at Apachecon EU last year handy... Terrific; thanks! - Marc

Re: Adding hard-disks to an existing HDFS cluster

2010-03-01 Thread Eli Collins
There's work on trunk to add a backup name node which gets a stream of edits from the NN so it has an up-to-date copy of the metadata. Ah; I think this is alluded to in the conf file, right? It's dfs.namenode.backup.* in conf files but you probably haven't bumped into them unless you're using