RE: High Availability - second namenode (master2) issue: Incompatible namespaceIDs

2012-11-16 Thread Vinayakumar B
Hi, If you are moving from NonHA (single master) to HA, then follow the steps below. 1. Configure the second namenode's configuration in the running namenode's and all datanodes' configurations, and configure a logical fs.defaultFS. 2. Configure the shared storage related
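The logical fs.defaultFS and nameservice wiring described above looks roughly like the following sketch, based on the Hadoop 2.x HDFS HA property names; `mycluster`, `master1`, and `master2` are placeholder names:

```xml
<!-- core-site.xml: clients address the logical nameservice, not one host -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>

<!-- hdfs-site.xml: the nameservice and its two namenodes -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>master1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>master2:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

These properties must be present on both namenodes and on every datanode, which is why the first step asks you to update all of their configurations before bringing up the second namenode.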

Re: distributed cache

2012-11-16 Thread Yanbo Liang
As far as I know, the local.cache.size parameter controls the size of the DistributedCache. By default, it is set to 10 GB. The io.sort.mb parameter is not used here; it is used because each map task has a circular memory buffer that it writes map output to. 2012/11/16 yingnan.ma
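For reference, a sketch of the cache-size setting Yanbo mentions; the value is in bytes, and 10 GB is the documented default of that era:

```xml
<!-- mapred-site.xml: upper limit on the per-node DistributedCache on
     local disk before old entries are evicted; specified in bytes -->
<property>
  <name>local.cache.size</name>
  <value>10737418240</value> <!-- 10 GB -->
</property>
```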

Re: High Availability - second namenode (master2) issue: Incompatible namespaceIDs

2012-11-16 Thread a...@hsk.hk
Thank you very much, will try. On 16 Nov 2012, at 4:31 PM, Vinayakumar B wrote: Hi, If you are moving from NonHA (single master) to HA, then follow the steps below. 1. Configure the second namenode's configuration in the running namenode's and all datanodes' configurations. And

Re: High Availability - second namenode (master2) issue: Incompatible namespaceIDs

2012-11-16 Thread Suresh Srinivas
Vinay, if the Hadoop docs are not clear in this regard, can you please create a jira to add these details? On Fri, Nov 16, 2012 at 12:31 AM, Vinayakumar B vinayakuma...@huawei.com wrote: Hi, If you are moving from NonHA (single master) to HA, then follow the below steps.

RE: High Availability - second namenode (master2) issue: Incompatible namespaceIDs

2012-11-16 Thread Kartashov, Andy
Agreed here. Whenever you have a namespaceID disagreement between the NN and a DN, simply delete all the entries in your dfs/data directory and restart the DN. No need to reformat the NN. Rgds, AK47 From: shashwat shriparv [mailto:dwivedishash...@gmail.com] Sent: Friday, November 16, 2012 2:53 AM To:
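Before deleting anything, the mismatch can be confirmed by comparing the namespaceID recorded in the two VERSION files (under dfs.name.dir/current and dfs.data.dir/current). A minimal, hypothetical sketch in Python; it assumes the stock key=value VERSION file format, and the paths are placeholders:

```python
def read_namespace_id(version_file):
    """Parse the namespaceID field from a Hadoop storage VERSION file."""
    with open(version_file) as f:
        for line in f:
            if line.startswith("namespaceID="):
                return line.strip().split("=", 1)[1]
    return None

def ids_match(nn_version, dn_version):
    """True when the NameNode and DataNode agree on the namespace."""
    return read_namespace_id(nn_version) == read_namespace_id(dn_version)
```

If `ids_match` is false for a datanode, that node's data directory was formatted against a different namespace, which is exactly the situation where clearing the DN storage directory (not reformatting the NN) is the fix.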

RE: High Availability - second namenode (master2) issue: Incompatible namespaceIDs

2012-11-16 Thread Kartashov, Andy
Vinay, two questions. 1. "Configure the another namenode's configuration": what exactly needs to be configured? 2. What is zkfs? From: Vinayakumar B [mailto:vinayakuma...@huawei.com] Sent: Friday, November 16, 2012 3:31 AM To: user@hadoop.apache.org Subject: RE: High

HDFS block size

2012-11-16 Thread Pankaj Gupta
Hi, I apologize for asking a question that has probably been discussed many times before, but I just want to be sure I understand it correctly. My question is about the advantages of a large block size in HDFS. The Hadoop Definitive Guide provides a comparison with regular file systems and

Re: HDFS block size

2012-11-16 Thread Andy Isaacson
On Fri, Nov 16, 2012 at 10:55 AM, Pankaj Gupta pan...@brightroll.com wrote: The Hadoop Definitive Guide provides a comparison with regular file systems and indicates the advantage being a lower number of seeks (as far as I understood it; maybe I read it incorrectly, and if so I apologize). But, as I
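The seek argument can be made concrete with back-of-the-envelope numbers. A minimal sketch; the disk figures are illustrative assumptions, not measurements:

```python
# Illustrative commodity-disk figures, circa 2012 (assumptions).
SEEK_TIME_S = 0.010        # 10 ms average seek
TRANSFER_RATE_BPS = 100e6  # 100 MB/s sequential transfer

def seek_overhead(block_size_bytes):
    """Fraction of a single-block read spent on the initial seek."""
    transfer_time = block_size_bytes / TRANSFER_RATE_BPS
    return SEEK_TIME_S / (SEEK_TIME_S + transfer_time)

for size in (4 * 1024, 64 * 1024**2, 128 * 1024**2):
    print(size, round(seek_overhead(size), 4))
```

With these numbers a 4 KB read is almost entirely seek time, while a 128 MB HDFS block keeps seek overhead under 1% of the read, which is the usual rationale for the large default block size.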

Re: Pydoop 0.7.0-rc1 released

2012-11-16 Thread Bart Verwilst
Hi Simone, I was wondering, is it possible to write Avro files to Hadoop straight from your lib (mixed with Avro libs of course)? I'm currently trying to come up with a way to read from MySQL (but more complicated than Sqoop can handle) and write it out to Avro files on HDFS. Is something

Re: notorious ClassNotFoundError

2012-11-16 Thread Joey Krabacher
External libs must exist on all task nodes, in case you haven't already done that. /* Joey */ On Fri, Nov 16, 2012 at 3:13 PM, Kartashov, Andy andy.kartas...@mpac.cawrote: Guys, The notorious error. Used all possible clues to resolve this. Ran sqoop --import at command line

RE: notorious ClassNotFoundError

2012-11-16 Thread Kartashov, Andy
I am running MR in pseudo-distributed mode, so there is only one node involved. From: Joey Krabacher [mailto:jkrabac...@gmail.com] Sent: Friday, November 16, 2012 4:16 PM To: user@hadoop.apache.org Subject: Re: notorious ClassNotFoundError External libs must exist on all task nodes, in case you

Re: HDFS block size

2012-11-16 Thread Pankaj Gupta
Thanks for the explanation. Sounds like seeks are cheaper because reading one large file on the filesystem is faster than reading many small files; that makes sense. On Fri, Nov 16, 2012 at 11:53 AM, Andy Isaacson a...@cloudera.com wrote: On Fri, Nov 16, 2012 at 10:55 AM, Pankaj

Re: HDFS block size

2012-11-16 Thread Pankaj Gupta
Thanks for the explanation and for showing a different perspective. On Fri, Nov 16, 2012 at 12:09 PM, Ted Dunning tdunn...@maprtech.com wrote: Andy's points are reasonable but there are a few omissions: - modern file systems are pretty good at writing large files into contiguous blocks if they

Re: Social media data

2012-11-16 Thread Mahesh Balija
Hi Prabhu, For Twitter there are different tiers for obtaining feeds, like Gardenhose and Firehose; some may be free and some are paid. You can look for similar options from other social media platforms. Best, Mahesh Balija, Calsoft Labs. On Thu, Nov 15, 2012 at 11:35 PM,

SafeModeException on starting up

2012-11-16 Thread Li Li
hi all, I am trying to set up a Hadoop cluster, but when I use start-all.sh to start it, it throws an exception: 2012-11-17 10:40:21,662 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:work cause:org.apache.hadoop.hdfs.server.namenode.SafeModeException:
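Safe mode right after startup is normal: the NameNode stays in safe mode until a configurable fraction of blocks has been reported by datanodes, then leaves it on its own after an extension period; `hadoop dfsadmin -safemode get` shows the current state. The relevant knobs, with the defaults of that era's hdfs-default.xml, are:

```xml
<property>
  <name>dfs.safemode.threshold.pct</name>
  <value>0.999f</value> <!-- fraction of blocks that must be reported -->
</property>
<property>
  <name>dfs.safemode.extension</name>
  <value>30000</value> <!-- ms to remain in safe mode after the threshold is met -->
</property>
```

If the NameNode never leaves safe mode, the usual cause is datanodes that failed to start or cannot reach it, not these settings.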

Re: Optimizing Disk I/O - does HDFS do anything ?

2012-11-16 Thread Scott Carey
Ext3 can be quite atrocious when it comes to fragmentation. Simply start with an empty drive, and have 8 threads each concurrently write to their own large file sequentially. ext4 is much better in this regard. xfs is not as good at initial placement, but has an online defragmenter. ext4 is