> OK, thanks for that information.
> As I said, I am running 2 datanodes on the same machine, so my Hadoop home has 2
> conf folders, conf and conf2, and in turn an hdfs-site.xml in each conf folder.
> I guess the dfs.replication value in hdfs-site.xml of the conf folder should be 3.
> What should I have in conf2? Should it be 1 there?
>
> Sorry if the question sounds stupid, but I am unfamiliar with these kinds of
> settings (2 datanodes on the same machine, so having 2 conf folders).
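[For the two-conf setup described above, a sketch of the relevant hdfs-site.xml property, assuming a Hadoop 1.x-style configuration (the conf/conf2 and JobTracker/TaskTracker layout suggests 1.x). dfs.replication is applied when a file is written, and a value above the number of datanodes just leaves blocks under-replicated, so the same value, at most 2 here, would normally go in both conf/hdfs-site.xml and conf2/hdfs-site.xml:]

```xml
<!-- hdfs-site.xml: the same property in both conf/ and conf2/ -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- at most the number of datanodes; here, 2 -->
    <value>2</value>
  </property>
</configuration>
```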
>
>
> If data is split across multiple datanodes, then processing capacity would
> be improved (that's what I guess). Since my file is only 240 KB, it
> occupies only one block; it cannot use a second block and reside on another
> datanode.
> So now, does it make sense to reduce the block size so that blocks are split
> between the 2 datanodes, if I want to take full advantage of multiple
> datanodes?
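[The arithmetic behind this question can be sketched in shell: a file occupies ceil(file_size / block_size) blocks. The 64 MB figure is the classic Hadoop 1.x default block size; the 128 KB figure is a purely hypothetical smaller block size for illustration:]

```shell
# Number of HDFS blocks a file occupies: ceil(file_size / block_size),
# computed with integer arithmetic.
file_size=$((240 * 1024))            # the 240 KB test file

block_size=$((64 * 1024 * 1024))     # default 64 MB block
echo $(( (file_size + block_size - 1) / block_size ))    # prints 1

block_size=$((128 * 1024))           # hypothetical 128 KB block
echo $(( (file_size + block_size - 1) / block_size ))    # prints 2
```

[So with the default block size this file can never be split across datanodes; only a block size smaller than 240 KB would produce more than one block.]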
>
>
> Best Regards,
> Sindhu
>
>
> On 25 May 2014, at 21:47, Peyman Mohajerian <mohaj...@gmail.com> wrote:
>
>> Block sizes are typically 64 MB or 128 MB, so in your case only a single block
>> is involved, which means that if you have a single replica, only a single data
>> node will be used. The default replication is three, and since you only have
>> two data nodes, you will most likely have two copies of the data on two
>> separate data nodes.
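[The replica count described above reduces to min(dfs.replication, number of datanodes), assuming the namenode never places two replicas of the same block on one datanode; a one-line sketch:]

```shell
# Effective copies = min(requested replication, datanode count),
# since HDFS stores at most one replica of a given block per datanode.
requested=3     # default dfs.replication
datanodes=2
echo $(( requested < datanodes ? requested : datanodes ))   # prints 2
```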
>>
>>
>> On Sun, May 25, 2014 at 12:40 PM, Sindhu Hosamane <sindh...@gmail.com> wrote:
>>
>>>> Hello Friends,
>>>>
>>>> I am running multiple datanodes on a single machine.
>>>>
>>>> The output of the jps command shows:
>>>> NameNode, DataNode, DataNode, JobTracker, TaskTracker, SecondaryNameNode
>>>>
>>>> which confirms that 2 datanodes are up and running. I execute Cascalog
>>>> queries on this 2-datanode Hadoop cluster, and I get the results of the
>>>> queries too.
>>>> I am not sure if it is really using both datanodes (because I would get
>>>> results with one datanode anyway).
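[One way to confirm whether both datanodes actually hold replicas of a file is HDFS's fsck report, which lists each block and the datanodes storing it. A sketch; /user/sindhu/input is a hypothetical path, substitute the real file:]

```shell
# List the file's blocks and the datanode address holding each replica.
hadoop fsck /user/sindhu/input -files -blocks -locations
```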
>>>>
>>>> (I read somewhere about HDFS storing data in datanodes as described below.)
>>>> 1) An HDFS scheme might automatically move data from one DataNode to
>>>> another if the free space on a DataNode falls below a certain threshold.
>>>> 2) Internally, a file is split into one or more blocks, and these blocks
>>>> are stored in a set of DataNodes.
>>>>
>>>> My doubts are:
>>>> * Do I have to make any configuration changes in Hadoop to tell it to
>>>> share data blocks between the 2 datanodes, or does it do so automatically?
>>>> * Also, my test data is not very big; it is only 240 KB. According to point
>>>> 1), I don't know if such small test data can trigger automatic movement of
>>>> data from one datanode to another.
>>>> * Also, what should the dfs.replication value be when I am running 2
>>>> datanodes? (I guess it's 2.)
>>>>
>>>>
>>>> Any advice or help would be very much appreciated.
>>>>
>>>> Best Regards,
>>>> Sindhu
>>>
>>>
>>
>>
>