> OK, thanks for that information.
> As I said, I am running 2 datanodes on the same machine, so my Hadoop home has 2
> conf folders, conf and conf2, and in turn an hdfs-site.xml in each conf folder.
> I guess the dfs.replication value in hdfs-site.xml of the conf folder should be 3.
> What should I have in conf2? Should it be 1 there?
>
> Sorry if the question sounds stupid, but I am unfamiliar with these kinds of
> settings (2 datanodes on the same machine, so having 2 conf folders).
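[For the two-conf setup described above, a sketch of the relevant hdfs-site.xml property, assuming a Hadoop 1.x-style configuration (the conf/conf2 and JobTracker/TaskTracker layout suggests 1.x). dfs.replication is applied when a file is written, and a value above the number of datanodes just leaves blocks under-replicated, so the same value, at most 2 here, would normally go in both conf/hdfs-site.xml and conf2/hdfs-site.xml:]

```xml
<!-- hdfs-site.xml: the same property in both conf/ and conf2/ -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- at most the number of datanodes; here, 2 -->
    <value>2</value>
  </property>
</configuration>
```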
>
>
> If data is split across multiple datanodes, then processing capacity would
> be improved (that's what I guess). Since my file is only 240 KB, it
> occupies only one block; it cannot use a second block and reside on another
> datanode.
> So now, does it make sense to reduce the block size so that blocks are split
> between the 2 datanodes, if I want to take full advantage of multiple
> datanodes?
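[The arithmetic behind this question can be sketched in shell: a file occupies ceil(file_size / block_size) blocks. The 64 MB figure is the classic Hadoop 1.x default block size; the 128 KB figure is a purely hypothetical smaller block size for illustration:]

```shell
# Number of HDFS blocks a file occupies: ceil(file_size / block_size),
# computed with integer arithmetic.
file_size=$((240 * 1024))            # the 240 KB test file

block_size=$((64 * 1024 * 1024))     # default 64 MB block
echo $(( (file_size + block_size - 1) / block_size ))    # prints 1

block_size=$((128 * 1024))           # hypothetical 128 KB block
echo $(( (file_size + block_size - 1) / block_size ))    # prints 2
```

[So with the default block size this file can never be split across datanodes; only a block size smaller than 240 KB would produce more than one block.]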
>
>
> Best Regards,
> Sindhu
>
>
> On 25 May 2014, at 21:47, Peyman Mohajerian <mohaj...@gmail.com> wrote:
>
>> Block sizes are typically 64 MB or 128 MB, so in your case only a single block
>> is involved, which means that if you have a single replica, only a single data
>> node will be used. The default replication is three, and since you only have
>> two data nodes, you will most likely have two copies of the data on two
>> separate data nodes.
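[The replica count described above reduces to min(dfs.replication, number of datanodes), assuming the namenode never places two replicas of the same block on one datanode; a one-line sketch:]

```shell
# Effective copies = min(requested replication, datanode count),
# since HDFS stores at most one replica of a given block per datanode.
requested=3     # default dfs.replication
datanodes=2
echo $(( requested < datanodes ? requested : datanodes ))   # prints 2
```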
>>
>>
>> On Sun, May 25, 2014 at 12:40 PM, Sindhu Hosamane <sindh...@gmail.com> wrote:
>>
>>>> Hello Friends,
>>>>
>>>> I am running multiple datanodes on a single machine.
>>>>
>>>> The output of the jps command shows:
>>>> NameNode, DataNode, DataNode, JobTracker, TaskTracker, SecondaryNameNode
>>>>
>>>> which confirms that 2 datanodes are up and running. I execute Cascalog
>>>> queries on this 2-datanode Hadoop cluster, and I get the results of the
>>>> queries too.
>>>> I am not sure if it is really using both datanodes (because I would get
>>>> results with one datanode anyway).
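[One way to confirm whether both datanodes actually hold replicas of a file is HDFS's fsck report, which lists each block and the datanodes storing it. A sketch; /user/sindhu/input is a hypothetical path, substitute the real file:]

```shell
# List the file's blocks and the datanode address holding each replica.
hadoop fsck /user/sindhu/input -files -blocks -locations
```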
>>>>
>>>> (I read somewhere about HDFS storing data in datanodes as described below.)
>>>> 1) An HDFS scheme might automatically move data from one DataNode to
>>>> another if the free space on a DataNode falls below a certain threshold.
>>>> 2) Internally, a file is split into one or more blocks, and these blocks
>>>> are stored in a set of DataNodes.
>>>>
>>>> My doubts are:
>>>> * Do I have to make any configuration changes in Hadoop to tell it to
>>>> share data blocks between the 2 datanodes, or does it do so automatically?
>>>> * Also, my test data is not very big; it is only 240 KB. According to point
>>>> 1), I don't know if such small test data can trigger automatic movement of
>>>> data from one datanode to another.
>>>> * Also, what should the dfs.replication value be when I am running 2
>>>> datanodes? (I guess it's 2.)
>>>>
>>>>
>>>> Any advice or help would be very much appreciated.
>>>>
>>>> Best Regards,
>>>> Sindhu
>>>
>>>
>>
>>
>