>> Hello Friends,
>>
>> I am running multiple DataNodes on a single machine.
>>
>> The output of the jps command shows:
>> NameNode, DataNode, DataNode, JobTracker, TaskTracker,
>> SecondaryNameNode
>>
>> which confirms that 2 DataNodes are up and running. I execute Cascalog
>> queries on this 2-DataNode Hadoop cluster, and I get the query results
>> too.
>> I am not sure whether it is really using both DataNodes (because I would
>> get results with just one DataNode anyway).
>>
>> (I read somewhere that HDFS stores data across DataNodes as follows:)
>> 1) An HDFS scheme might automatically move data from one DataNode to
>> another if the free space on a DataNode falls below a certain threshold.
>> 2) Internally, a file is split into one or more blocks, and these blocks
>> are stored in a set of DataNodes.
>>
>> My doubts are :
>> * Do I have to make any configuration changes in Hadoop to tell it to
>> share data blocks between the 2 DataNodes, or does it do so automatically?
>> * Also, my test data is not too big; it is only 240 KB. According to
>> point 1), I don't know whether such small test data can trigger automatic
>> movement of data from one DataNode to another.
>> * Also, what should the dfs.replication value be when I am running 2
>> DataNodes? (I guess it's 2.)
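>> For the dfs.replication question, a value of 2 would indeed keep one copy
>> of each block on each DataNode. A minimal sketch of the usual way to set
>> it, in conf/hdfs-site.xml (file location assumed for a Hadoop 1.x-style
>> install):

```xml
<!-- conf/hdfs-site.xml: keep 2 replicas of every block,
     so each of the 2 DataNodes holds one copy. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```

>> Note that this setting only applies to files written after the change;
>> files already in HDFS keep the replication factor they were created with,
>> which can be changed afterwards with `hadoop fs -setrep 2 <path>`.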
>>
>>
>> Any advice or help would be very much appreciated .
>>
>> Best Regards,
>> Sindhu