> Hello Friends,
>
> I am running multiple DataNodes on a single machine.
>
> The output of the jps command shows:
> NameNode, DataNode, DataNode, JobTracker, TaskTracker,
> SecondaryNameNode
>
> which confirms that 2 DataNodes are up and running. I execute Cascalog queries
> on this 2-DataNode Hadoop cluster, and I get the results of the queries too.
> However, I am not sure whether it is really using both DataNodes (because I
> would get results with just one DataNode anyway).
>
> (I read the following about how HDFS stores data in DataNodes:)
> 1) An HDFS scheme might automatically move data from one DataNode to another
> if the free space on a DataNode falls below a certain threshold.
> 2) Internally, a file is split into one or more blocks and these blocks are
> stored in a set of DataNodes.
>
> My doubts are:
> * Do I have to make any configuration changes in Hadoop to tell it to share
> data blocks between the 2 DataNodes, or does it do so automatically?
> * Also, my test data is not big; it is only 240 KB. According to point 1),
> I don't know whether such small test data can trigger automatic movement of
> data from one DataNode to another.
> * Also, what should the dfs.replication value be when I am running 2
> DataNodes? (I guess it's 2.)
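>
> For the last doubt, here is a minimal sketch of what I have in mind for
> hdfs-site.xml (assuming the standard Apache Hadoop configuration layout;
> please correct me if this is wrong):
>
> ```xml
> <!-- hdfs-site.xml: the replication factor should not exceed the number
>      of live DataNodes, so 2 seems sensible for a 2-DataNode cluster -->
> <configuration>
>   <property>
>     <name>dfs.replication</name>
>     <value>2</value>
>   </property>
> </configuration>
> ```
>
> My understanding is that with replication set to 2, each block would get one
> replica on each DataNode, and something like
> `hadoop fsck / -files -blocks -locations` should list which DataNodes hold
> each block, which would confirm whether both are actually being used.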
>
>
> Any advice or help would be very much appreciated.
>
> Best Regards,
> Sindhu