Block sizes are typically 64 MB or 128 MB, so in your case only a single
block is involved, which means that with a single replica only a single
datanode would be used. The default replication factor is three, and since
you only have two datanodes, you will most likely end up with two copies of
the data on two separate datanodes.
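
If you want to check where the blocks actually ended up, fsck will show
you the replica locations (the path below is just a placeholder for your
own input file):

    hadoop fsck /user/sindhu/input -files -blocks -locations

It prints, for each block of each file, the datanodes that hold a replica.
As for the dfs.replication question: with two datanodes, setting it to 2 in
hdfs-site.xml is the usual choice, e.g.

    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>

Keep in mind this only applies to files written after the change; for files
that already exist you can adjust the factor with something like

    hadoop fs -setrep -w 2 /user/sindhu/input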


On Sun, May 25, 2014 at 12:40 PM, Sindhu Hosamane <sindh...@gmail.com> wrote:

>
> Hello Friends,
>
> I am running multiple datanodes on a single machine.
>
> The output of the jps command shows:
> Namenode     Datanode     Datanode     Jobtracker     Tasktracker
> Secondary Namenode
>
> This confirms that 2 datanodes are up and running. I execute Cascalog
> queries on this 2-datanode Hadoop cluster, and I get the results of the
> query too.
> I am not sure if it is really using both datanodes (because I get results
> with one datanode anyway).
>
> (I read somewhere that HDFS stores data in datanodes as follows:)
> 1) An HDFS scheme might automatically move data from one DataNode to
> another if the free space on a DataNode falls below a certain threshold.
> 2) Internally, a file is split into one or more blocks and these blocks
> are stored in a set of DataNodes.
>
> My doubts are:
> * Do I have to make any configuration changes in Hadoop to tell it to
> share data blocks between the 2 datanodes, or does it happen automatically?
> * Also, my test data is not too big; it is only 240 KB. According to point
> 1) I don't know whether such small test data can trigger automatic movement
> of data from one datanode to another.
> * Also, what should the dfs.replication value be when I am running 2
> datanodes? (I guess it's 2.)
>
>
> Any advice or help would be very much appreciated.
>
> Best Regards,
> Sindhu
>
