Block sizes are typically 64 MB or 128 MB, so in your case only a single block is involved, which means that with a single replica only one datanode will be used. The default replication factor is 3, but since you only have two datanodes, you will most likely end up with two copies of the data on two separate datanodes.
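To make the arithmetic concrete, here is a small sketch (the 240 KB file size and 64 MB block size come from this thread; the helper function names are mine, not part of any Hadoop API):

```python
import math

def num_blocks(file_size_bytes, block_size_bytes):
    """Number of HDFS blocks a file occupies (the last block may be partial)."""
    return max(1, math.ceil(file_size_bytes / block_size_bytes))

def effective_replicas(requested_replication, live_datanodes):
    """HDFS will not place two replicas of the same block on one datanode,
    so the achievable replica count is capped by the number of live nodes."""
    return min(requested_replication, live_datanodes)

block_size = 64 * 1024 * 1024   # 64 MB, a common default block size
file_size = 240 * 1024          # the 240 KB test data from the question

print(num_blocks(file_size, block_size))   # -> 1 (the whole file fits in one block)
print(effective_replicas(3, 2))            # -> 2 (default replication 3, only 2 nodes)
```

You can also check where the blocks actually landed with `hdfs fsck <path> -files -blocks -locations`.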
On Sun, May 25, 2014 at 12:40 PM, Sindhu Hosamane <sindh...@gmail.com> wrote:
>
> Hello Friends,
>
> I am running multiple datanodes on a single machine.
>
> The output of the jps command shows:
> Namenode, Datanode, Datanode, Jobtracker, Tasktracker, Secondary Namenode
>
> Which assures that 2 datanodes are up and running. I execute cascalog
> queries on this 2-datanode hadoop cluster, and I get the results of the
> query too. I am not sure if it is really using both datanodes (because
> I get results with one datanode anyway).
>
> (I read somewhere about HDFS storing data in datanodes like below:)
> 1) An HDFS scheme might automatically move data from one DataNode to
> another if the free space on a DataNode falls below a certain threshold.
> 2) Internally, a file is split into one or more blocks and these blocks
> are stored in a set of DataNodes.
>
> My doubts are:
> * Do I have to make any configuration changes in hadoop to tell it to
> share data blocks between the 2 datanodes, or does it do so automatically?
> * Also, my test data is not too big, only 240 KB. According to point 1)
> I don't know if such small test data can initiate automatic movement of
> data from one datanode to another.
> * Also, what should the dfs.replication value be when I am running 2
> datanodes? (I guess it's 2.)
>
> Any advice or help would be very much appreciated.
>
> Best Regards,
> Sindhu
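On the dfs.replication question: with only two datanodes it is common to set the replication factor explicitly to 2 in hdfs-site.xml, since requesting 3 replicas just leaves blocks permanently under-replicated. A sketch of the relevant property (dfs.replication is the standard HDFS setting; the description text is mine):

```xml
<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Match the replication factor to the two available
    datanodes so blocks are not reported as under-replicated.</description>
  </property>
</configuration>
```

Note this only affects files written after the change; existing files keep their recorded replication unless you change it with `hdfs dfs -setrep`.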