measure throughput of cluster

2011-05-03 Thread Rita
I am trying to gather statistics about my HDFS cluster in the lab. One stat I am really interested in is the total throughput (gigabytes served) of the cluster over a 24-hour period. I suppose I can look for 'cmd=open' in the NameNode log file, but how accurate is that? It seems there is no 'cmd=clos

Re: measure throughput of cluster

2011-05-03 Thread Brian Bockelman
Hi Rita,

An open file in HDFS doesn't take up any resources in the NN, so there is no corresponding close operation.

Probably you want to increase the logging in the datanodes, which will print out activity per client.

Brian

On May 3, 2011, at 6:58 AM, Rita wrote:
> I am trying to acquire st
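
As a rough illustration of the datanode-log approach suggested above, here is a minimal Python sketch that sums the bytes reported in datanode log lines, grouped by operation. It assumes the per-client trace lines contain a fragment of the form `bytes: <n>, op: <NAME>` (for example `op: HDFS_READ`); that line shape and the function name `bytes_by_op` are assumptions for illustration, not something confirmed in this thread, so check them against the lines your own datanodes actually emit.

```python
import re
from collections import defaultdict

# Assumed shape of a per-client trace line fragment: "bytes: 1234, op: HDFS_READ".
# Verify this against your own datanode logs before trusting the totals.
TRACE_RE = re.compile(r"bytes: (\d+), op: (\w+)")


def bytes_by_op(lines):
    """Sum the byte counts found in log lines, keyed by operation name."""
    totals = defaultdict(int)
    for line in lines:
        match = TRACE_RE.search(line)
        if match:
            # group(1) is the byte count, group(2) is the operation name.
            totals[match.group(2)] += int(match.group(1))
    return dict(totals)
```

To get a 24-hour figure, one could feed it the log lines from every datanode for that day (for example, `bytes_by_op(open(path))` per file, then add up the per-node totals) and divide the read total by 1e9 for gigabytes served.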