I have a 0.20.2 cluster. I notice that our nodes with 2 TB disks waste
tons of disk i/o doing a 'du -sk' of each data directory. Instead of
'du -sk', why not just do this with java.io.File? How is this going to
work with 4 TB, 8 TB disks and up? It seems like calculating used and
free disk space could be done far less expensively.
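
For illustration, here is a minimal sketch of the java.io.File route I
have in mind (the path below is just a placeholder):

    import java.io.File;

    public class FsSpace {
        public static void main(String[] args) {
            // placeholder path -- point it at one of your dfs.data.dir entries
            File dataDir = new File("/data1/dfs");
            // these boil down to a single statfs()-style call, so they return
            // in microseconds no matter how many files are on the disk
            System.out.println("total:  " + dataDir.getTotalSpace());
            System.out.println("free:   " + dataDir.getFreeSpace());
            System.out.println("usable: " + dataDir.getUsableSpace());
        }
    }

The one catch I can see is that these report whole-filesystem numbers,
while 'du -sk' measures a single directory tree, so the two only agree
when the data directory has the filesystem to itself.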
How many files do you have per node? What I find is that most of my
inodes/dentries are almost always cached, so even on a host with
hundreds of thousands of files a 'du -sk' generally only causes high
i/o for a couple of seconds. I am using 2TB disks too.

Sridhar
BTW this is on systems which have a lot of RAM and aren't under high load.
If you find that your system is evicting dentries/inodes from its cache, you
might want to experiment with dropping vm.vfs_cache_pressure from its default
so that they are preferred over the pagecache. At the extreme, setting it to 0
means the kernel never reclaims dentries/inodes at all, which can run a busy
box out of memory, so lower it gradually.
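
For reference, it is an ordinary sysctl; 50 below is only an example
value (the default is 100):

    # show the current value
    sysctl vm.vfs_cache_pressure
    # bias the kernel toward keeping dentries/inodes over pagecache
    sysctl -w vm.vfs_cache_pressure=50
    # make it stick across reboots
    echo 'vm.vfs_cache_pressure = 50' >> /etc/sysctl.conf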
On Fri, Apr 8, 2011 at 1:59 PM, Edward Capriolo wrote:
>
> Right. Most inodes are always cached when:
>
> 1) small disks
> 2) light load.
>
> But that is not the case with hadoop.
>
> Making the problem worse:
> It seems like hadoop issues 'du -sk' for all disks at the
> same time. This puts i/o load on every disk at once.
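
One way to take the edge off would be to stagger the scans rather than
starting them together. A hypothetical sketch (the paths and intervals
are made up, and this is not what hadoop currently does):

    import java.util.Random;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class StaggeredScan {
        public static void main(String[] args) {
            // placeholder data directories, one per physical disk
            String[] dataDirs = {"/data1/dfs", "/data2/dfs", "/data3/dfs"};
            // a single-threaded pool also serializes the scans, so at most
            // one disk is being walked at any moment
            ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
            Random rand = new Random();
            for (final String dir : dataDirs) {
                int jitter = rand.nextInt(300); // spread first runs over ~5 min
                pool.scheduleAtFixedRate(new Runnable() {
                    public void run() {
                        // the 'du'-equivalent directory walk for one disk goes here
                        System.out.println("scanning " + dir);
                    }
                }, jitter, 600, TimeUnit.SECONDS); // then rescan every 10 minutes
            }
        }
    }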