Giuseppe mentioned that a contributing factor is that HDP raises the cache
size from 256 to 4096.


On Fri, Jun 27, 2014 at 2:31 PM, lars hofhansl <la...@apache.org> wrote:

> Sounds scary!
> HDP 2.1 ships with Hadoop 2.4.0, right?
>
>
> Sounds like a serious HDFS bug with SSR.
> Maybe we need to at least document this behavior and recommend settings in
> the HBase book.
>
> -- Lars
>
>
> ----- Original Message -----
> From: Giuseppe Reina <g.re...@gmail.com>
> To: user@hbase.apache.org
> Cc:
> Sent: Wednesday, June 25, 2014 2:54 AM
> Subject: Disk space leak when using HBase and HDFS ShortCircuit
>
> Hi all,
>    we have been experiencing the same problem with 2 of our clusters. We
> are currently using HDP 2.1 that comes with HBase 0.98.
>
> The problem manifested itself as a huge difference (hundreds of GB)
> between the output of "df" and "du" on the HDFS data directories.
> Eventually, other systems complained about the lack of space before
> shutting down. We identified the problem and discovered that all the
> RegionServers were holding lots of open file descriptors to deleted files,
> which prevented the OS from freeing the occupied disk space (hence the
> difference between "df" and "du"). The deleted files were the local HDFS
> blocks of old HFiles deleted from HDFS during compaction and/or split
> operations. Apparently those file descriptors were held by the HDFS
> ShortCircuit cache.
>
> My question is: isn't the shortcircuit feature supposed to get "notified"
> somehow when a file is deleted on HDFS, so that it can remove the open fds
> from the cache? This creates huge leaks whenever HBase is heavily loaded,
> and we had to restart the RegionServers periodically until we identified
> the problem. We solved the problem first by disabling shortcircuit in HDFS,
> and then by re-enabling it with a reduced cache size so that the cache
> eviction policies trigger more often (this leads to some performance
> loss).
>
>
> p.s. I am aware of the
> "dfs.client.read.shortcircuit.streams.cache.expiry.ms" parameter, but for
> some reason the default value (5 mins) does not work out-of-the-box on HDP
> 2.1; moreover, the problem persists for high timeouts and big cache sizes.
>
> Kind Regards
>
>
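
For anyone wanting to check for the symptom described in the quoted message (open file descriptors pointing at deleted block files), here is a minimal sketch, assuming a Linux host where /proc/<pid>/fd entries are symlinks whose target ends in " (deleted)" once the file has been unlinked. The function name deleted_open_files and the proc_root parameter are made up for illustration and are not part of HBase or HDFS.

```python
import os

# Linux appends this suffix to the /proc/<pid>/fd link target of an unlinked file.
DELETED_SUFFIX = " (deleted)"


def deleted_open_files(pid, proc_root="/proc"):
    """Return (path, size_bytes) for each fd of `pid` whose file was deleted.

    `proc_root` is a hypothetical knob so the scan can be pointed at a
    fake directory tree; on a real host leave it as "/proc".
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    results = []
    for name in sorted(os.listdir(fd_dir)):
        link = os.path.join(fd_dir, name)
        try:
            target = os.readlink(link)
        except OSError:
            continue  # fd was closed while scanning, or is not a symlink
        if not target.endswith(DELETED_SUFFIX):
            continue
        try:
            # stat() follows the /proc link to the inode that is still open
            size = os.stat(link).st_size
        except OSError:
            size = 0  # size unavailable; still report the descriptor
        results.append((target[: -len(DELETED_SUFFIX)], size))
    return results
```

Calling deleted_open_files(<RegionServer pid>) as the user that owns the process (or as root) and summing the sizes should give a rough idea of how much of the "df" versus "du" gap is held by that process.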


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
