Some options:

* Decrease table.bloom.size
* Increase table.bloom.error.rate
* Decrease the number of files that can be opened at once
* Increase the size of your JVM (may require more hardware :-)
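As a rough illustration of the first two options, here is a minimal sketch of setting those table properties through the Accumulo Java client API. The instance name, zookeeper host, credentials, table name, and the specific values are all placeholders, not values taken from this thread; check what makes sense for your data before applying anything like this.

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.admin.TableOperations;

    public class BloomTuningSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder instance, zookeepers, and credentials.
            Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
                    .getConnector("root", "secret".getBytes());
            TableOperations ops = conn.tableOperations();

            // Fewer expected keys per bloom filter -> smaller long[] per filter.
            ops.setProperty("mytable", "table.bloom.size", "262144");
            // Accept a higher false-positive rate -> fewer bits per key.
            ops.setProperty("mytable", "table.bloom.error.rate", "5%");
        }
    }

The third option (limiting how many files a scan can hold open) is a tserver-side setting rather than a per-table property, so it would go in the server configuration instead.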
Can you tell us a little more about the column family and the size of the BF
that you are getting? o.a.a.core.file.rfile.PrintInfo can get you the size of
the bloom filter in a file.

-Eric

On Wed, Dec 5, 2012 at 10:55 AM, Anthony Fox <[email protected]> wrote:
> So, after removing the bloom filter, I get no OOMs with multiple scanners,
> but my column-family-only scans are quite slow. Are there any settings you
> can recommend to enable the CF bloom filters that won't cause OOMs?
>
> Thanks,
> Anthony
>
> On Thu, Nov 29, 2012 at 3:50 PM, Anthony Fox <[email protected]> wrote:
>
>> Ok, a bit more info. I set -XX:+HeapDumpOnOutOfMemoryError and took a
>> look at the heap dump. The thread that caused the OOM is reading a column
>> family bloom filter from the CacheableBlockFile. The class taking up the
>> memory is long[], which seems to be consistent with a bloom filter. Does
>> this sound right? Any guidance on settings to tweak related to bloom
>> filters to alleviate this issue?
>>
>> On Thu, Nov 29, 2012 at 2:24 PM, Anthony Fox <[email protected]> wrote:
>>
>>> Since the scan involves an intersecting iterator, it has to scan the
>>> entire row range. Also, it's not even very many concurrent clients -
>>> between 5 and 10. Should I turn compression off on this table, or is
>>> that a bad idea in general?
>>>
>>> On Thu, Nov 29, 2012 at 2:22 PM, Keith Turner <[email protected]> wrote:
>>>
>>>> On Thu, Nov 29, 2012 at 2:09 PM, Anthony Fox <[email protected]> wrote:
>>>>
>>>>> We're not on 1.4 yet, unfortunately. Are there any config params I
>>>>> can tweak to manipulate the compressor pool?
>>>>
>>>> Not that I know of, but it's been a while since I looked at that.
>>>>
>>>>> On Thu, Nov 29, 2012 at 1:49 PM, Keith Turner <[email protected]> wrote:
>>>>>
>>>>>> On Thu, Nov 29, 2012 at 12:20 PM, Anthony Fox <[email protected]> wrote:
>>>>>>
>>>>>>> Compacting down to a single file is not feasible - there's about
>>>>>>> 70G in 255 tablets across 15 tablet servers. Is there another way
>>>>>>> to tune the compressor pool, or another mechanism to verify that
>>>>>>> this is the issue?
>>>>>>
>>>>>> I suppose another way to test this would be to run a lot of
>>>>>> concurrent scans, but not enough to kill the tserver. Then get a
>>>>>> heap dump of the tserver and see if it contains a lot of 128k or
>>>>>> 256k (can not remember the exact size) byte arrays that are
>>>>>> referenced by the compressor pool.
>>>>>>
>>>>>>> On Thu, Nov 29, 2012 at 12:09 PM, Keith Turner <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Thu, Nov 29, 2012 at 11:14 AM, Anthony Fox <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I am experiencing some issues running multiple parallel scans
>>>>>>>>> against Accumulo. Running single scans works just fine, but when
>>>>>>>>> I ramp up the number of simultaneous clients, my tablet servers
>>>>>>>>> die due to running out of heap space. I've tried raising max
>>>>>>>>> heap to 4G, which should be more than enough, but I still see
>>>>>>>>> this error. I've tried with table.cache.block.enable=false,
>>>>>>>>> table.cache.index.enable=false, and
>>>>>>>>> table.scan.cache.enable=false, and all combinations of caching
>>>>>>>>> enabled as well.
>>>>>>>>>
>>>>>>>>> My scans involve a custom intersecting iterator that maintains
>>>>>>>>> no more state than the top key and value.
>>>>>>>>> The scans also do a bit of aggregation on column qualifiers,
>>>>>>>>> but the result is small and the number of returned entries is
>>>>>>>>> only in the dozens. The size of each returned value is only
>>>>>>>>> around 500 bytes.
>>>>>>>>>
>>>>>>>>> Any ideas why this may be happening or where to look for
>>>>>>>>> further info?
>>>>>>>>
>>>>>>>> One known issue is Hadoop's compressor pool. If you have a
>>>>>>>> tablet with 8 files and you query 10 terms, you will allocate 80
>>>>>>>> decompressors. Each decompressor uses 128K. If you have 10
>>>>>>>> concurrent queries, 10 terms, and 10 files, then you will
>>>>>>>> allocate 1000 decompressors. These decompressors come from a
>>>>>>>> pool that never shrinks, so if you allocate 1000 at the same
>>>>>>>> time, they will stay around.
>>>>>>>>
>>>>>>>> Try compacting your table down to one file and rerun your query
>>>>>>>> just to see if that helps. If it does, then that's an important
>>>>>>>> clue.
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Anthony
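To put rough numbers on the two suspects discussed above (the compressor pool and the CF bloom filter), here is a small back-of-envelope sketch. The 128 KB per decompressor comes from Keith's recollection in the thread; the bloom filter estimate uses the standard Bloom filter sizing formula with assumed values for table.bloom.size and table.bloom.error.rate, so it is only an approximation of what Accumulo's actual implementation allocates.

    public class HeapEstimateSketch {
        public static void main(String[] args) {
            // Compressor pool: concurrent queries x query terms x files per tablet,
            // at ~128 KB per decompressor (figure from the thread).
            int queries = 10, terms = 10, files = 10;
            long decompressors = (long) queries * terms * files;
            double poolMb = decompressors * 128 * 1024 / (1024.0 * 1024.0);
            System.out.printf("compressor pool: %d decompressors, ~%.0f MB%n",
                    decompressors, poolMb);

            // Bloom filter: m = -n * ln(p) / (ln 2)^2 bits for n keys at error rate p.
            // n and p below are assumed example values, not read from any config.
            long n = 1048576;   // assumed table.bloom.size
            double p = 0.005;   // assumed table.bloom.error.rate of 0.5%
            double bits = -n * Math.log(p) / (Math.log(2) * Math.log(2));
            System.out.printf("bloom filter: ~%.1f MB per file%n",
                    bits / 8 / (1024.0 * 1024.0));
        }
    }

With those assumed inputs, the pool alone works out to roughly 125 MB of byte arrays that never get released, which lines up with Keith's suggestion to check the heap dump for 128k/256k arrays held by the compressor pool.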
