Some options:

* Decrease table.bloom.size
* Increase table.bloom.error.rate
* Decrease the number of files that can be opened at once
* Increase the size of your JVM (may require more hardware :-)
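As a rough illustration of the first two options, here is a minimal sketch of setting those table properties through the Accumulo Java client API. The instance name, zookeeper host, credentials, table name, and the specific values are all placeholders, not values taken from this thread; check what makes sense for your data before applying anything like this.

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.admin.TableOperations;

    public class BloomTuningSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder instance, zookeepers, and credentials.
            Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
                    .getConnector("root", "secret".getBytes());
            TableOperations ops = conn.tableOperations();

            // Fewer expected keys per bloom filter -> smaller long[] per filter.
            ops.setProperty("mytable", "table.bloom.size", "262144");
            // Accept a higher false-positive rate -> fewer bits per key.
            ops.setProperty("mytable", "table.bloom.error.rate", "5%");
        }
    }

The third option (limiting how many files a scan can hold open) is a tserver-side setting rather than a per-table property, so it would go in the server configuration instead.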
Can you tell us a little more about the column family and the size of the BF
that you are getting? o.a.a.core.file.rfile.PrintInfo can get you the size of
the bloom filter in a file.

-Eric

On Wed, Dec 5, 2012 at 10:55 AM, Anthony Fox <[email protected]> wrote:
> So, after removing the bloom filter, I get no OOMs with multiple scanners,
> but my column-family-only scans are quite slow. Are there any settings you
> can recommend to enable the CF bloom filters that won't cause OOMs?
>
> Thanks,
> Anthony
>
> On Thu, Nov 29, 2012 at 3:50 PM, Anthony Fox <[email protected]> wrote:
>
>> Ok, a bit more info. I set -XX:+HeapDumpOnOutOfMemoryError and took a
>> look at the heap dump. The thread that caused the OOM is reading a column
>> family bloom filter from the CacheableBlockFile. The class taking up the
>> memory is long[], which seems to be consistent with a bloom filter. Does
>> this sound right? Any guidance on settings to tweak related to bloom
>> filters to alleviate this issue?
>>
>> On Thu, Nov 29, 2012 at 2:24 PM, Anthony Fox <[email protected]> wrote:
>>
>>> Since the scan involves an intersecting iterator, it has to scan the
>>> entire row range. Also, it's not even very many concurrent clients -
>>> between 5 and 10. Should I turn compression off on this table, or is
>>> that a bad idea in general?
>>>
>>> On Thu, Nov 29, 2012 at 2:22 PM, Keith Turner <[email protected]> wrote:
>>>
>>>> On Thu, Nov 29, 2012 at 2:09 PM, Anthony Fox <[email protected]> wrote:
>>>>
>>>>> We're not on 1.4 yet, unfortunately. Are there any config params I
>>>>> can tweak to manipulate the compressor pool?
>>>>
>>>> Not that I know of, but it's been a while since I looked at that.
>>>>
>>>>> On Thu, Nov 29, 2012 at 1:49 PM, Keith Turner <[email protected]> wrote:
>>>>>
>>>>>> On Thu, Nov 29, 2012 at 12:20 PM, Anthony Fox <[email protected]> wrote:
>>>>>>
>>>>>>> Compacting down to a single file is not feasible - there's about
>>>>>>> 70G in 255 tablets across 15 tablet servers. Is there another way
>>>>>>> to tune the compressor pool, or another mechanism to verify that
>>>>>>> this is the issue?
>>>>>>
>>>>>> I suppose another way to test this would be to run a lot of
>>>>>> concurrent scans, but not enough to kill the tserver. Then get a
>>>>>> heap dump of the tserver and see if it contains a lot of 128k or
>>>>>> 256k (can not remember the exact size) byte arrays that are
>>>>>> referenced by the compressor pool.
>>>>>>
>>>>>>> On Thu, Nov 29, 2012 at 12:09 PM, Keith Turner <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Thu, Nov 29, 2012 at 11:14 AM, Anthony Fox <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I am experiencing some issues running multiple parallel scans
>>>>>>>>> against Accumulo. Running single scans works just fine, but when
>>>>>>>>> I ramp up the number of simultaneous clients, my tablet servers
>>>>>>>>> die due to running out of heap space. I've tried raising max
>>>>>>>>> heap to 4G, which should be more than enough, but I still see
>>>>>>>>> this error. I've tried with table.cache.block.enable=false,
>>>>>>>>> table.cache.index.enable=false, and
>>>>>>>>> table.scan.cache.enable=false, and all combinations of caching
>>>>>>>>> enabled as well.
>>>>>>>>>
>>>>>>>>> My scans involve a custom intersecting iterator that maintains
>>>>>>>>> no more state than the top key and value.
>>>>>>>>> The scans also do a bit of aggregation on column qualifiers,
>>>>>>>>> but the result is small and the number of returned entries is
>>>>>>>>> only in the dozens. The size of each returned value is only
>>>>>>>>> around 500 bytes.
>>>>>>>>>
>>>>>>>>> Any ideas why this may be happening or where to look for
>>>>>>>>> further info?
>>>>>>>>
>>>>>>>> One known issue is Hadoop's compressor pool. If you have a
>>>>>>>> tablet with 8 files and you query 10 terms, you will allocate 80
>>>>>>>> decompressors. Each decompressor uses 128K. If you have 10
>>>>>>>> concurrent queries, 10 terms, and 10 files, then you will
>>>>>>>> allocate 1000 decompressors. These decompressors come from a
>>>>>>>> pool that never shrinks, so if you allocate 1000 at the same
>>>>>>>> time, they will stay around.
>>>>>>>>
>>>>>>>> Try compacting your table down to one file and rerun your query
>>>>>>>> just to see if that helps. If it does, then that's an important
>>>>>>>> clue.
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Anthony
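To put rough numbers on the two suspects discussed above (the compressor pool and the CF bloom filter), here is a small back-of-envelope sketch. The 128 KB per decompressor comes from Keith's recollection in the thread; the bloom filter estimate uses the standard Bloom filter sizing formula with assumed values for table.bloom.size and table.bloom.error.rate, so it is only an approximation of what Accumulo's actual implementation allocates.

    public class HeapEstimateSketch {
        public static void main(String[] args) {
            // Compressor pool: concurrent queries x query terms x files per tablet,
            // at ~128 KB per decompressor (figure from the thread).
            int queries = 10, terms = 10, files = 10;
            long decompressors = (long) queries * terms * files;
            double poolMb = decompressors * 128 * 1024 / (1024.0 * 1024.0);
            System.out.printf("compressor pool: %d decompressors, ~%.0f MB%n",
                    decompressors, poolMb);

            // Bloom filter: m = -n * ln(p) / (ln 2)^2 bits for n keys at error rate p.
            // n and p below are assumed example values, not read from any config.
            long n = 1048576;   // assumed table.bloom.size
            double p = 0.005;   // assumed table.bloom.error.rate of 0.5%
            double bits = -n * Math.log(p) / (Math.log(2) * Math.log(2));
            System.out.printf("bloom filter: ~%.1f MB per file%n",
                    bits / 8 / (1024.0 * 1024.0));
        }
    }

With those assumed inputs, the pool alone works out to roughly 125 MB of byte arrays that never get released, which lines up with Keith's suggestion to check the heap dump for 128k/256k arrays held by the compressor pool.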
