Since the scan involves an intersecting iterator, it has to scan the entire row range. Also, it's not very many concurrent clients - only between 5 and 10. Should I turn compression off on this table, or is that a bad idea in general?
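
If it's worth trying, this is roughly what I had in mind (assuming our pre-1.4 shell supports these commands; 'mytable' is just a placeholder for the real table name), followed by a compaction since my understanding is the compression setting only applies to newly written files:

    config -t mytable -s table.file.compress.type=none
    compact -t mytable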
On Thu, Nov 29, 2012 at 2:22 PM, Keith Turner <[email protected]> wrote:
> On Thu, Nov 29, 2012 at 2:09 PM, Anthony Fox <[email protected]> wrote:
>> We're not on 1.4 yet, unfortunately. Are there any config params I can
>> tweak to manipulate the compressor pool?
>
> Not that I know of, but it's been a while since I looked at that.
>
>> On Thu, Nov 29, 2012 at 1:49 PM, Keith Turner <[email protected]> wrote:
>>> On Thu, Nov 29, 2012 at 12:20 PM, Anthony Fox <[email protected]> wrote:
>>>> Compacting down to a single file is not feasible - there's about 70G in
>>>> 255 tablets across 15 tablet servers. Is there another way to tune the
>>>> compressor pool or another mechanism to verify that this is the issue?
>>>
>>> I suppose another way to test this would be to run a lot of concurrent
>>> scans, but not enough to kill the tserver. Then get a heap dump of the
>>> tserver and see if it contains a lot of 128k or 256k (can't remember the
>>> exact size) byte arrays that are referenced by the compressor pool.
>>>
>>>> On Thu, Nov 29, 2012 at 12:09 PM, Keith Turner <[email protected]> wrote:
>>>>> On Thu, Nov 29, 2012 at 11:14 AM, Anthony Fox <[email protected]> wrote:
>>>>>> I am experiencing some issues running multiple parallel scans against
>>>>>> Accumulo. Running single scans works just fine, but when I ramp up the
>>>>>> number of simultaneous clients, my tablet servers die due to running
>>>>>> out of heap space. I've tried raising max heap to 4G, which should be
>>>>>> more than enough, but I still see this error. I've tried with
>>>>>> table.cache.block.enable=false, table.cache.index.enable=false, and
>>>>>> table.scan.cache.enable=false, and all combinations of caching enabled
>>>>>> as well.
>>>>>>
>>>>>> My scans involve a custom intersecting iterator that maintains no more
>>>>>> state than the top key and value. The scans also do a bit of
>>>>>> aggregation on column qualifiers, but the result is small and the
>>>>>> number of returned entries is only in the dozens. The size of each
>>>>>> returned value is only around 500 bytes.
>>>>>>
>>>>>> Any ideas why this may be happening or where to look for further info?
>>>>>
>>>>> One known issue is Hadoop's compressor pool. If you have a tablet with
>>>>> 8 files and you query 10 terms, you will allocate 80 decompressors.
>>>>> Each decompressor uses 128K. If you have 10 concurrent queries, 10
>>>>> terms, and 10 files, then you will allocate 1000 decompressors. These
>>>>> decompressors come from a pool that never shrinks, so if you allocate
>>>>> 1000 at the same time, they will stay around.
>>>>>
>>>>> Try compacting your table down to one file and rerun your query, just
>>>>> to see if that helps. If it does, then that's an important clue.
>>>>>
>>>>>> Thanks,
>>>>>> Anthony
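
Working through the numbers in Keith's example above, just to sanity-check whether the compressor pool alone could explain the heap pressure: 10 concurrent queries x 10 terms x 10 files = 1000 decompressors, and at 128K apiece that's roughly 125MB (about 250MB if the buffers are actually 256K) held by a pool that never shrinks, so each burst of concurrent scans permanently ratchets the footprint up. If a heap dump would help confirm it, I can grab one with something along the lines of

    jmap -dump:live,format=b,file=tserver.hprof <tserver pid>

and look for large numbers of 128K/256K byte[] instances referenced by the compressor pool.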
