It looks like the threshold should be a bigger number, not a smaller one, to kick off compaction more often. So instead of .setCompactionThreshold(5) I should have .setCompactionThreshold(95). It seems that 95 is the percentage of good data, not of garbage.

On Tue, May 3, 2016 at 10:33 AM, Eugene Strokin <[email protected]> wrote:

> Thanks a lot!!
> Just want to make sure I understood you correctly: if I set the compaction threshold to 5% and the garbage is only 3% of the data, and I force compaction, it will not remove those 3%? So forcing the compaction only suggests checking the threshold right now, nothing more?
>
> I've studied the documents and ended up with this configuration:
>
> DiskStore diskStore = cache.createDiskStoreFactory()
>     .setMaxOplogSize(512)
>     .setDiskDirsAndSizes(new File[] { new File("/opt/ccio/geode/store") }, new int[] { 18000 })
>     .setAllowForceCompaction(true)
>     .setCompactionThreshold(5)
>     .create("-ccio-store");
>
> I know it looks dangerous, but in my case the cache constantly grows: no updates, no deletes, just writes and many reads. So auto-compaction should never happen until my custom disk space checker detects that free disk space is less than 1 GB, at which point it kicks off a scan of the local view of the region, finds LRU records, and deletes them. At that point the oplogs would otherwise only grow, consuming even more space, but I force the compaction right away.
>
> Using the settings above I'm planning to utilize as much disk space as possible.
>
> I will be testing on our staging environment and tweaking the numbers. We'll see how it goes.
>
> Thanks again! Geode looks very promising! I tried several solutions before I ended up with Geode. All of them had problems I couldn't ditch, and it looks like Geode is the one I'll be married to :-))
>
> Eugene
>
> On Mon, May 2, 2016 at 6:52 PM, Barry Oglesby <[email protected]> wrote:
>
>> Answers / comments below.
>>
>> Thanks,
>> Barry Oglesby
>>
>> On Mon, May 2, 2016 at 8:58 AM, Eugene Strokin <[email protected]> wrote:
>>
>>> Barry, I've tried your code.
>>> It looks like the function call actually waits until all the nodes have completed the function, which I don't really need, but it was fun to watch how everything works in the cluster.
>>
>> Yes, the way that function is currently implemented causes it to wait for all results.
>>
>> Functions can either return a result or not. If they return a result, then the caller will wait for that result from all members processing that function.
>>
>> You can change the function to not return a result (fire-and-forget) by:
>>
>> - changing hasResult to return false
>> - not returning a result from the execute method (remove context.getResultSender().lastResult(true))
>> - not expecting a result in the client (remove collector.getResult())
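[A minimal sketch of such a fire-and-forget function, assuming the com.gemstone.gemfire packages referenced elsewhere in this thread; the class name EvictLocalEntriesFunction and the placeholder eviction logic are illustrative, not Barry's actual code.]

import com.gemstone.gemfire.cache.Region;
import com.gemstone.gemfire.cache.execute.FunctionAdapter;
import com.gemstone.gemfire.cache.execute.FunctionContext;
import com.gemstone.gemfire.cache.execute.RegionFunctionContext;
import com.gemstone.gemfire.cache.partition.PartitionRegionHelper;

// Fire-and-forget variant: hasResult() is false and execute() never touches the ResultSender.
public class EvictLocalEntriesFunction extends FunctionAdapter {

  @Override
  public void execute(FunctionContext context) {
    RegionFunctionContext rfc = (RegionFunctionContext) context;
    // Only the entries whose primary copy lives on this member, since optimizeForWrite() is true.
    Region<Object, Object> localData = PartitionRegionHelper.getLocalDataForContext(rfc);
    // ... iterate localData and destroy whatever is no longer needed ...
    // No call to context.getResultSender().lastResult(...) here.
  }

  @Override
  public boolean hasResult() {
    return false; // tells the caller not to wait for results
  }

  @Override
  public boolean optimizeForWrite() {
    return true; // route the function to primary buckets only
  }

  @Override
  public String getId() {
    return "EvictLocalEntriesFunction";
  }
}

The caller would then just fire it with FunctionService.onRegion(region).execute("EvictLocalEntriesFunction") and not call getResult() on the returned collector.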
>>> Even though I didn't use the function call, everything else works just fine.
>>> I was able to iterate through the local cache and find the LRU entity. Because I had to finish the whole loop before actually destroying the item from the cache, I used:
>>>
>>> region.destroy(toBeDeleted);
>>>
>>> "region" is the region I created using the cache, not the region I used for iterating the data:
>>>
>>> Region<String, byte[]> localView = PartitionRegionHelper.getLocalPrimaryData(region);
>>>
>>> "localView" contains the local data which I actually iterate through.
>>> I also tried:
>>>
>>> localView.destroy(toBeDeleted);
>>>
>>> But it didn't work for some reason. "region.destroy" works, but I'm not sure this is the right way to do it. If not, please let me know.
>>
>> PartitionRegionHelper.getLocalPrimaryData(region) just returns a LocalDataSet that wraps the local primary buckets. Most operations on it (including destroy) are delegated to the underlying partitioned region.
>>
>> So, invoking destroy on either region should work. What exception are you seeing with localView.destroy(toBeDeleted)?
>>
>>> The main problem is that even though I'm destroying some data from the cache, I don't see the available hard drive space getting bigger, even when I force compaction every time I destroy an item.
>>> I've destroyed about 300 items, and no free disk space was gained.
>>> I'm guessing that if I delete enough items from the cache it will actually free up some space on disk. But what is this magical number? Is it the size of a bucket or something else?
>>
>> Whenever a new cache operation occurs, a record is added to the end of the current oplog for that operation. Any previous record(s) for that entry are no longer valid, but they still exist in the oplogs. For example, a create followed by a destroy will cause the oplog to contain 2 records for that entry.
>>
>> The invalid records aren't removed until (a) the oplog containing the invalid records is a configurable (default=50) percent garbage and (b) a compaction occurs.
>>
>> So, forcing a compaction after each destroy probably won't do much (as you've seen). The key is to get the oplog to be N% garbage so that when a compaction occurs, it is actually compacted.
>>
>> The percentage is configurable via the compaction-threshold attribute. The lower you set this attribute, the faster oplogs will be compacted. You need to be a bit careful, though. If you set this attribute too low, you'll be constantly copying data between oplogs.
>>
>> Check out these docs pages regarding the compaction-threshold and compaction:
>>
>> http://geode.docs.pivotal.io/docs/managing/disk_storage/disk_store_configuration_params.html
>> http://geode.docs.pivotal.io/docs/managing/disk_storage/compacting_disk_stores.html
>>
>>> Thanks,
>>> Eugene
>>>
>>> On Thu, Apr 28, 2016 at 1:53 PM, Barry Oglesby <[email protected]> wrote:
>>>
>>>> I think I would use a function to iterate all the local region entries, pretty much like Udo suggested.
>>>>
>>>> I attached an example that iterates all the local primary entries and, based on the last accessed time, removes them. In this example the test is '< now', so all entries are removed. Of course, you can do whatever you want with that test.
>>>>
>>>> The call to PartitionRegionHelper.getLocalDataForContext returns only primary entries since optimizeForWrite returns true.
>>>>
>>>> This function currently returns 'true', but it could easily be changed to return an info object containing the number of entries checked and removed (or something similar).
>>>>
>>>> Execute it on the region like:
>>>>
>>>> Execution execution = FunctionService.onRegion(this.region);
>>>> ResultCollector collector = execution.execute("CheckLastAccessedTimeFunction");
>>>> Object result = collector.getResult();
>>>>
>>>> Thanks,
>>>> Barry Oglesby
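[The attached example itself isn't reproduced in the archive, so here is a rough sketch of what such a CheckLastAccessedTimeFunction might look like. It assumes statistics are enabled on the region (so Entry.getStatistics() is available) and returns the number of removed entries per member, one of the variations Barry mentions; the 24-hour cutoff is illustrative.]

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import com.gemstone.gemfire.cache.EntryNotFoundException;
import com.gemstone.gemfire.cache.Region;
import com.gemstone.gemfire.cache.execute.FunctionAdapter;
import com.gemstone.gemfire.cache.execute.FunctionContext;
import com.gemstone.gemfire.cache.execute.RegionFunctionContext;
import com.gemstone.gemfire.cache.partition.PartitionRegionHelper;

// Iterates the local primary entries and destroys those not accessed recently.
// Requires statistics-enabled="true" on the region so getStatistics() works.
public class CheckLastAccessedTimeFunction extends FunctionAdapter {

  private static final long MAX_IDLE_MILLIS = TimeUnit.HOURS.toMillis(24); // illustrative cutoff

  @Override
  public void execute(FunctionContext context) {
    RegionFunctionContext rfc = (RegionFunctionContext) context;
    // Primary entries only, because optimizeForWrite() returns true below.
    Region<Object, Object> localData = PartitionRegionHelper.getLocalDataForContext(rfc);
    long cutoff = System.currentTimeMillis() - MAX_IDLE_MILLIS;

    // Collect the keys first, then destroy, so we are not removing while iterating.
    List<Object> toDestroy = new ArrayList<Object>();
    for (Object key : localData.keySet()) {
      Region.Entry<Object, Object> entry = localData.getEntry(key);
      if (entry != null && entry.getStatistics().getLastAccessedTime() < cutoff) {
        toDestroy.add(key);
      }
    }
    for (Object key : toDestroy) {
      try {
        localData.destroy(key);
      } catch (EntryNotFoundException ignored) {
        // already removed by someone else
      }
    }
    // Report how many entries this member removed.
    context.getResultSender().lastResult(toDestroy.size());
  }

  @Override
  public boolean optimizeForWrite() {
    return true;
  }

  @Override
  public String getId() {
    return "CheckLastAccessedTimeFunction";
  }
}

It would be registered on each server with FunctionService.registerFunction(new CheckLastAccessedTimeFunction()) before being invoked by id as shown above.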
>>>>
>>>> On Wed, Apr 27, 2016 at 5:36 PM, Eugene Strokin <[email protected]> wrote:
>>>>
>>>>> Udo, thanks a lot. Yes, I have the same idea: run the process on each node, and once it finds that there is not much space left, it kicks old records out on that server. I'll give your code a try first thing tomorrow. It looks like this is exactly what I need.
>>>>> Anil, Udo is right, I've managed to set up eviction from heap to overflow disk storage. It looks fine now. I'm running a performance test currently and it looks stable so far. But my cache is ever growing, and I could run out of space. The nature of the data allows me to remove old cached items without any problem, and if they are needed again, I can always get them from storage.
>>>>> So, Geode evicts from memory to overflow, but I also need to evict the items completely off the cache.
>>>>>
>>>>> On Apr 27, 2016 6:02 PM, "Udo Kohlmeyer" <[email protected]> wrote:
>>>>>
>>>>>> Anil,
>>>>>>
>>>>>> Eugene's use case is such that his memory is low (300 MB) but his disk space is larger.
>>>>>> He has already configured eviction to manage the memory aspect. He is just trying to clean up some local disk space. This is a continuation of a previous thread, "System Out of Memory".
>>>>>>
>>>>>> But yes, eviction could fulfill the same requirement if his memory were larger.
>>>>>>
>>>>>> --Udo
>>>>>>
>>>>>> On 28/04/2016 7:41 am, Anilkumar Gingade wrote:
>>>>>>
>>>>>> Any reason why the supported eviction/expiration does not work for your case...
>>>>>>
>>>>>> -Anil.
>>>>>>
>>>>>> On Wed, Apr 27, 2016 at 1:49 PM, Udo Kohlmeyer <[email protected]> wrote:
>>>>>>
>>>>>>> Hi there Eugene,
>>>>>>>
>>>>>>> The free space checking code, is that running as a separate process or as part of each of the server JVMs?
>>>>>>> I would run the free space checking as part of each server (deployed as part of the server code). This way each server will monitor its own free space.
>>>>>>>
>>>>>>> I'm not sure how to get the last access time of each item, but if you can get hold of that information, then you can run some code that uses PartitionRegionHelper.getLocalData(Region) or PartitionRegionHelper.getLocalPrimaryData(Region) to get the local data.
>>>>>>>
>>>>>>> Then you could remove/invalidate the data entry.
>>>>>>>
>>>>>>> Disk store compaction also plays a role here, so you might have to trigger a compaction of the disk store in order to avoid unnecessary data being held in the disk stores.
>>>>>>>
>>>>>>> The simplest way you could do this is by running the following (as per the DiskStore API <http://geode.incubator.apache.org/releases/latest/javadoc/com/gemstone/gemfire/cache/DiskStore.html>):
>>>>>>>
>>>>>>> Cache cache = CacheFactory.getAnyInstance();
>>>>>>> DiskStore diskstore = cache.findDiskStore("diskStoreName");
>>>>>>> diskstore.forceCompaction();
>>>>>>>
>>>>>>> The forceCompaction method is blocking, so please do not make this code part of some critical processing step.
>>>>>>>
>>>>>>> --Udo
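[A minimal sketch of the kind of per-server free-space check described above, combining the local-data iteration and forceCompaction ideas from this thread. The FreeSpaceChecker class, the 1 GB threshold, the one-minute schedule, and the diskDir parameter are all illustrative assumptions.]

import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.CacheFactory;
import com.gemstone.gemfire.cache.DiskStore;
import com.gemstone.gemfire.cache.Region;
import com.gemstone.gemfire.cache.partition.PartitionRegionHelper;

// Runs inside each server JVM: when free disk space drops below a threshold,
// remove some least-recently-used local entries and then force a compaction.
public class FreeSpaceChecker {

  private static final long MIN_FREE_BYTES = 1024L * 1024 * 1024; // 1 GB, illustrative

  public static void start(final Region<String, byte[]> region, final String diskStoreName,
      final File diskDir) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        if (diskDir.getUsableSpace() >= MIN_FREE_BYTES) {
          return; // enough space, nothing to do on this server
        }
        // Only the entries whose primary copy lives on this server.
        Region<String, byte[]> localView = PartitionRegionHelper.getLocalPrimaryData(region);
        // ... scan localView for LRU keys (e.g. via Entry.getStatistics()) and
        // destroy them, as discussed earlier in the thread ...

        // Then compact so the destroyed entries actually release oplog space.
        Cache cache = CacheFactory.getAnyInstance();
        DiskStore diskStore = cache.findDiskStore(diskStoreName);
        if (diskStore != null) {
          diskStore.forceCompaction(); // blocking; keep it off critical paths
        }
      }
    }, 1, 1, TimeUnit.MINUTES);
  }
}

Note that, per Barry's explanation earlier in the thread, forceCompaction only reclaims space from oplogs whose garbage already exceeds the compaction-threshold.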
>>>>>>>
>>>>>>> On 28/04/2016 6:25 am, Eugene Strokin wrote:
>>>>>>>
>>>>>>> I'm running a periodic check of the free space on each node of my cluster. The cluster contains a partitioned region.
>>>>>>> If some node is getting full, I'd like to remove the least recently used items to free up space. New items are getting loaded constantly.
>>>>>>> I've enabled statistics, so it looks like I can get the last access time of each item, but I'd like to iterate through only the "local" items, the items which are stored on the local node only. I'm trying different things, but none of them seems right.
>>>>>>> Is it even possible? If so, could you please point me in the right direction?
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Eugene
