If I understand you properly and you want to do snapshots, I would use the snapshot <http://geode.docs.pivotal.io/docs/managing/cache_snapshots/chapter_overview.html> facilities already in Geode:
- gfsh import <http://geode.docs.pivotal.io/docs/managing/cache_snapshots/importing_a_snapshot.html> or export <http://geode.docs.pivotal.io/docs/managing/cache_snapshots/exporting_a_snapshot.html> data
- the Java API

These can import/export into files or regions. I would think a cron job running a gfsh script would do the trick - no need to write any code unless you want to do filtering, etc. I would also read the caveats in the docs concerning CacheListeners, etc.

On Wed, May 4, 2016 at 4:34 AM, Olivier Mallassi <[email protected]> wrote:

> Hi everybody
>
> I am facing an issue and do not know what would be the right pattern. I
> guess you can help.
>
> The need is to create snapshots of data:
> - let's say you have a stream of incoming objects that you want to store
> in a region, say *MyRegion*. Clients are listening (via CQ) to updates on
> *MyRegion*.
> - at a fixed period (e.g. every 3 seconds or every hour, depending on the
> case) you want to snapshot this data (while keeping *MyRegion* updated
> with incoming objects). Let's say the snapshotted regions follow the
> convention *MyRegion/snapshot-id1*, *MyRegion/snapshot-id2*... I am
> currently thinking about keeping a fixed number of snapshots and rolling
> over them.
>
> I see several options to implement this:
> - *option#1*: at a fixed period, I execute a function to copy data from
> *MyRegion* to *MyRegion/snapshot-id1*. I am not sure it works well with
> large amounts of data, and I am not sure how to correctly handle new
> objects arriving in *MyRegion* while I am snapshotting it.
>
> - *option#2*: I write each object twice: once in *MyRegion* and also in
> *MyRegion/snapshot-idN*, assuming *snapshot-idN* is the latest one.
> Switching to a new snapshot is then just a matter of writing the objects
> to *MyRegion* and *MyRegion/snapshot-idN+1*.
>
> Regarding option#2 (which is my preferred one, but I may be wrong), I see
> two implementations:
> - *implem#1*.
> use a custom function that writes the object twice (regions
> can be collocated, etc.). I can use a local transaction within the
> function in order to guarantee consistency between both regions.
> - *implem#2*. I can use a listener, i.e. an AsyncEventListener. If it is
> declared on multiple nodes, I assume there is no risk of losing data in
> case of failure (e.g. a node crashes before all the "objects" in the
> AsyncEventListener are processed)?
>
> Implem#1 looks easier to me (and I do not think it costs me much more in
> terms of performance than an HA AsyncEventListener).
>
> What would be your opinions? Favorite options? Alternative options?
>
> I hope my email is clear enough. Many thanks for your help.
>
> olivier.

--
Regards,
Jim Bedenbaugh
Advisory Data Engineer
Pivotal Software

Optimism is not a naive hope for a better world, but a philosophical doctrine that is unafraid to voice harsh realities, embrace their confrontation and execute painful decisions with an unyielding commitment to excellence, buoyed with the confidence that by doing the right thing, one creates a better world, having done the least harm.
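[Editor's note: the cron-plus-gfsh approach recommended in the reply above could be sketched roughly as below. Only the region name *MyRegion* comes from the thread; the locator host/port, member name, file paths, and schedule are placeholder assumptions, not a tested recipe.]

```shell
# export-snapshot.gfsh - hypothetical gfsh script; the locator address,
# member name, and output path are placeholders.
# "export data" writes a region snapshot (.gfd) on the named member.
connect --locator=localhost[10334]
export data --region=/MyRegion --member=server1 --file=/var/snapshots/MyRegion.gfd

# Example crontab entry (assumed paths) running the script every hour:
# 0 * * * * gfsh run --file=/opt/scripts/export-snapshot.gfsh
```

Rolling snapshot names (snapshot-id1, snapshot-id2, ...) would need a small wrapper script to rotate the output file name, since gfsh itself does not manage retention.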
