Hi all,

Thank you for your answers.

Mike, this is not for Market Data (one day...) but is more related to
our Geode / Storm integration (as you know).

At some point, I need to snapshot my aggregates: every xx minutes, a
specific event is emitted. This event specifies a txId (long), and in the
end every txId maps to a snapshot (i.e. a well-known version of the
aggregates).

I was thinking about using regions like MyRegion/txID1, MyRegion/txID2,
etc.
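As a small sketch of what "a fixed number of snapshots, rolling over them" could look like as a naming scheme for those regions (the helper name and the modulo scheme are my own assumptions, not a Geode API; I used "-" rather than "/" since "/" is the region path separator in Geode):

```java
// Hypothetical helper: map a txId onto a fixed pool of snapshot region
// names so that old snapshots are reused in a rolling fashion.
// Neither the method nor the naming convention is part of Geode.
public final class SnapshotNaming {

    // With maxSnapshots = 4, txId 5 lands on the same slot as txId 1,
    // so the oldest snapshot region gets overwritten.
    public static String snapshotRegionName(long txId, int maxSnapshots) {
        long slot = txId % maxSnapshots;
        return "MyRegion-snapshot-" + slot;
    }
}
```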

I like your pattern; it could work and be modeled like this:

key: aggregateKey = a.b.c
value: aggregates[], where index 0 is the latest txId, index 1 the previous
txId, and so on

The thing with this model (and this is maybe not a real issue) is that,
since I use CQs, the client will be notified with the whole aggregates[]
and not only the latest object (unless I implement delta propagation?).
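For illustration, the "array of versions" value could be sketched like this (the class and method names are mine; a real value would hold domain aggregates instead of Objects and would implement PDX or DataSerializable for distribution):

```java
// Illustrative value type for the pattern above: index 0 holds the
// aggregate for the latest txId, index 1 the previous txId, and so on,
// with a fixed capacity so the oldest versions roll off the end.
public final class VersionedAggregates {
    private final Object[] versions;

    public VersionedAggregates(int capacity) {
        this.versions = new Object[capacity];
    }

    private VersionedAggregates(Object[] versions) {
        this.versions = versions;
    }

    // Returns a new value with the given aggregate at index 0 and every
    // existing version shifted down one slot (the oldest is dropped).
    public VersionedAggregates withNewVersion(Object latest) {
        Object[] next = new Object[versions.length];
        next[0] = latest;
        System.arraycopy(versions, 0, next, 1, versions.length - 1);
        return new VersionedAggregates(next);
    }

    public Object at(int versionsBack) {
        return versions[versionsBack];
    }
}
```

This also makes the CQ concern concrete: the region value is the whole array, so each update notifies subscribers with every retained version, not just the newest one.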

Maybe another option (in my case) would be to use the txId in the key:
key: aggregateKey = [a.b.c, txID1]
value: aggregate
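A sketch of that composite key (the name SnapshotKey is mine, and a key used from clients would also need PDX or DataSerializable, omitted here), with the equals/hashCode that region keys require for lookups to work:

```java
import java.util.Objects;

// Illustrative composite key [aggregateKey, txId]. Region keys must
// define equals() and hashCode() consistently with each other.
public final class SnapshotKey {
    private final String aggregateKey;
    private final long txId;

    public SnapshotKey(String aggregateKey, long txId) {
        this.aggregateKey = aggregateKey;
        this.txId = txId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SnapshotKey)) return false;
        SnapshotKey other = (SnapshotKey) o;
        return txId == other.txId && aggregateKey.equals(other.aggregateKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(aggregateKey, txId);
    }
}
```

Queries (or a CQ predicate) could then restrict on txId to fetch one well-known version of the aggregates.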

If you have any ideas :) But in any case, thank you.

oliv/

On Thu, May 5, 2016 at 12:52 AM, Michael Stolz <[email protected]> wrote:

> Yes the lists can be first class objects with the same key as the
> description object and possibly some sort of date stamp appended, depending
> on how many observations over how many days you want to keep.
>
> Yes, I think this model can be used very well for any periodic time-series
> data, and would therefore be a very useful pattern.
>
> --
> Mike Stolz
> Principal Engineer, GemFire Product Manager
> Mobile: 631-835-4771
>
> On Wed, May 4, 2016 at 10:45 AM, Alan Kash <[email protected]> wrote:
>
>> Mike,
>>
>> The model you just described: are you referring to one parent object
>> that describes an entity, plus multiple List objects describing measurable
>> metrics (e.g. stock price, temperature), with fixed-length Array objects to
>> store time slices?
>>
>> Metadata-Object
>>     - List of [metric1 timeslice array] - List<Array>
>>     - List of [metric2 timeslice array]
>>
>> How will the indexes work in this case?
>>
>> This model can be used as a general time-series pattern for Geode.
>>
>> Thanks,
>> Alan
>>
>> On Wed, May 4, 2016 at 9:56 AM, Michael Stolz <[email protected]> wrote:
>>
>>> If what you are trying to do is get a consistent picture of market data
>>> and trade data at a point in time, then maybe some form of temporal storage
>>> organization would give you the best approach.
>>>
>>> If you can define a regular interval, we can build a very elegant
>>> mechanism based on fixed-length arrays in GemFire that contain
>>> point-in-time snapshots of the rapidly changing elements. For instance,
>>> you might want a single top-level market data description object and then
>>> a price object with individual prices at 5-minute intervals built as a
>>> simple array of doubles.
>>>
>>> Does that sound like it might be a workable pattern for you?
>>>
>>>
>>> --
>>> Mike Stolz
>>> Principal Engineer, GemFire Product Manager
>>> Mobile: 631-835-4771
>>>
>>> On Wed, May 4, 2016 at 4:34 AM, Olivier Mallassi <
>>> [email protected]> wrote:
>>>
>>>> Hi everybody
>>>>
>>>> I am facing an issue and do not know what would be the right pattern. I
>>>> guess you can help.
>>>>
>>>> The need is to create snapshots of data:
>>>> - let's say you have a stream of incoming objects that you want to
>>>> store in a region; let's say *MyRegion*. Clients are listening (via
>>>> CQ) to updates on *MyRegion*.
>>>> - at a fixed period (e.g. every 3 seconds or every hour, depending on
>>>> the case) you want to snapshot this data (while keeping *MyRegion*
>>>> updated with incoming objects). Let's say the snapshot regions follow the
>>>> convention *MyRegion/snapshot-id1*, *MyRegion/snapshot-id2*... I am
>>>> currently thinking about keeping a fixed number of snapshots and rolling
>>>> over them.
>>>>
>>>> I see several options to implement this.
>>>> - *option#1*: at a fixed period, I execute a function to copy data from
>>>> *MyRegion* to *MyRegion/snapshot-id1*. I am not sure it works well with
>>>> large amounts of data, nor how to properly handle new objects arriving
>>>> in *MyRegion* while I am snapshotting it.
>>>>
>>>> - *option#2*: I write the object twice: once in *MyRegion* and also in
>>>> *MyRegion/snapshot-idN*, assuming *snapshot-idN* is the latest one.
>>>> Switching to a new snapshot is then just a matter of writing the objects
>>>> to *MyRegion* and *MyRegion/snapshot-idN+1*.
>>>>
>>>> Regarding option#2 (which is my preferred one, but I may be wrong), I
>>>> see two implementations:
>>>> - *implem#1*: use a custom function that writes the object twice
>>>> (regions can be collocated, etc.). I can use a local transaction within
>>>> the function in order to guarantee consistency between both regions.
>>>> - *implem#2*: use a listener, i.e. an AsyncEventListener. If they are
>>>> declared on multiple nodes, I assume there is no risk of losing data in
>>>> case of failure (e.g. a node crashes before all the "objects" in the
>>>> AsyncEventListener are processed)?
>>>>
>>>> Implem#1 looks easier to me (and I do not think it costs much more in
>>>> terms of performance than the HA AsyncEventListener).
>>>>
>>>> What would be your opinions? favorite options? alternative options?
>>>>
>>>> I hope my email is clear enough. Many thanks for your help.
>>>>
>>>> olivier.
>>>>
>>>
>>>
>>
>
