For the CQ to deliver only the latest value, you would need a separately
keyed "Latest" entry that clients can register interest in.
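That separately keyed "Latest" entry can be sketched with a minimal stand-in (plain Java maps in place of Geode regions; all class and field names here are hypothetical): every update overwrites one well-known key, so a CQ registered against that key fires with only the newest value, while the full history lives under the aggregate's own key.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Stand-in sketch: HashMaps model two Geode regions.
public class LatestEntry {
    final Map<String, Deque<Double>> history = new HashMap<>(); // stand-in for the history region
    final Map<String, Double> latest = new HashMap<>();         // stand-in for the "Latest" region

    void put(String aggregateKey, double value) {
        // Append to the per-aggregate history (newest first)...
        history.computeIfAbsent(aggregateKey, k -> new ArrayDeque<>()).addFirst(value);
        // ...and overwrite the single CQ-visible "Latest" entry in place.
        latest.put(aggregateKey, value);
    }
}
```

In real Geode code the two maps would be two regions (or two key namespaces in one region), and the CQ's query would select only the "Latest" keys.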

How big is each of the aggregates? If they are large, you will not get much
benefit from my array model.

The array model is ideal for fixed numbers of doubles or integers, like
availability counts and rates in hotel systems, or 5-minute prices on
financial instruments.
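The fixed-length array model can be sketched like this (a minimal stand-in in plain Java, no Geode API; the 96-slot session length and all names are assumptions for illustration):

```java
import java.util.Arrays;

// One value object per instrument: a fixed-length array with one slot
// per 5-minute interval of the trading session.
public class PriceSlots {
    static final int SLOTS = 96; // assumption: 8-hour session = 96 five-minute slots
    final double[] prices = new double[SLOTS];

    PriceSlots() {
        Arrays.fill(prices, Double.NaN); // NaN marks "no observation yet"
    }

    // Map a minute-of-session offset to its 5-minute slot and record the price.
    void record(int minuteOfSession, double price) {
        prices[minuteOfSession / 5] = price;
    }

    double at(int minuteOfSession) {
        return prices[minuteOfSession / 5];
    }
}
```

In GemFire/Geode this object would be the region value under the instrument's key; each new observation overwrites one slot rather than growing the value.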

--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: 631-835-4771

On Fri, May 6, 2016 at 12:27 PM, Olivier Mallassi <
[email protected]> wrote:

> Hi all
>
> thank you for your answer.
>
> Mike, this is not for Market Data (one day...) but is more related to
> our Geode / Storm integration (as you know).
>
> At one point, I need to snapshot my aggregates: every xx minutes, a
> specific event is emitted. This event specifies a txId (long), and in the
> end every txId corresponds to a snapshot (i.e. a well-known version of the
> aggregates).
>
> I was thinking about using regions like MyRegion/txID1, MyRegion/txID2
> etc...
>
> I like your pattern; it could work and be modeled like this:
>
> key: aggregateKey = a.b.c
> value: aggregates[], where index 0 is the latest txId, index 1 the
> previous txId, and so on
>
> The issue with this model (and maybe it is not a real one) is that, since
> I have CQs, the client will be notified with the whole aggregates[] array
> and not only the latest object (unless I implement delta propagation?).
>
> Maybe another option (in my case) would be to use the txId in the key:
> key: aggregateKey = [a.b.c, txID1]
> value: aggregate
>
> Let me know if you have any ideas :) but in any case, thank you.
>
> oliv/
>
> On Thu, May 5, 2016 at 12:52 AM, Michael Stolz <[email protected]> wrote:
>
>> Yes, the lists can be first-class objects with the same key as the
>> description object, and possibly some sort of date stamp appended, depending
>> on how many observations over how many days you want to keep.
>>
>> Yes, I think this model can be used very well for any periodic
>> time-series data, and would therefore be a very useful pattern.
>>
>> --
>> Mike Stolz
>> Principal Engineer, GemFire Product Manager
>> Mobile: 631-835-4771
>>
>> On Wed, May 4, 2016 at 10:45 AM, Alan Kash <[email protected]> wrote:
>>
>>> Mike,
>>>
>>> The model you just described: are you referring to one parent object
>>> that describes an entity, plus multiple List objects describing measurable
>>> metrics (e.g. stock price, temperature), with fixed-length Array objects to
>>> store time slices?
>>>
>>> Metadata-Object
>>>     - List of [metric1 timeslice array] - List<Array>
>>>     - List of [metric2 timeslice array]
>>>
>>> How will the indexes work in this case?
>>>
>>> This model can be used as a general time-series pattern for Geode.
>>>
>>> Thanks,
>>> Alan
>>>
>>> On Wed, May 4, 2016 at 9:56 AM, Michael Stolz <[email protected]> wrote:
>>>
>>>> If what you are trying to do is get a consistent picture of market data
>>>> and trade data at a point in time, then maybe some form of temporal storage
>>>> organization would be the best approach.
>>>>
>>>> If you can define a regular interval, we can do a very elegant mechanism
>>>> based on fixed-length arrays in GemFire that contain point-in-time
>>>> snapshots of the rapidly changing elements. For instance, you might want a
>>>> single top-level market data description object and then a price object
>>>> with individual prices at 5-minute intervals built as a simple array of
>>>> doubles.
>>>>
>>>> Does that sound like it might be a workable pattern for you?
>>>>
>>>>
>>>> --
>>>> Mike Stolz
>>>> Principal Engineer, GemFire Product Manager
>>>> Mobile: 631-835-4771
>>>>
>>>> On Wed, May 4, 2016 at 4:34 AM, Olivier Mallassi <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi everybody
>>>>>
>>>>> I am facing an issue and do not know what the right pattern would be.
>>>>> I guess you can help.
>>>>>
>>>>> The need is to create snapshots of data:
>>>>> - let's say you have a stream of incoming objects that you want to
>>>>> store in a region, say *MyRegion*. Clients are listening (via CQ) to
>>>>> updates on *MyRegion*.
>>>>> - at a fixed period (e.g. every 3 seconds or every hour, depending on
>>>>> the case) you want to snapshot this data (while keeping *MyRegion*
>>>>> updated with incoming objects). Let's say the snapshot regions follow
>>>>> the convention *MyRegion/snapshot-id1*, *MyRegion/snapshot-id2*... I am
>>>>> currently thinking about keeping a fixed number of snapshots and rolling
>>>>> over them.
>>>>>
>>>>> I see several options to implement this.
>>>>> - *option#1*: at a fixed period, I execute a function to copy data from
>>>>> *MyRegion* to *MyRegion/snapshot-id1*. I am not sure this works well with
>>>>> large amounts of data, nor how to properly handle new objects arriving
>>>>> in *MyRegion* while I am snapshotting it.
>>>>>
>>>>> - *option#2*: I write the object twice: once to *MyRegion* and also
>>>>> to *MyRegion/snapshot-idN*, assuming *snapshot-idN* is the latest one.
>>>>> Switching to a new snapshot is then a matter of writing the objects to
>>>>> *MyRegion* and *MyRegion/snapshot-idN+1*.
>>>>>
>>>>> Regarding option#2 (which is my preferred one, but I may be wrong), I
>>>>> see two implementations:
>>>>> - *implem#1*: use a custom function that writes the object twice
>>>>> (the regions can be collocated, etc.). I can use a local transaction
>>>>> within the function in order to guarantee consistency between both
>>>>> regions.
>>>>> - *implem#2*: use an AsyncEventListener. If it is declared on multiple
>>>>> nodes, I assume there is no risk of losing data in case of failure
>>>>> (e.g. a node crashes before all the "objects" in the AsyncEventListener
>>>>> are processed)?
>>>>>
>>>>> Implem#1 looks easier to me (and I do not think it costs much more
>>>>> in terms of performance than an HA AsyncEventListener).
>>>>>
>>>>> What would be your opinions? Favorite options? Alternative options?
>>>>>
>>>>> I hope my email is clear enough. Many thanks for your help.
>>>>>
>>>>> olivier.
>>>>>
>>>>
>>>>
>>>
>>
>
