Re: Geode for time series data

Andrew Munn Sat, 27 Feb 2016 20:46:55 -0800

What if you have one market data feed handler that receives the incoming 
price changes rapidly from an exchange and you want to push those updated 
values out to listening objects using (I think) Continuous Queries?  I 
wrote a Coherence app years ago which does this and I would like to port 
it over to Geode or GF.  Any tips on cache configuration, etc?  These 
updates could come at rates of 100s per second.  In the past I would 
coalesce the updates in the feed handler and push them into the coherence 
cache only once per second.
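
The coalescing step described above might look roughly like the following. This is a minimal sketch, not Geode or Coherence API: `TickCoalescer`, `onTick`, and `drain` are hypothetical names, and the price is assumed to be a plain double keyed by symbol.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Keeps only the latest price per symbol between flushes (hypothetical helper). */
class TickCoalescer {
    private final ConcurrentHashMap<String, Double> latest = new ConcurrentHashMap<>();

    /** Called by the feed handler for every tick; a later tick overwrites an earlier one. */
    void onTick(String symbol, double price) {
        latest.put(symbol, price);
    }

    /** Removes and returns every symbol that changed since the previous drain. */
    Map<String, Double> drain() {
        Map<String, Double> batch = new HashMap<>();
        for (String symbol : latest.keySet()) {
            Double price = latest.remove(symbol);
            if (price != null) {
                batch.put(symbol, price);
            }
        }
        return batch;
    }
}
```

A timer (for example a `ScheduledExecutorService` firing once per second) could call `drain()` and hand the resulting map to the cache in a single `putAll`, so listeners see at most one update per symbol per interval.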


Thanks,
Andrew



On Tue, 23 Feb 2016, Michael Stolz wrote:

> Something like that. You might choose a smaller granularity than a minute if 
> you're really getting that many ticks per minute.
> But you probably want a consistent granularity to make it relatively easy to 
> find what you are looking for.
> You'll probably also want the date in the key.
> 
> 
> --Mike Stolz
> Principal Engineer, GemFire Product Manager 
> Mobile: 631-835-4771
> 
> On Tue, Feb 23, 2016 at 11:07 AM, Andrew Munn <[email protected]> wrote:
>       How does that work when you're appending incoming data in realtime?  Say
>       you're getting 1,000,000 data points per day on each of 1,000 incoming
>       stock symbols.  That is 1bln data points.  Are you using keys like this
>       that bucket the data into one array per minute of the day
> 
>               MSFT-08:00
>               MSFT-08:01
>               ...
>               MSFT-08:59
>               etc?
> 
>       each array might have several thousand elements in that case.
> 
>       Thanks
>       Andrew
> 
>       On Mon, 22 Feb 2016, Michael Stolz wrote:
> 
>       > You will definitely want to use arrays rather than storing each
>       > individual data point because the overhead of each entry in Geode is
>       > nearly 300 bytes.
>       > You could choose to partition by day/week/month but it shouldn't be
>       > necessary because the default partitioning scheme should be random
>       > enough to get reasonable distribution if you are using the metadata
>       > and starting timestamp of the array as the key.
>       >
>       >
>       > --Mike Stolz
>       > Principal Engineer, GemFire Product Manager 
>       > Mobile: 631-835-4771
>       >
>       > On Fri, Feb 19, 2016 at 1:43 PM, Alan Kash <[email protected]> wrote:
>       >       Hi,
>       > I am also building a dashboard prototype for time-series data,
>       >
>       > For time-series data, usually we target a single metric change (stock 
> price, temperature, pressure, etc.) for
>       an entity, but the associated metadata with event -
>       > {StockName/Place, DeviceID, ApplicationID, EventType} remains 
> constant.
>       >
>       > For a backend like Cassandra, we denormalize everything and put 
> everything in a flat key-map with [Metric,
>       Timestamp, DeviceID, Type] as the key. This results in data
>       > duplication of the associated "Metadata". 
>       >
>       > Do you recommend similar approach for Geode ?
>       >
>       > Alternatively,
>       >
>       > We can have an array for Metrics associated with a given Metadata key 
> and store it in a Map ? 
>       >
>       > Key = [Metadata, Timestamp]
>       >
>       > TSMAP<Key, Array<Metric>> series = [1,2,3,4,5,6,7,8,9]
>       >
>       > We can partition this at application level by day / week / month. 
>       >
>       > Is this approach better ? 
>       >
>       > There is a metrics spec for TS data modeling for those who are 
> interested - http://metrics20.org 
>       >
>       > Thanks
>       >
>       >
>       >
>       > On Fri, Feb 19, 2016 at 1:11 PM, Michael Stolz <[email protected]> 
> wrote:
>       >       You will likely get best results in terms of speed of access if 
> you put some structure around the way you
>       store the data in-memory.
>       > First off, you would probably want to parse the data into the 
> individual fields and create a Java object that
>       represents that structure.
>       >
>       > Then you would probably want to bundle those Java structures into 
> arrays in such a way that it is easy to get
>       to the array for a particular date and time by the
>       > combination of a ticker and a date and time as the key.
>       >
>       > Those arrays of Java objects are what you would store as entries in 
> Geode.
>       > I think this would give you the fastest access to the data.
>       >
>       > By the way, probably better to use an integer Julian date and a long 
> integer for the time rather than a Java
>       Date. Java Dates in Geode PDX are way bigger than you
>       > want when you have millions of them.
>       >
>       > Looking at the sample dataset you provided it appears there is a lot 
> of redundant data in there. Repeating
>       1926.75 for instance. 
>       > In fact, every field but 2 are all the same. Are the repetitious 
> fields necessary? If they are, then you might
>       consider using a columnar approach instead of the
>       > Java structures I mentioned. Make an array for each column and 
> compact the repetitions with a count. It would
>       be slower but more compact.
>       > The timestamps are all the same too. Strange.
>       >
>       >
>       >
>       > --Mike Stolz
>       > Principal Engineer, GemFire Product Manager 
>       > Mobile: 631-835-4771
>       >
>       > On Fri, Feb 19, 2016 at 12:15 AM, Gregory Chase <[email protected]> 
> wrote:
>       >       Hi Andrew, I'll let one of the committers answer your 
> specific data file question. However, you might
>       find some inspiration in this open source demo
>       >       that some of the Geode team presented at OSCON earlier this
>       year: http://pivotal-open-source-hub.github.io/StockInference-Spark/
>       >
>       > This was based on a pre-release version of Geode, so you'll want to 
> sub the M1 release in and see if any other
>       tweaks are required at that point.
>       >
>       > I believe this video and presentation go with the Github
>       project: http://www.infoq.com/presentations/r-gemfire-spring-xd
>       >
>       > On Thu, Feb 18, 2016 at 8:58 PM, Andrew Munn <[email protected]> 
> wrote:
>       >       What would be the best way to use Geode (or GF) to store and 
> utilize
>       >       financial time series data like a stream of stock trades?  I 
> have ASCII
>       >       files with timestamps that include microseconds:
>       >
>       >       2016-02-17 18:00:00.000660,1926.75,5,5,1926.75,1926.75,14644971,C,43,01,
>       >       2016-02-17 18:00:00.000660,1926.75,80,85,1926.75,1926.75,14644971,C,43,01,
>       >       2016-02-17 18:00:00.000660,1926.75,1,86,1926.75,1926.75,14644971,C,43,01,
>       >       2016-02-17 18:00:00.000660,1926.75,6,92,1926.75,1926.75,14644971,C,43,01,
>       >       2016-02-17 18:00:00.000660,1926.75,27,119,1926.75,1926.75,14644971,C,43,01,
>       >       2016-02-17 18:00:00.000660,1926.75,3,122,1926.75,1926.75,14644971,C,43,01,
>       >       2016-02-17 18:00:00.000660,1926.75,5,127,1926.75,1926.75,14644971,C,43,01,
>       >       2016-02-17 18:00:00.000660,1926.75,4,131,1926.75,1926.75,14644971,C,43,01,
>       >       2016-02-17 18:00:00.000660,1926.75,2,133,1926.75,1926.75,14644971,C,43,01,
>       >
>       >       I have one file per day and each file can have over 1,000,000 
> rows.  My
>       >       thought is to fault in the files and parse the ASCII as needed. 
>  I know I
>       >       could store the data as binary primitives in a file on disk 
> instead of
>       >       ASCII for a bit more speed.
>       >
>       >       I don't have a cluster of machines to create an HDFS cluster 
> with.  My
>       >       machine does have 128GB of RAM though.
>       >
>       >       Thanks!
>       >
>       >
>       >
>       >
>       > --
>       > Greg Chase
>       > Global Head, Big Data Communities
>       > http://www.pivotal.io/big-data
>       >
>       > Pivotal Software
>       > http://www.pivotal.io/
>       >
>       > 650-215-0477
>       > @GregChase
>       > Blog: http://geekmarketing.biz/
>       >
>       >
>       >
>       >
>       >
>       >
> 
> 
> 
>
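
The bucketing scheme discussed in the quoted thread (one array per symbol per minute, with the date included in the key) could be sketched as below. The key layout `SYMBOL-yyyy-MM-dd-HH:mm`, the class name `BucketKeys`, and the method `bucketKey` are assumptions made for illustration; the timestamp format is taken from the sample rows in the original question.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Builds per-minute bucket keys for tick arrays (hypothetical helper, not a Geode API). */
class BucketKeys {
    // Format of the timestamps in the sample file, e.g. "2016-02-17 18:00:00.000660".
    private static final DateTimeFormatter TICK_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS");

    // Assumed key layout: date plus hour and minute; seconds and below are dropped.
    private static final DateTimeFormatter BUCKET =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HH:mm");

    /** Returns the key of the one-minute bucket that a tick belongs to. */
    static String bucketKey(String symbol, String timestamp) {
        LocalDateTime t = LocalDateTime.parse(timestamp, TICK_TIME);
        return symbol + "-" + t.format(BUCKET);
    }
}
```

Every tick whose timestamp falls in the same minute maps to the same key, so the feed handler can append it to the array stored under that key. A finer granularity only requires changing the `BUCKET` pattern.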

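The columnar idea from the quoted thread (one array per column, with repeated values compacted into a value and a count) might be sketched as a simple run-length encoding. `RunLengthColumn`, `Run`, and `encode` are hypothetical names, and the column is assumed to hold doubles such as the price field.

```java
import java.util.ArrayList;
import java.util.List;

/** Run-length encodes one column of values (hypothetical helper, not a Geode API). */
class RunLengthColumn {
    /** One run: a value and the number of consecutive rows that carried it. */
    static final class Run {
        final double value;
        final int count;

        Run(double value, int count) {
            this.value = value;
            this.count = count;
        }
    }

    /** Collapses consecutive equal values into (value, count) runs. */
    static List<Run> encode(double[] column) {
        List<Run> runs = new ArrayList<>();
        int i = 0;
        while (i < column.length) {
            int j = i + 1;
            while (j < column.length && column[j] == column[i]) {
                j++;
            }
            runs.add(new Run(column[i], j - i));
            i = j;
        }
        return runs;
    }
}
```

As noted in the thread, reading a single row back means walking the runs, so this trades access speed for a smaller footprint.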