Re: Load all data from DB on Cache Start

Amit Pandey Mon, 26 Dec 2016 22:54:38 -0800

John,

Thanks for the awesome answer; this is really helpful.


I have only one concern. My regions have read-through and write-through
enabled. So if I use a BeanPostProcessor to load data after the cache is
initialized, will it try to write through to the database again? Those writes
would effectively be no-ops, but millions of them could be sent to the database.

Thanks for the Cloud Foundry tip; I will look into it.

Regards

On Tue, Dec 27, 2016 at 1:58 AM, John Blum <[email protected]> wrote:

> Amit-
>
> Regarding...
>
> *> I want to load all data on cache startup at a go.*
>
> Since you are using "*Spring*", you could easily implement a *Spring*
> BeanPostProcessor [1] (BPP) for each (or all the) *Region(s)* in which
> you need to load data.  I do this frequently in *Spring Data
> GemFire/Geode's* test suite when testing *Region* data access operations
> using the GemfireTemplate, *Repositories* or things of that nature.  Clearly
> your BPP could use a DataSource to load the data from an external data
> store (e.g. RDBMS).
>
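
A minimal sketch of the loading step such a BPP might delegate to once it has
read rows through a DataSource. It assumes the Region can be treated as a plain
java.util.Map (Geode's Region interface extends ConcurrentMap) and that the
rows arrive as an iterator of key/value entries. The names BulkLoader and
loadInBatches, and the idea of batching, are hypothetical and not part of
Spring or Geode. The code only batches the puts; it makes no attempt to bypass
any CacheWriter or write-behind queue attached to the Region.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical helper: copies entries into a Map-like target (e.g. a Region) in batches. */
final class BulkLoader {

    private BulkLoader() {}

    /**
     * Drains rows into target using putAll on chunks of at most batchSize
     * entries, and returns the number of entries written.
     */
    static <K, V> int loadInBatches(Iterator<Map.Entry<K, V>> rows,
                                    Map<K, V> target,
                                    int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<K, V> batch = new HashMap<>();
        int loaded = 0;
        while (rows.hasNext()) {
            Map.Entry<K, V> row = rows.next();
            batch.put(row.getKey(), row.getValue());
            if (batch.size() == batchSize) {
                target.putAll(batch);   // one bulk operation per full batch
                loaded += batch.size();
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            target.putAll(batch);       // flush the final, partial batch
            loaded += batch.size();
        }
        return loaded;
    }
}
```

In a BPP, the target would be the Region bean handed to the post-processor
after initialization, and the iterator would wrap whatever query result the
DataSource returns.
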
> Another way to load data on startup is to use a Geode *Initializer*.
> However, this would require you to specify a snippet of cache.xml and
> does not work if you specify your *Regions* in *Spring* (XML/Java) config
> as you should when using *Spring*.  I also don't recommend using cache.xml,
> but it is the pure, non-*Spring* way to invoke logic after the cache has
> been "fully" initialized (i.e. where the *Regions* have been defined in
> cache.xml).
>
> See here [2] for more details.  Note, the documentation talks of
> "launching an application" on startup, after cache initialization, but
> technically, you can do whatever you want, like load data.
>
> I recommend the BPP.
>
>
> *> How should I set it up in config to allow it to join other nodes in
> cluster?*
>
> Regardless of whether your server data node is "embedded" or not, you can
> still use a Locator, or mcast, to have the node join the cluster.  In the
> "embedded" scenario, the "application" is itself a GemFire Server data node
> and will be part of the cluster, as Udo said.
>
> This is easily achievable with...
>
> <util:properties id="gemfireProperties">
>   <prop key="name">Example</prop>
>   <!-- Set to non-zero value to use Multicast; comment out "locators" -->
>   <prop key="mcast-port">0</prop>
>   <prop key="log-level">${gemfire.log-level:config}</prop>
>   <prop key="locators">someHost[10334]</prop>
>   <prop key="start-locator">localhost[1034]</prop>
> </util:properties>
>
> <gfe:cache properties-ref="gemfireProperties"/>
>
> ...
>
>
> As you can see from the snippet of *Spring* XML config above, this
> application is a Geode "peer" cache (i.e. embeds a Geode data node/server).
>
> The "*locators*" Geode/GemFire property enables this node to connect to a
> cluster.  Likewise, you can use the "*mcast-port*" property instead,
> however, I would recommend *Locators* over mcast.
>
> Additionally, you can see that I specified the "start-locator"
> Geode/GemFire property, which enables me to start an embedded Locator.
> Useful for testing purposes and connecting Geode data nodes together in a
> cluster without a dedicated Locator, though, this approach is less
> resilient if the applications/servers go down (as may be the case in a
> micro-services scenario)!
>
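
For anyone configuring the peer cache in Java rather than XML, the same
properties could be assembled roughly as below. This is only a sketch: the
class PeerCacheProperties and its build method are hypothetical, the host
names and ports are whatever the caller passes in (the XML above uses
placeholder values), and the resulting Properties object would still have to
be handed to Geode or to the Spring cache bean.

```java
import java.util.Properties;

/** Hypothetical helper that mirrors the gemfireProperties bean from the XML snippet. */
final class PeerCacheProperties {

    private PeerCacheProperties() {}

    /**
     * Builds Geode properties for a peer member. With mcastPort == 0 the member
     * relies on Locators; with a non-zero port it uses multicast and "locators"
     * is left unset. startLocator may be null when no embedded Locator is wanted.
     */
    static Properties build(String name, String locators, String startLocator, int mcastPort) {
        Properties props = new Properties();
        props.setProperty("name", name);
        props.setProperty("mcast-port", String.valueOf(mcastPort));
        // Same default as ${gemfire.log-level:config} in the XML.
        props.setProperty("log-level", System.getProperty("gemfire.log-level", "config"));
        if (mcastPort == 0) {
            props.setProperty("locators", locators);
        }
        if (startLocator != null) {
            props.setProperty("start-locator", startLocator);
        }
        return props;
    }
}
```
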
>
> *> if I start with embedded server is it required to use client pool or is
> it not required?*
>
> A "client pool" is only applicable to cache clients (i.e. ClientCaches)
> on the "client-side" of the equation.  "peers" find (Locator, mcast) and
> communicate (TCP/UDP, JGroups) with each other through other means once a
> cluster is formed.
>
> In fact, typically, it is more common to position your microservices-based
> applications as Geode cache clients (i.e. <gfe:client-cache ...>) and
> have them connect to a dedicated Geode service (i.e. cluster of Geode
> servers/data nodes where also, 1 or more of those nodes are running a
> "CacheServer", listening for cache clients to connect).  These dedicated
> Geode server nodes in a cluster constituting the service can still be
> configured with *Spring*, but they typically will not contain any
> application-specific components other than CacheListeners, Loaders,
> Writers, AEQ *Listeners*, etc.
>
> ClientCache applications use 1 or more Pools configured to talk to the
> servers in the cluster (either by way of Locator or direct server
> communication). Pools can be configured with groups to target specific
> members (in that group) in the cluster.  Typically, members in 1 group host
> a different set of Regions from another group, which is a way to separate
> the data traffic of 1 client from another, with each group dedicated to a
> specific resource/purpose (usually based on business function, etc).
>
> On a side note, some of what you are wanting to do "scale-wise" seems like
> a perfect fit for Pivotal CloudFoundry, which can auto-scale the nodes in
> your cluster up or down based on load and other factors.
>
> Anyway, hope this helps!
>
> -John
>
>
>
>
>
> [1] http://docs.spring.io/spring/docs/current/spring-framework-reference/htmlsingle/#beans-factory-extension-bpp
> [2] http://geode.apache.org/docs/guide/basic_config/the_cache/setting_cache_initializer.html
>
>
> On Sun, Dec 25, 2016 at 11:12 PM, Amit Pandey <[email protected]>
> wrote:
>
>> Hey,
>>
>> Thanks.
>>
>> I have lots of reference data which will be loaded at start of day. This
>> data is unlikely to change much, so I want to load it up front at the
>> start of day. Read-through would make the first access to each entry slow,
>> so I want to keep it all loaded in memory.
>>
>> Also I want to have functions which will be called by clients to do some
>> compute and return results. Using functions should allow me to add nodes
>> and speed up the compute.
>>
>> I have some microservices, each of which will start a GemFire node, and I
>> want them to connect to each other, so yes, I can set it up with a Locator.
>>
>> However I have one doubt, if I start with embedded server is it required
>> to use client pool or is it not required?
>>
>> Regards
>>
>> On Mon, Dec 26, 2016 at 1:18 AM, Udo Kohlmeyer <[email protected]>
>> wrote:
>>
>>> Hi there Amit,
>>>
>>> At this stage the only way you could load all data in one go is to write
>>> a client to connect to the db and load it all in. Another approach could be
>>> to write the same code into a function and invoke the function at start up.
>>> But in both cases the loading is manual.
>>>
>>> To have geode servers join a cluster, you have 2 ways.
>>>
>>>    1. Connecting them up via a locator
>>>    2. Connecting them up via mcast.
>>>
>>> Please be aware that once you connect a server to a cluster, that server
>>> becomes an integral part of the cluster, so adding/removing servers from a
>>> cluster is not something you'd want to do in a load-based scaling model,
>>> i.e. if the load is high, add a server and if load is low, shut down a
>>> server.
>>>
>>> Just for interest's sake, what is your use case?
>>>
>>> --Udo
>>>
>>> On 12/24/16 05:57, Amit Pandey wrote:
>>>
>>> Hi Guys,
>>>
>>> I am using Spring Data Geode. I have been able to use read and write
>>> through/ write behind. I want to load all data on cache startup at a go.
>>>
>>> Secondly my geode server is embedded but I want to allow it to join
>>> other nodes.  How should I set it up in config to allow it to join other
>>> nodes in cluster?
>>>
>>> Regards
>>>
>>>
>>>
>>
>
>
> --
> -John
> john.blum10101 (skype)
>
