Re: Load all data from DB on Cache Start

Luke Shannon Sat, 14 Jan 2017 14:10:09 -0800

Great points John. Lots of gems in those Geode tips you just gave :-D

On Jan 14, 2017 4:39 PM, "John Blum" <[email protected]> wrote:


> Amit-
>
> Another thing, a BPP is my recommended way in *Spring* to load data into
> a Region after initialization, so I wholeheartedly support Luke on this.
>
> Also keep in mind that a BPP callback method is invoked synchronously
> during a *Spring* ApplicationContext refresh and blocks any beans that
> come after it from being initialized.  If you need the initial Region load
> to happen asynchronously, then you are responsible for making that
> happen, perhaps with an appropriate Executor and Future.  You can also
> publish (fire) an ApplicationEvent to the "interested" application
> components (beans) that need to know when the Region is fully loaded and
> ready for use.
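
A minimal sketch of the asynchronous load described above, assuming the Region can be treated as a plain java.util.Map (Geode's Region interface extends ConcurrentMap). The class name AsyncRegionPreloader and the preload(..) helper are hypothetical, and the ConcurrentHashMap only stands in for a real Region so the example runs without Geode or Spring on the classpath:

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: preload a map-like Region on a background thread instead of blocking startup. */
final class AsyncRegionPreloader {

    private AsyncRegionPreloader() {}

    /**
     * Submits the load to the executor and returns immediately.
     * The Future completes with the number of entries that were loaded.
     */
    static <K, V> Future<Integer> preload(ExecutorService executor,
                                          Map<K, V> region,
                                          Callable<Map<K, V>> source) {
        return executor.submit(() -> {
            Map<K, V> data = source.call();   // e.g. a query against the database
            region.putAll(data);              // bulk put into the Region
            return data.size();
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Map<String, String> region = new ConcurrentHashMap<>(); // stand-in for a Region
        try {
            Future<Integer> loaded = preload(executor, region,
                () -> Map.of("p1", "Widget", "p2", "Gadget"));
            // Interested components wait on the Future (or on an event) before reading.
            System.out.println("Loaded " + loaded.get() + " entries");
        } finally {
            executor.shutdown();
        }
    }
}
```

In a BPP, the same submit(..) call would go inside postProcessAfterInitialization, and the returned Future (or an ApplicationEvent published once it completes) is what tells other beans that the data is ready.
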
>
> Additionally, if you do not need to preload your Region on startup, then a
> CacheLoader is the recommended way to load data into your Region on cache
> misses (another synchronous mechanism called a "read-through").
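
To make the contrast with preloading concrete, here is a rough model of read-through behaviour using only the JDK. This is not the Geode CacheLoader API; ReadThroughCache and its loader function are hypothetical names, and the point is only that the first get(..) on a key pays the cost of the load while later ones do not:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Sketch of read-through semantics: a miss triggers a synchronous load, a hit does not. */
final class ReadThroughCache<K, V> {

    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;                    // plays the role of the CacheLoader
    private final AtomicInteger loads = new AtomicInteger(); // how often the backing store was hit

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, loading it synchronously on a cache miss. */
    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    int loadCount() {
        return loads.get();
    }
}
```

The caller of the first get(..) blocks for the duration of the load, which is the latency the original poster wanted to avoid for reference data.
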
>
> A word of caution: never, ever auto-wire or inject any beans into a BPP.
> To do so could cause premature initialization.  Always rely on the bean
> instance passed to the BPP's postProcessXXXX methods.
>
> Thanks,
> John
>
>
> On Sat, Jan 14, 2017 at 1:30 PM, John Blum <[email protected]> wrote:
>
>> Hi Amit, Luke-
>>
>> Thank you Luke.
>>
>> Actually Luke is mostly correct.  In this case, the order, however, DOES
>> NOT matter.  The *Spring* container is intimately aware of certain types
>> of beans defined/declared in the *Spring* ApplicationContext.
>> BeanPostProcessors, a container extension point (hook), are one of them.
>>
>> *Spring* creates all BeanPostProcessors (BPP) before any other
>> application beans in order to post process each bean defined/declared in
>> the container (except for BPPs and BeanFactoryPostProcessors, of
>> course).  The container then proceeds to call the BPP *before* the bean
>> is "initialized" by the container (i.e.
>> postProcessBeforeInitialization(..)) as well as *after* the bean has been
>> "initialized" (i.e. postProcessAfterInitialization(..)).  A bean
>> initialization corresponds to InitializingBean.afterPropertiesSet(), any
>> init() methods marked as such in XML config or any @PostConstruct methods.
>>
>> Most SDG FactoryBeans (e.g. PartitionedRegionFactoryBean ->
>> RegionFactoryBean) always create their GemFire object (e.g. Region) in
>> the afterPropertiesSet() (i.e. initialization) method as the
>> <SDG>FactoryBean implements *Spring's* InitializingBean (callback)
>> interface.
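
The callback order described above can be shown with a toy model. This is not Spring's implementation; InitializationOrderDemo, PostProcessor and RegionHolder are hypothetical stand-ins that only mimic the sequence "before, initialize, after" and show why the Region is visible in the "after" callback but not in the "before" one:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the callback order; not Spring's actual implementation. */
final class InitializationOrderDemo {

    private InitializationOrderDemo() {}

    /** Simplified stand-in for a BeanPostProcessor. */
    interface PostProcessor {
        void beforeInit(RegionHolder bean, List<String> log);
        void afterInit(RegionHolder bean, List<String> log);
    }

    /** Stand-in for a factory bean that creates its Region during initialization. */
    static final class RegionHolder {
        String region;                        // null until init() runs
        void init() { region = "Products"; }  // plays the role of afterPropertiesSet()
    }

    /** Applies the callbacks in the documented order: before, init, after. */
    static List<String> initialize(RegionHolder bean, PostProcessor pp) {
        List<String> log = new ArrayList<>();
        pp.beforeInit(bean, log);
        bean.init();
        log.add("init");
        pp.afterInit(bean, log);
        return log;
    }

    /** Runs the model with a post processor that records what it can see. */
    static List<String> run() {
        return initialize(new RegionHolder(), new PostProcessor() {
            @Override public void beforeInit(RegionHolder bean, List<String> log) {
                log.add("before: region=" + bean.region);
            }
            @Override public void afterInit(RegionHolder bean, List<String> log) {
                log.add("after: region=" + bean.region);
            }
        });
    }
}
```

This is why a data-loading BPP does its work in postProcessAfterInitialization: only at that point has the factory bean created the Region.
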
>>
>> Therefore, technically, it is safe to define/declare any beans, in any
>> order, since the dependencies and callbacks (BPP) pretty much determine the
>> order in which beans are constructed, configured and initialized.  SDG even
>> takes the Spring container DI concept to the level of ensuring GemFire
>> objects are created in the order that GemFire expects based on both
>> explicit and implicit dependencies (think Regions and a DiskStore, for
>> instance, where the DS is just named in the Region configuration;
>> under-the-hood, though, SDG creates a RuntimeReference on the named DS
>> to ensure the proper order).  As another example, it is also possible
>> to define/declare your Regions before the Cache instance...
>>
>> <gfe:partitioned-region id="Products" ... />
>>
>> <gfe:cache/>
>>
>> SDG does not care how you define your beans and generally will do the right
>> thing.  Using JavaConfig is a bit different, though, and in certain cases you
>> have to be a bit more conscientious of the order.
>>
>> In general, if you had a container with multiple beans defined/declared
>> that had NO dependencies between them (or other pre-defined order
>> specified, such as when using *Spring's* @Order annotation in an
>> AnnotationConfigApplicationContext or by implementing the Ordered
>> interface), then *Spring* will pretty much proceed to construct,
>> configure and initialize beans in the order they are declared in the
>> ApplicationContext config.
>>
>> Now, if you have multiple BPPs to process the Region, for various
>> reasons, then you will need to define order among them by using the
>> @Order annotation or by having your custom BPP implement the Ordered
>> interface, if order is important.  If an order is not given, then
>> *Spring* makes no guarantees which BPP will be invoked first.
>>
>> Anyway, all of this is well-described in the Spring documentation on
>> "*Customizing the nature of a bean*" [1] as well as in "Container
>> Extension Points" [2].
>>
>> Hope this helps.
>>
>> -John
>>
>> [1] http://docs.spring.io/spring/docs/current/spring-framework-reference/htmlsingle/#beans-factory-nature
>> [2] http://docs.spring.io/spring/docs/current/spring-framework-reference/htmlsingle/#beans-factory-extension
>>
>>
>> On Sat, Jan 14, 2017 at 8:38 AM, Amit Pandey <[email protected]>
>> wrote:
>>
>>> Okay... yeah, as post processors process everything in the IoC container,
>>> that's the only way I guess
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Sat, Jan 14, 2017 at 9:36 PM, Luke Shannon <[email protected]>
>>> wrote:
>>>
>>>> Hi Amit,
>>>>
>>>> In the past I have done it like this:
>>>>
>>>> Define a BeanPostProcessor like below. It will go out and get the data
>>>> from wherever it lives, convert it to objects and then put them into the
>>>> Region using a Region reference passed in shortly after the Region is
>>>> initialized. This bean will need to be on the classpath of Geode when it
>>>> starts up. If using gfsh you can add it to the '--classpath' argument of the
>>>> 'start server' command.
>>>>
>>>> You can then wire this bean into the Geode Cache xml like so:
>>>>
>>>> <gfe:replicated-region id="Product" />
>>>>
>>>> <bean id="productLoader" class="mypackage.ProductLoader">
>>>>
>>>> <property name="targetBeanName" value="Product" />
>>>>
>>>> </bean>
>>>>
>>>> Note that this bean is placed *below* your region definitions in the
>>>> Spring cache XML. If I remember correctly, order matters and it will try to
>>>> run this before the Region reference is created if the order is not
>>>> correct.
>>>>
>>>> Hope this helps,
>>>>
>>>> Luke
>>>>
>>>> import java.io.BufferedReader;
>>>> import java.io.File;
>>>> import java.io.FileReader;
>>>> import java.io.IOException;
>>>> import java.util.HashMap;
>>>> import java.util.Map;
>>>> import java.util.logging.Logger;
>>>> import org.springframework.beans.BeansException;
>>>> import org.springframework.beans.factory.config.BeanPostProcessor;
>>>> import org.springframework.util.Assert;
>>>> import org.springframework.util.StringUtils;
>>>> import com.gemstone.gemfire.cache.Region;
>>>> import com.google.gson.Gson;
>>>>
>>>>
>>>> public class ProductLoader implements BeanPostProcessor {
>>>>
>>>>  // Assumption: the original snippet did not show how "log" was declared;
>>>>  // java.util.logging is used here only so the class compiles.
>>>>  private static final Logger log = Logger.getLogger(ProductLoader.class.getName());
>>>>
>>>>  private String targetBeanName;
>>>>
>>>>  protected String getTargetBeanName() {
>>>>    Assert.state(StringUtils.hasText(targetBeanName),
>>>>      "The target Spring context bean name was not properly specified!");
>>>>    return targetBeanName;
>>>>  }
>>>>
>>>>  public void setTargetBeanName(final String targetBeanName) {
>>>>    Assert.hasText(targetBeanName,
>>>>      "The target Spring context bean name must be specified!");
>>>>    this.targetBeanName = targetBeanName;
>>>>  }
>>>>
>>>>  // Hypothetical placeholder: get your data from where it lives (e.g. a file
>>>>  // parsed with Gson) and return it as <Key For Product> -> <Product Value>.
>>>>  protected Map<Object, Object> loadProducts() {
>>>>    return new HashMap<Object, Object>();
>>>>  }
>>>>
>>>>  @Override
>>>>  public Object postProcessBeforeInitialization(final Object bean,
>>>>      final String beanName) throws BeansException {
>>>>    // Nothing to do before initialization; the Region does not exist yet.
>>>>    return bean;
>>>>  }
>>>>
>>>>  @SuppressWarnings({ "unchecked", "rawtypes" })
>>>>  @Override
>>>>  public Object postProcessAfterInitialization(final Object bean,
>>>>      final String beanName) throws BeansException {
>>>>    if (beanName.equals(getTargetBeanName()) && bean instanceof Region) {
>>>>      Region region = (Region) bean;
>>>>      // Do a put or a putAll into the Region here.
>>>>      region.putAll(loadProducts());
>>>>      log.info("Preloading complete. Region now has: " + region.size());
>>>>    }
>>>>    return bean;
>>>>  }
>>>>
>>>> }
>>>>
>>>>
>>>> On Sat, Jan 14, 2017 at 10:01 AM, Amit Pandey <
>>>> [email protected]> wrote:
>>>>
>>>>> Hey John,
>>>>>
>>>>> How do we hook up post processors for a region ?
>>>>>
>>>>> If I have a region like :-
>>>>>
>>>>> <gfe:partitioned-region id="trades">
>>>>>     <gfe:cache-loader>
>>>>>         <bean class="x.y.z.TradeLoader"/>
>>>>>     </gfe:cache-loader>
>>>>>     <gfe:cache-writer>
>>>>>         <bean class="x.y.z.TradeWriter"/>
>>>>>     </gfe:cache-writer>
>>>>>
>>>>>
>>>>> </gfe:partitioned-region>
>>>>>
>>>>>
>>>>> How do we hook up the post processor?
>>>>>
>>>>>
>>>>> On Tue, Dec 27, 2016 at 1:22 PM, Amit Pandey <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> Happy Holidays. Wishing you a great new year :)
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> On Tue, Dec 27, 2016 at 1:08 PM, John Blum <[email protected]> wrote:
>>>>>>
>>>>>>> ;-)  Happy holidays my friend.  Hope you are getting some good R&R.
>>>>>>>
>>>>>>> On Mon, Dec 26, 2016 at 2:14 PM, Udo Kohlmeyer <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> it helps a lot! :D
>>>>>>>>
>>>>>>>> On 12/26/16 12:28, John Blum wrote:
>>>>>>>>
>>>>>>>> Amit-
>>>>>>>>
>>>>>>>> Regarding...
>>>>>>>>
>>>>>>>> *> I want to load all data on cache startup at a go.*
>>>>>>>>
>>>>>>>> Since you are using "*Spring*", you could easily implement a
>>>>>>>> *Spring* BeanPostProcessor [1] (BPP) for each (or all the)
>>>>>>>> *Region(s)* in which you need to load data.  I do this frequently
>>>>>>>> in *Spring Data GemFire/Geode's* test suite when testing *Region*
>>>>>>>> data access operations using the GemfireTemplate, *Repositories*
>>>>>>>> or things of that nature.  Clearly your BPP could use a DataSource
>>>>>>>> to load the data from an external data store (e.g. RDBMS).
>>>>>>>>
>>>>>>>> Another way to load data on startup is to use a Geode
>>>>>>>> *Initializer*.  However, this would require you to specify a
>>>>>>>> snippet of cache.xml and does not work if you specify your
>>>>>>>> *Regions* in *Spring* (XML/Java) config as you should when using
>>>>>>>> *Spring*.  I also don't recommend using cache.xml, but it is the
>>>>>>>> pure, non-*Spring* way to invoke logic after the cache has been
>>>>>>>> "fully" initialized (i.e. where the *Regions* have been defined in
>>>>>>>> cache.xml).
>>>>>>>>
>>>>>>>> See here [2] for more details.  Note, the documentation talks of
>>>>>>>> "launching an application" on startup, after cache initialization, but
>>>>>>>> technically, you can do whatever you want, like load data.
>>>>>>>>
>>>>>>>> I recommend the BPP.
>>>>>>>>
>>>>>>>>
>>>>>>>> *> How should I set it up in config to allow it to join other nodes
>>>>>>>> in cluster?*
>>>>>>>>
>>>>>>>> Regardless of whether your server data node is "embedded" or not,
>>>>>>>> you can still use a Locator or mcast to have the node join the
>>>>>>>> cluster.  In the "embedded" scenario, the "application" is a GemFire
>>>>>>>> Server data node and will be part of the cluster, as Udo said.
>>>>>>>>
>>>>>>>> This is easily achievable with...
>>>>>>>>
>>>>>>>> <util:properties id="gemfireProperties">
>>>>>>>>   <prop key="name">Example</prop>
>>>>>>>>   <!-- Set to non-zero value to use Multicast; comment out "locators" -->
>>>>>>>>   <prop key="mcast-port">0</prop>
>>>>>>>>   <prop key="log-level">${gemfire.log-level:config}</prop>
>>>>>>>>   <prop key="locators">someHost[10334]</prop>
>>>>>>>>   <prop key="start-locator">localhost[1034]</prop>
>>>>>>>> </util:properties>
>>>>>>>>
>>>>>>>> <gfe:cache properties-ref="gemfireProperties"/>
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>
>>>>>>>> As you can see from the snippet of *Spring* XML config above, this
>>>>>>>> application is a Geode "peer" cache (i.e. embeds a Geode data
>>>>>>>> node/server).
>>>>>>>>
>>>>>>>> The "*locators*" Geode/GemFire property enables this node to
>>>>>>>> connect to a cluster.  Likewise, you can use the "*mcast-port*"
>>>>>>>> property instead; however, I would recommend *Locators* over mcast.
>>>>>>>>
>>>>>>>> Additionally, you can see that I specified the "start-locator"
>>>>>>>> Geode/GemFire property, which enables me to start an embedded Locator.
>>>>>>>> This is useful for testing purposes and for connecting Geode data nodes
>>>>>>>> together in a cluster without a dedicated Locator, though this approach
>>>>>>>> is less resilient if the applications/servers go down (as may be the
>>>>>>>> case in a micro-services scenario)!
>>>>>>>>
>>>>>>>>
>>>>>>>> *> if I start with embedded server is it required to use client
>>>>>>>> pool or is it not required?*
>>>>>>>>
>>>>>>>> A "client pool" is only applicable to cache clients (i.e.
>>>>>>>> ClientCaches) on the "client-side" of the equation.  "Peers" find
>>>>>>>> (Locator, mcast) and communicate (TCP/UDP, JGroups) with each other
>>>>>>>> through other means once a cluster is formed.
>>>>>>>>
>>>>>>>> In fact, it is more common to position your microservices-based
>>>>>>>> applications as Geode cache clients (i.e. <gfe:client-cache ...>) and
>>>>>>>> have them connect to a dedicated Geode service (i.e. a cluster of
>>>>>>>> Geode servers/data nodes where 1 or more of those nodes are also
>>>>>>>> running a "CacheServer", listening for cache clients to connect).
>>>>>>>> The dedicated Geode server nodes constituting the service can still
>>>>>>>> be configured with *Spring*, but they typically will not contain any
>>>>>>>> application-specific components other than CacheListeners, Loaders,
>>>>>>>> Writers, AEQ *Listeners*, etc.
>>>>>>>>
>>>>>>>> ClientCache applications use 1 or more Pools configured to talk to
>>>>>>>> the servers in the cluster (either by way of a Locator or direct
>>>>>>>> server communication).  Pools can be configured with groups to target
>>>>>>>> specific members (in that group) in the cluster.  Typically, members
>>>>>>>> in 1 group host a different set of Regions from another group, and
>>>>>>>> this is a way to separate data traffic from 1 client to another
>>>>>>>> dedicated to a specific resource/purpose (usually based on business
>>>>>>>> function, etc).
>>>>>>>>
>>>>>>>> On a side note, some of what you are wanting to do "scale-wise"
>>>>>>>> seems like a perfect fit for Pivotal CloudFoundry, which can
>>>>>>>> auto-scale nodes in your cluster up or down based on load and other
>>>>>>>> factors.
>>>>>>>>
>>>>>>>> Anyway, hope this helps!
>>>>>>>>
>>>>>>>> -John
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [1] http://docs.spring.io/spring/docs/current/spring-framework-reference/htmlsingle/#beans-factory-extension-bpp
>>>>>>>> [2] http://geode.apache.org/docs/guide/basic_config/the_cache/setting_cache_initializer.html
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Dec 25, 2016 at 11:12 PM, Amit Pandey <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hey,
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> I have lots of reference data which will be loaded at start of
>>>>>>>>> day. This data is not bound to change much and as such I want to keep
>>>>>>>>> it loaded at the start of day. Read through will make it slow while it
>>>>>>>>> is being actually accessed so I want to keep it loaded in memory.
>>>>>>>>>
>>>>>>>>> Also I want to have functions which will be called by clients to
>>>>>>>>> do some compute and return results. Using functions should allow me
>>>>>>>>> to add nodes and speed up the compute.
>>>>>>>>>
>>>>>>>>> I have some micro services each of which will start a gemfire
>>>>>>>>> node, and I want to connect, so yes I can set it up with locator.
>>>>>>>>>
>>>>>>>>> However I have one doubt, if I start with embedded server is it
>>>>>>>>> required to use client pool or is it not required?
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> On Mon, Dec 26, 2016 at 1:18 AM, Udo Kohlmeyer <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi there Amit,
>>>>>>>>>>
>>>>>>>>>> At this stage the only way you could load all data in one go is
>>>>>>>>>> to write a client to connect to the db and load it all in. Another
>>>>>>>>>> approach could be to write the same code into a function and invoke
>>>>>>>>>> the function at start up. But both approaches are manual.
>>>>>>>>>>
>>>>>>>>>> To have geode servers join a cluster, you have 2 ways.
>>>>>>>>>>
>>>>>>>>>>    1. Connecting them up via a locator
>>>>>>>>>>    2. Connecting them up via mcast.
>>>>>>>>>>
>>>>>>>>>> Please be aware that once you connect a server to a cluster, that
>>>>>>>>>> server becomes an integral part of the cluster, so adding/removing
>>>>>>>>>> servers from a cluster is not something you'd want to do in a
>>>>>>>>>> load-based scaling model, i.e. if the load is high, add a server and
>>>>>>>>>> if load is low, shut down a server.
>>>>>>>>>>
>>>>>>>>>> Just for interest's sake, what is your use case?
>>>>>>>>>>
>>>>>>>>>> --Udo
>>>>>>>>>>
>>>>>>>>>> On 12/24/16 05:57, Amit Pandey wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Guys,
>>>>>>>>>>
>>>>>>>>>> I am using Spring Data Geode. I have been able to use read and
>>>>>>>>>> write through/write behind. I want to load all data on cache
>>>>>>>>>> startup at a go.
>>>>>>>>>>
>>>>>>>>>> Secondly my geode server is embedded but I want to allow it to join
>>>>>>>>>> other nodes.  How should I set it up in config to allow it to
>>>>>>>>>> join other nodes in cluster?
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> -John
>>>>>>>> john.blum10101 (skype)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -John
>>>>>>> john.blum10101 (skype)
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Luke Shannon | Platform Engineering | Pivotal
>>>> ------------------------------------------------------------
>>>>
>>>> Mobile: 416-571-9495
>>>> Join the Toronto Pivotal Usergroup: http://www.meetup.com/Toronto-Pivotal-User-Group/
>>>>
>>>
>>>
>>
>>
>> --
>> -John
>> john.blum10101 (skype)
>>
>
>
>
> --
> -John
> john.blum10101 (skype)
>
