Re: Best Practices for Calling Server Side Functions

Barry Oglesby Mon, 18 Apr 2016 10:12:12 -0700

Interesting. My simple replicate test is running just fine. What exception
are you seeing?


Thanks,
Barry Oglesby


On Mon, Apr 18, 2016 at 7:29 AM, Mark Secrist wrote:

> Great detailed explanation, Barry. At one point, you made this statement:
>
> "If you're executing a function onRegion with a replicated region, then
> the function is executed on any member defining that region. Since the
> region is replicated, every server has the same data."
>
> Has something changed with Geode recently? Last I checked you can't
> execute onRegion on a Replicated region. You'll actually get an exception
> thrown.
>
> On Fri, Apr 15, 2016 at 5:53 PM, Barry Oglesby wrote:
>
>> Executing queries in functions can be tricky.
>>
>> For executing queries in a function, do something like:
>>
>> - invoke the function with onRegion
>> - have the function return true from optimizeForWrite so that it is
>> executed only on primary buckets
>> - use the Query execute API with a RegionFunctionContext in the function.
>> Otherwise, you could easily end up executing the same query on more than
>> one member.
>>
>> If you set a filter, the function (and query) will execute on only the
>> member containing the primary or primaries for that filter.
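A small plain-Java model can make the duplicate-execution risk above concrete. It is a sketch under stated assumptions, not Geode code: `QueryScopeModel`, `Bucket`, `runQuery`, `membersForFilter` and the layout (4 buckets, 2 servers, one redundant copy) are all hypothetical. If every executing member queries all of its local copies, each entry comes back twice. If each member is restricted to its primary buckets (roughly the effect of optimizeForWrite returning true plus querying through the RegionFunctionContext), each entry comes back once. A filter narrows execution to the single member holding that key's primary bucket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java model (NOT Geode classes) of why query scope matters in a
// partitioned region with one redundant copy.
class QueryScopeModel {
    // Hypothetical bucket: one primary member, one secondary member,
    // and both copies hold the same entries.
    static final class Bucket {
        final String primary;
        final String secondary;
        final List<String> entries;

        Bucket(String primary, String secondary, List<String> entries) {
            this.primary = primary;
            this.secondary = secondary;
            this.entries = entries;
        }
    }

    // Each member in 'members' runs the "query" (collect entries) over its local data.
    // primariesOnly=true roughly models optimizeForWrite() returning true plus querying
    // through the RegionFunctionContext; false models querying every local copy.
    static List<String> runQuery(List<Bucket> buckets, List<String> members, boolean primariesOnly) {
        List<String> results = new ArrayList<>();
        for (String member : members) {
            for (Bucket b : buckets) {
                boolean isPrimary = b.primary.equals(member);
                boolean isSecondary = b.secondary.equals(member);
                if (isPrimary || (!primariesOnly && isSecondary)) {
                    results.addAll(b.entries);
                }
            }
        }
        return results;
    }

    // With a filter, only the member holding the primary bucket for that key runs.
    static Set<String> membersForFilter(List<Bucket> buckets, String key) {
        Set<String> targets = new TreeSet<>();
        for (Bucket b : buckets) {
            if (b.entries.contains(key)) {
                targets.add(b.primary);
            }
        }
        return targets;
    }

    static List<Bucket> sampleLayout() {
        return List.of(
                new Bucket("server1", "server2", List.of("t1", "t2")),
                new Bucket("server2", "server1", List.of("t3")),
                new Bucket("server1", "server2", List.of("t4")),
                new Bucket("server2", "server1", List.of("t5", "t6")));
    }

    public static void main(String[] args) {
        List<Bucket> layout = sampleLayout();
        List<String> members = List.of("server1", "server2");
        System.out.println(runQuery(layout, members, false).size()); // 12: every entry twice
        System.out.println(runQuery(layout, members, true).size());  // 6: every entry once
        System.out.println(membersForFilter(layout, "t3"));          // [server2]
    }
}
```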
>>
>> Here is an example with trades.
>>
>> If you route all trades on a specific cusip to the same bucket using a
>> PartitionResolver, then querying for all trades for a specific cusip can be
>> done efficiently using a Function. The trades could be stored with a simple
>> String key like cusip-id or a complex key containing both the cusip and id.
>> Either way, the PartitionResolver will need to be able to return the cusip
>> for the routing object.
>>
>> Invoke the function like:
>>
>> Execution execution =
>> FunctionService.onRegion(this.region).withFilter(Collections.singleton(cusip));
>> ResultCollector collector = execution.execute("TradeQueryFunction");
>> Object result = collector.getResult();
>>
>> In the TradeQueryFunction, execute the query like:
>>
>> RegionFunctionContext rfc = (RegionFunctionContext) context;
>> String cusip = (String) rfc.getFilter().iterator().next();
>> SelectResults results =
>>     (SelectResults) this.query.execute(rfc, new String[] {cusip});
>>
>> Where the query is:
>>
>> select * from /Trade where cusip = $1
>>
>> This will route the function request to the member whose primary bucket
>> contains the cusip filter. Then it will execute the query on the
>> RegionFunctionContext which will just be the data for that bucket. Note:
>> the PartitionResolver will also need to be able to return the cusip for
>> that filter (which is just the input string itself).
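The routing requirement can also be sketched in plain Java. This is an illustration only. A real resolver implements Geode's PartitionResolver and receives the key through `getRoutingObject(EntryOperation)`. The hypothetical `CusipRouting.routingCusip` below just shows the key-to-cusip logic for three key shapes: a complex `TradeKey`, a "cusip-id" String key, and the bare cusip used as the function filter. The hash-modulo bucket choice is a simplifying assumption, not Geode's exact bucket computation.

```java
// Plain-Java sketch (no Geode dependency) of the cusip routing logic a
// PartitionResolver would need. TradeKey and routingCusip are hypothetical.
class CusipRouting {
    // Hypothetical complex key holding both the cusip and the trade id.
    static final class TradeKey {
        final String cusip;
        final String id;

        TradeKey(String cusip, String id) {
            this.cusip = cusip;
            this.id = id;
        }
    }

    // Returns the routing object (the cusip) for any of the three key shapes:
    // a TradeKey, a "cusip-id" String key, or a bare cusip String (the filter value).
    static String routingCusip(Object key) {
        if (key instanceof TradeKey) {
            return ((TradeKey) key).cusip;
        }
        String s = (String) key;
        int dash = s.indexOf('-');
        return dash < 0 ? s : s.substring(0, dash);
    }

    // Simplified bucket choice (assumption): routing-object hash modulo bucket count.
    static int bucketFor(Object key, int totalBuckets) {
        return Math.floorMod(routingCusip(key).hashCode(), totalBuckets);
    }

    public static void main(String[] args) {
        int totalBuckets = 113; // Geode's default total-num-buckets
        int a = bucketFor("CUSIP0001-1", totalBuckets);
        int b = bucketFor(new TradeKey("CUSIP0001", "2"), totalBuckets);
        int c = bucketFor("CUSIP0001", totalBuckets); // the function filter value
        System.out.println(a == b && b == c); // all trades for the cusip share a bucket
    }
}
```

Because the filter value and every stored key resolve to the same routing object, the filtered function lands on the one member whose primary bucket holds all trades for that cusip.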
>>
>> Here is some more general info on functions.
>>
>> If you're executing a function onRegion with a replicated region, then
>> the function is executed on any member defining that region. Since the
>> region is replicated, every server has the same data.
>>
>> If you're executing a function onRegion with a partitioned region, then
>> where the function is invoked depends on the result of optimizeForWrite. If
>> optimizeForWrite returns true, the function is invoked on all the members
>> containing primary buckets for that region. If optimizeForWrite returns
>> false, the function is invoked on as few members as it can that encompass
>> all the buckets (so it mixes primary and secondary buckets). For example, if
>> you have 2 members with one redundant copy, and the primaries are split
>> between them, then optimizeForWrite returning true means that the function
>> will be invoked on both members. Returning false will cause the function to
>> be invoked on only one member, since each member hosts all the buckets
>> (each one as either primary or secondary). I almost always have
>> optimizeForWrite return true.
>>
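The two optimizeForWrite behaviors can be modeled as a target-selection function. This is a hypothetical sketch, not Geode's actual algorithm. When optimizeForWrite is true, `targets` returns every member hosting a primary. Otherwise it approximates "as few members as it can" with a greedy set cover over all hosted buckets, which is an assumption on my part.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java model (NOT Geode's actual algorithm) of which members run an
// onRegion function on a partitioned region.
class FunctionTargetModel {
    // optimizeForWrite=true: every member hosting at least one primary bucket.
    // optimizeForWrite=false: a small set of members that together host every
    // bucket (primary or secondary), chosen greedily here as an assumption.
    static Set<String> targets(Map<Integer, String> primaries,
                               Map<Integer, Set<String>> hosts,
                               boolean optimizeForWrite) {
        if (optimizeForWrite) {
            return new TreeSet<>(primaries.values());
        }
        Set<Integer> uncovered = new TreeSet<>(hosts.keySet());
        Set<String> chosen = new TreeSet<>();
        while (!uncovered.isEmpty()) {
            // Count how many still-uncovered buckets each member hosts.
            Map<String, Integer> counts = new TreeMap<>();
            for (int bucket : uncovered) {
                for (String member : hosts.get(bucket)) {
                    counts.merge(member, 1, Integer::sum);
                }
            }
            // Pick the member covering the most (ties go to the alphabetically first name).
            String best = null;
            int bestCount = 0;
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (e.getValue() > bestCount) {
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            if (best == null) {
                throw new IllegalStateException("a bucket has no hosting member");
            }
            chosen.add(best);
            for (Iterator<Integer> it = uncovered.iterator(); it.hasNext(); ) {
                if (hosts.get(it.next()).contains(best)) {
                    it.remove();
                }
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // The 2-member example: primaries split, one redundant copy,
        // so each member hosts every bucket (as primary or secondary).
        Map<Integer, String> primaries = Map.of(0, "server1", 1, "server2", 2, "server1", 3, "server2");
        Map<Integer, Set<String>> hosts = new HashMap<>();
        for (int bucket = 0; bucket < 4; bucket++) {
            hosts.put(bucket, Set.of("server1", "server2"));
        }
        System.out.println(targets(primaries, hosts, true));  // [server1, server2]
        System.out.println(targets(primaries, hosts, false)); // [server1]
    }
}
```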
>> The onServer/onServers API is used for data-unaware calls (meaning no
>> specific region involved). In the past, I've used it mainly for admin-type
>> behavior like:
>>
>> - start/stop gateway senders
>> - create regions
>> - rebalance
>> - assign buckets
>>
>> Now, gfsh does a lot of this behavior (maybe all of it), so I don't
>> necessarily need functions to do it anymore.
>>
>> One of my favorite onServer use cases is the command pattern using a
>> Request/Response API like:
>>
>> - define a Request (like RebalanceCache)
>> - pass it as an argument to a CommandFunction from the client to a server
>> using onServer
>> - execute it on the server
>> - return a Response
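The command pattern above might look roughly like this with the Geode plumbing left out. `Request`, `Response`, `RebalanceCache` and `CommandFunction` are hypothetical names. In a real deployment `CommandFunction` would implement Geode's Function interface, receive the Request as its argument via onServer, and send the Response back through the function's ResultSender.

```java
import java.io.Serializable;

class CommandPatternDemo {
    public static void main(String[] args) {
        CommandFunction function = new CommandFunction();
        Response ok = function.execute(new RebalanceCache());
        System.out.println(ok.success + " " + ok.message); // true rebalance requested
        Response bad = function.execute("not a request");
        System.out.println(bad.success); // false
    }
}

// Hypothetical: a Request carries its own server-side behavior. It is
// Serializable because in a real deployment it travels from the client
// to the server as the function argument.
interface Request extends Serializable {
    Response execute();
}

// Hypothetical result sent back to the client.
final class Response implements Serializable {
    private static final long serialVersionUID = 1L;
    final boolean success;
    final String message;

    Response(boolean success, String message) {
        this.success = success;
        this.message = message;
    }
}

// Hypothetical request; a real one would call the server's rebalance API here.
final class RebalanceCache implements Request {
    private static final long serialVersionUID = 1L;

    @Override
    public Response execute() {
        return new Response(true, "rebalance requested");
    }
}

// Stand-in for the one generic server-side function: it only dispatches.
final class CommandFunction {
    Response execute(Object argument) {
        if (!(argument instanceof Request)) {
            return new Response(false, "unsupported argument: " + argument);
        }
        try {
            return ((Request) argument).execute();
        } catch (RuntimeException e) {
            return new Response(false, "request failed: " + e);
        }
    }
}
```

The appeal of this design is that adding an operation means adding a Request class (which must also be on the server classpath), while the single registered function stays unchanged.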
>>
>> One use case for invoking a function from another function is member
>> notification. This can be done with a CacheListener on a replicated region
>> too, but the basic idea is:
>>
>> - invoke a function
>> - in the function, invoke another function on all the members notifying
>> them something is about to happen
>> - do the thing
>> - invoke another function on all the members notifying them something has
>> happened
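The notification sequence can be sketched as follows, again without Geode. `broadcast` is a hypothetical stand-in for invoking a notification function on all members, and `Member.onNotify` stands in for what that function does on each member. The trace only shows the ordering the steps above describe.

```java
import java.util.ArrayList;
import java.util.List;

class NotificationDemo {
    // Outer "function": notify all members, do the work, notify all members again.
    static void runWithNotification(List<Member> members, String action,
                                    Runnable work, List<String> trace) {
        broadcast(members, "about to " + action, trace);
        work.run();
        broadcast(members, "finished " + action, trace);
    }

    // Hypothetical stand-in for invoking a notification function on every member.
    static void broadcast(List<Member> members, String event, List<String> trace) {
        for (Member member : members) {
            member.onNotify(event, trace);
        }
    }

    public static void main(String[] args) {
        List<Member> members = List.of(new Member("server1"), new Member("server2"));
        List<String> trace = new ArrayList<>();
        runWithNotification(members, "rebalance", () -> trace.add("doing rebalance"), trace);
        trace.forEach(System.out::println);
    }
}

// Hypothetical member; onNotify is what the notification function would do there.
final class Member {
    final String name;

    Member(String name) {
        this.name = name;
    }

    void onNotify(String event, List<String> trace) {
        trace.add(name + ": " + event);
    }
}
```

If the work itself can fail, you have to decide whether the "finished" notification should still go out (for instance from a finally block).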
>>
>> You need to be careful when invoking one function from another. Depending
>> on what you're doing in the second function, you could get yourself into a
>> distributed deadlock situation.
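One common way such a deadlock arises is thread-pool starvation. A function holds an execution thread while it blocks on a nested function that needs a thread from the same exhausted pool. The single-JVM analogy below uses java.util.concurrent rather than Geode, and `NestedCallStarvationDemo`/`runNested` are hypothetical names. A timeout makes the starved case return instead of hanging forever.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class NestedCallStarvationDemo {
    // An outer task submits an inner task to the SAME bounded pool and blocks on it.
    // With one thread, the inner task can never start while the outer one waits.
    static String runNested(int poolThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(poolThreads);
        try {
            Future<String> outer = pool.submit(() -> {
                Future<String> inner = pool.submit(() -> "inner done");
                try {
                    return inner.get(500, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    inner.cancel(true); // give up instead of waiting forever
                    return "starved: inner call timed out";
                }
            });
            return outer.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runNested(1)); // starved: inner call timed out
        System.out.println(runNested(2)); // inner done
    }
}
```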
>>
>> I'm not sure this answers all the issues you were seeing, but hopefully
>> it helps.
>>
>> Thanks,
>> Barry Oglesby
>>
>>
>> On Fri, Apr 15, 2016 at 1:36 PM, Matt Ross wrote:
>>
>>> Hi all,
>>>
>>> I'm involved in a sizable GemFire project right now that requires me to
>>> execute Functions in a number of ways, and I wanted to poll the community
>>> for some best practices. Initially, I executed all functions like this:
>>>
>>> ResultCollector<?, ?> rc = FunctionService.onRegion(region)
>>>     .withArgs(arguments).execute("my-awesome-function");
>>>
>>> This worked reliably for quite some time, until I started mixing
>>> functions that executed on partitioned, redundant data with functions
>>> that executed on replicated data. I first ran into problems with this
>>> approach in the following setup:
>>>
>>> 1 locator, 2 servers, and functions that ran queries on both partitioned,
>>> redundant regions and replicated regions. The function would execute on
>>> both servers, and the result collector would nondeterministically pick one
>>> server's results to return. Logging statements inside the function
>>> confirmed that it was being executed twice, once on each server. We fixed
>>> this by switching from executing onRegion to executing onPool. Our
>>> reasoning (a hypothesis) was that since the replicated data was on both
>>> servers, the function was executing on both servers.
>>>
>>> Another issue was executing functions from within a function without a
>>> matching function context. Let's say I have one function that I execute
>>> with onPool, so it is passed a FunctionContext. But inside that function I
>>> need to execute other functions, some needing a RegionFunctionContext and
>>> some just needing a FunctionContext. Initially I was able to just use a
>>> ResultCollector and FunctionService.onRegion to get a region context, and
>>> then pass my current function context to a new instance of the other
>>> function:
>>>
>>> MyAwesomeFunction myAwesomeFunction = new MyAwesomeFunction();
>>>
>>> myAwesomeFunction.execute(functionContext);
>>>
>>> This worked for a while, but as complexity grew, more problems came up.
>>>
>>> So, in short, I wanted to put out the general question of best practices
>>> for using onRegion/onPool/onServer, calling other functions from within
>>> functions, which types of functions should be used on which types of
>>> regions, and general design patterns for executing functions. Thanks!
>>>
>>> *Matthew Ross | Data Engineer | Pivotal*
>>> *625 Avenue of the Americas NY, NY 10011*
>>> *516-941-7535*
>>>
>>>
>>
>
>
> --
>
> *Mark Secrist | Sr Manager, **Global Education Delivery*
>
> 970.214.4567 Mobile
>
>
