Re: Best Practices for Calling Server Side Functions

Matt Ross Fri, 15 Apr 2016 17:07:55 -0700

Just wow, thank you all for the detailed and well thought out responses.
I'm going to try and compile these responses into a document for future
reference and will share back with the community.  Thanks again.


On Friday, April 15, 2016, Barry Oglesby <[email protected]> wrote:

> Executing queries in functions can be tricky.
>
> For executing queries in a function, do something like:
>
> - invoke the function with onRegion
> - have the function return true from optimizeForWrite so that it is
> executed only on primary buckets
> - use the Query execute API with a RegionFunctionContext in the function.
> Otherwise, you could easily end up executing the same query on more than
> one member.
>
> If you set a filter, the function (and query) will execute on only the
> member containing the primary or primaries for that filter.
>
> Here is an example with trades.
>
> If you route all trades on a specific cusip to the same bucket using a
> PartitionResolver, then querying for all trades for a specific cusip can be
> done efficiently using a Function. The trades could be stored with a simple
> String key like cusip-id or a complex key containing both the cusip and id.
> Either way, the PartitionResolver will need to be able to return the cusip
> for the routing object.
>
> Invoke the function like:
>
> Execution execution =
> FunctionService.onRegion(this.region).withFilter(Collections.singleton(cusip));
> ResultCollector collector = execution.execute("TradeQueryFunction");
> Object result = collector.getResult();
>
> In the TradeQueryFunction, execute the query like:
>
> RegionFunctionContext rfc = (RegionFunctionContext) context;
> String cusip = (String) rfc.getFilter().iterator().next();
> SelectResults results =
>     (SelectResults) this.query.execute(rfc, new String[] {cusip});
>
> Where the query is:
>
> select * from /Trade where cusip = $1
>
> This will route the function request to the member whose primary bucket
> contains the cusip filter. Then it will execute the query on the
> RegionFunctionContext which will just be the data for that bucket. Note:
> the PartitionResolver will also need to be able to return the cusip for
> that filter (which is just the input string itself).
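>
A minimal sketch of the routing rule described above, in plain Java with no
GemFire dependency. TradeRouting and routingObject are hypothetical names, and
the "cusip-id" key format (with a cusip that never contains a dash) is an
assumption. A PartitionResolver's getRoutingObject could delegate to something
like this, so that a stored key and a bare cusip filter both resolve to the
same routing object.

```java
// Hypothetical helper, not part of GemFire: derives the routing object
// (the cusip) from a "cusip-id" String key or from a bare cusip filter.
final class TradeRouting {
    private TradeRouting() {}

    // Assumption: keys look like "<cusip>-<id>" and a cusip contains no '-'.
    static String routingObject(String keyOrFilter) {
        int dash = keyOrFilter.indexOf('-');
        // No dash means the input is already a cusip (the filter case).
        return dash < 0 ? keyOrFilter : keyOrFilter.substring(0, dash);
    }
}
```

With a complex key class instead of a String, the same idea would apply by
returning the key's cusip field.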
>
> Here is some more general info on functions.
>
> If you're executing a function onRegion with a replicated region, then the
> function is executed on any member defining that region. Since the region
> is replicated, every server has the same data.
>
> If you're executing a function onRegion with a partitioned region, then
> where the function is invoked depends on the result of optimizeForWrite. If
> optimizeForWrite returns true, the function is invoked on all the members
> containing primary buckets for that region. If optimizeForWrite returns
> false, the function is invoked on as few members as it can that encompass
> all the buckets (so it mixes primary and secondary buckets). For example if
> you have 2 members, and the primaries are split between them, then
> optimizeForWrite returning true means that the function will be invoked on
> both members. Returning false will cause the function to be invoked on only
> one member since each member has all the buckets. I almost always have
> optimizeForWrite return true.
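>
The two-member example above can be modeled in plain Java. This is only a toy
model of the behavior described, not GemFire's actual member-selection code:
TargetMembers and its methods are hypothetical names, and the greedy cover
used for the optimizeForWrite=false case is an assumption standing in for
"as few members as it can".

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only; names and algorithm are illustrative assumptions.
final class TargetMembers {
    private TargetMembers() {}

    // optimizeForWrite == true: every member hosting at least one primary.
    // primaries maps bucket id -> member hosting that bucket's primary copy.
    static Set<String> primaryMembers(Map<Integer, String> primaries) {
        return new TreeSet<>(primaries.values());
    }

    // optimizeForWrite == false: a small set of members that together host
    // every bucket (primary or secondary), picked greedily.
    // hosted maps member -> ids of all buckets that member hosts.
    static Set<String> coveringMembers(Map<String, Set<Integer>> hosted) {
        Set<Integer> uncovered = new TreeSet<>();
        for (Set<Integer> buckets : hosted.values()) {
            uncovered.addAll(buckets);
        }
        Set<String> chosen = new TreeSet<>();
        while (!uncovered.isEmpty()) {
            String best = null;
            int bestCount = 0;
            // Pick the member that hosts the most still-uncovered buckets.
            for (Map.Entry<String, Set<Integer>> e : hosted.entrySet()) {
                int count = 0;
                for (Integer bucket : e.getValue()) {
                    if (uncovered.contains(bucket)) {
                        count++;
                    }
                }
                if (count > bestCount) {
                    best = e.getKey();
                    bestCount = count;
                }
            }
            chosen.add(best);
            uncovered.removeAll(hosted.get(best));
        }
        return chosen;
    }
}
```

For two members that each host all buckets with the primaries split between
them, primaryMembers selects both members and coveringMembers selects one.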
>
> The onServer/onServers API is used for data-unaware calls (meaning no
> specific region involved). In the past, I've used it mainly for admin-type
> behavior like:
>
> - start/stop gateway senders
> - create regions
> - rebalance
> - assign buckets
>
> Now, gfsh does a lot of this behavior (maybe all of it), so I don't
> necessarily need functions to do it anymore.
>
> One of my favorite onServer use cases is the command pattern using a
> Request/Response API like:
>
> - define a Request (like RebalanceCache)
> - pass it as an argument to a CommandFunction from the client to a server
> using onServer
> - execute it on the server
> - return a Response
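>
The command pattern steps above could look roughly like this in plain Java.
Everything here is a hypothetical sketch: Request, Response, RebalanceCache
and CommandDispatcher are illustrative names, and the rebalance itself is
stubbed out. In a real CommandFunction, the dispatcher logic would presumably
sit in the function's execute method, with the Request arriving as the
function argument.

```java
import java.io.Serializable;

// Hypothetical command interface; Serializable because it would travel
// from the client to the server as a function argument.
interface Request extends Serializable {
    Response execute();
}

// Hypothetical result type returned to the caller.
final class Response implements Serializable {
    final boolean success;
    final String message;

    Response(boolean success, String message) {
        this.success = success;
        this.message = message;
    }
}

// Example command. The actual rebalance call is omitted (stub only).
final class RebalanceCache implements Request {
    @Override
    public Response execute() {
        return new Response(true, "rebalance requested");
    }
}

// Stands in for the body of a CommandFunction: take the argument,
// run it if it is a Request, and turn any failure into a Response.
final class CommandDispatcher {
    private CommandDispatcher() {}

    static Response handle(Object argument) {
        if (!(argument instanceof Request)) {
            return new Response(false, "unsupported argument: " + argument);
        }
        try {
            return ((Request) argument).execute();
        } catch (RuntimeException e) {
            return new Response(false, String.valueOf(e.getMessage()));
        }
    }
}
```

One design benefit of this shape is that new commands only need a new Request
class; the single function registered on the server stays unchanged.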
>
> One use case for invoking a function from another function is member
> notification. This can be done with a CacheListener on a replicated region
> too, but the basic idea is:
>
> - invoke a function
> - in the function, invoke another function on all the members notifying
> them something is about to happen
> - do the thing
> - invoke another function on all the members notifying them something has
> happened
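>
The notify / do / notify flow above, reduced to a plain Java sketch.
MemberNotifier and NotifyingTask are hypothetical names; in GemFire the
broadcast would be another function invoked on all the members, which is
exactly the step where care is needed.

```java
// Hypothetical abstraction over "invoke another function on all the members".
interface MemberNotifier {
    void broadcast(String event);
}

final class NotifyingTask {
    private NotifyingTask() {}

    // Notifies all members, does the work, then notifies them again.
    // If the work throws, the "after" notification is not sent; whether
    // that is the right behavior depends on the use case.
    static void run(MemberNotifier notifier, Runnable work) {
        notifier.broadcast("before");
        work.run();
        notifier.broadcast("after");
    }
}
```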
>
> You need to be careful when invoking one function from another. Depending
> on what you're doing in the second function, you could get yourself into a
> distributed deadlock situation.
>
> I'm not sure this answers all the issues you were seeing, but hopefully it
> helps.
>
> Thanks,
> Barry Oglesby
>
>
> On Fri, Apr 15, 2016 at 1:36 PM, Matt Ross <[email protected]> wrote:
>
>> Hi all,
>>
>> I'm involved in a sizable GemFire Project right now that is requiring me
>> to execute Functions in a number of ways, and I wanted to poll the
>> community for some best practices.  So initially I would execute all
>> functions like this.
>>
>> ResultCollector<?, ?> rc = FunctionService.onRegion(region)
>>     .withArgs(arguments).execute("my-awesome-function");
>>
>> And this worked reliably for quite some time, until I started mixing
>> functions that executed on partitioned redundant data with functions that
>> executed on replicated data.  I first ran into problems with this method
>> in the following setup.
>>
>> 1 locator, 2 servers, and functions that ran queries on partitioned
>> redundant and replicated regions.  The function would execute on both
>> servers, and the result collector would nondeterministically choose a
>> server to return results from.  Logging statements placed within my
>> function confirmed that the function was being executed twice, once on
>> each server.  We were able to fix this problem by switching from executing
>> on region to executing on Pool.  Our hypothesis was that, since there was
>> replicated data on both servers, the function would execute on both
>> servers.
>>
>> Another issue was executing functions from within a function without a
>> function context.  Let's say I have one function that I execute on Pool,
>> and it is therefore passed a FunctionContext.  But inside that function I
>> need to execute other functions, some needing a RegionFunctionContext and
>> some just needing a FunctionContext.  Initially I was able to just use a
>> ResultCollector and FunctionService.onRegion to get a region context, and
>> then pass my current function context to an instance of a new function:
>>
>> MyAwesomeFunction myAwesomeFunction = new MyAwesomeFunction();
>>
>> myAwesomeFunction.execute(functionContext);
>>
>> This worked for a time but complexity started rising and more problems came 
>> up.
>>
>> So in short I wanted to throw out the blanket question of best practices on 
>> using (onRegion/onPool/onServer), calling other functions from within 
>> functions, what type of functions should be used on what type of regions, 
>> and general design patterns when executing functions.  Thanks!
>>
>> *Matthew Ross | Data Engineer | Pivotal*
>> *625 Avenue of the Americas NY, NY 10011*
>> *516-941-7535 | [email protected]*
>>
>>
>

-- 
*Matthew Ross | Data Engineer | Pivotal*
*625 Avenue of the Americas NY, NY 10011*
*516-941-7535 | [email protected]*
