Executing queries in functions can be tricky.
To execute a query in a function, do something like:
- invoke the function with onRegion
- have the function return true from optimizeForWrite so that it is
executed only on primary buckets
- use the Query execute API with a RegionFunctionContext in the function.
Otherwise, you could easily end up executing the same query on more than
one member.
If you set a filter, the function (and query) will execute on only the
member containing the primary or primaries for that filter.
Here is an example with trades.
If you route all trades on a specific cusip to the same bucket using a
PartitionResolver, then querying for all trades for a specific cusip can be
done efficiently using a Function. The trades could be stored with a simple
String key like cusip-id or a complex key containing both the cusip and id.
Either way, the PartitionResolver will need to be able to return the cusip
for the routing object.
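As a rough illustration of that routing rule, here is a minimal plain-Java sketch of the logic a PartitionResolver's getRoutingObject could delegate to. TradeKey, TradeRouting, and the '-' separator are assumptions made up for this example, not part of the GemFire API. It also assumes a cusip never contains a '-'.

```java
import java.util.Objects;

// Hypothetical complex key holding both the cusip and the trade id.
// equals/hashCode and serialization are omitted to keep the sketch short.
final class TradeKey {
    private final String cusip;
    private final String id;

    TradeKey(String cusip, String id) {
        this.cusip = Objects.requireNonNull(cusip);
        this.id = Objects.requireNonNull(id);
    }

    String getCusip() { return cusip; }
    String getId() { return id; }
}

// Hypothetical helper: extracts the cusip that serves as the routing object.
final class TradeRouting {
    private TradeRouting() {}

    static String routingObject(Object key) {
        if (key instanceof TradeKey) {
            return ((TradeKey) key).getCusip();
        }
        if (key instanceof String) {
            String s = (String) key;
            int dash = s.indexOf('-');
            // A bare cusip (for example a function filter) has no separator,
            // so it is returned unchanged.
            return dash < 0 ? s : s.substring(0, dash);
        }
        throw new IllegalArgumentException("Unsupported key type: " + key);
    }
}
```

A real resolver would call something like TradeRouting.routingObject with the key of the EntryOperation it is handed. Returning a bare cusip unchanged is what lets a cusip-only function filter resolve to the same bucket as the trades.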
Invoke the function like:
Execution execution =
FunctionService.onRegion(this.region).withFilter(Collections.singleton(cusip));
ResultCollector collector = execution.execute("TradeQueryFunction");
Object result = collector.getResult();
In the TradeQueryFunction, execute the query like:
RegionFunctionContext rfc = (RegionFunctionContext) context;
String cusip = (String) rfc.getFilter().iterator().next();
SelectResults results =
    (SelectResults) this.query.execute(rfc, new String[] {cusip});
Where the query is:
select * from /Trade where cusip = $1
This will route the function request to the member whose primary bucket
contains the cusip filter. Then it will execute the query on the
RegionFunctionContext which will just be the data for that bucket. Note:
the PartitionResolver will also need to be able to return the cusip for
that filter (which is just the input string itself).
Here is some more general info on functions.
If you're executing a function onRegion with a replicated region, then the
function is executed on any one member defining that region. Since the
region is replicated, every server has the same data.
If you're executing a function onRegion with a partitioned region, then
where the function is invoked depends on the result of optimizeForWrite. If
optimizeForWrite returns true, the function is invoked on all the members
containing primary buckets for that region. If optimizeForWrite returns
false, the function is invoked on as few members as it can that encompass
all the buckets (so it mixes primary and secondary buckets). For example, if
you have 2 members and one redundant copy, and the primaries are split
between them, then optimizeForWrite returning true means that the function
will be invoked on both members. Returning false will cause the function to
be invoked on only one member, since each member holds a copy (primary or
secondary) of every bucket. I almost always have optimizeForWrite return true.
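To make the two-member example concrete, here is a small plain-Java model of the member selection described above. It is only a sketch: BucketRoutingSketch and its methods are hypothetical names, the first member in each list is assumed to be the primary, and the greedy "fewest members" choice stands in for whatever GemFire actually does internally.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class BucketRoutingSketch {
    private BucketRoutingSketch() {}

    // hosts: bucket id -> members holding a copy; index 0 is the primary (model assumption).
    static Set<String> targetMembers(Map<Integer, List<String>> hosts, boolean optimizeForWrite) {
        Set<String> targets = new TreeSet<>();
        if (optimizeForWrite) {
            // Every member that holds at least one primary bucket.
            for (List<String> members : hosts.values()) {
                targets.add(members.get(0));
            }
            return targets;
        }
        // Otherwise greedily pick members until every bucket is covered by some copy.
        Set<Integer> uncovered = new HashSet<>(hosts.keySet());
        while (!uncovered.isEmpty()) {
            Map<String, Integer> coverage = new TreeMap<>();
            for (Integer bucket : uncovered) {
                for (String member : hosts.get(bucket)) {
                    coverage.merge(member, 1, Integer::sum);
                }
            }
            String best = null;
            for (Map.Entry<String, Integer> e : coverage.entrySet()) {
                if (best == null || e.getValue() > coverage.get(best)) {
                    best = e.getKey();
                }
            }
            if (best == null) {
                throw new IllegalStateException("A bucket has no hosting member");
            }
            final String chosen = best;
            targets.add(chosen);
            uncovered.removeIf(b -> hosts.get(b).contains(chosen));
        }
        return targets;
    }

    // Two members, four buckets, primaries split evenly, one redundant copy each.
    static Map<Integer, List<String>> twoMemberExample() {
        Map<Integer, List<String>> hosts = new LinkedHashMap<>();
        for (int bucket = 0; bucket < 4; bucket++) {
            hosts.put(bucket, bucket % 2 == 0
                    ? Arrays.asList("server1", "server2")
                    : Arrays.asList("server2", "server1"));
        }
        return hosts;
    }
}
```

With the layout from twoMemberExample, passing true selects both servers and passing false selects a single server, which matches the behavior described above.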
The onServer/onServers API is used for data-unaware calls (meaning no
specific region involved). In the past, I've used it mainly for admin-type
behavior like:
- start/stop gateway senders
- create regions
- rebalance
- assign buckets
Now, gfsh does a lot of this behavior (maybe all of it), so I don't
necessarily need functions to do it anymore.
One of my favorite onServer use cases is the command pattern using a
Request/Response API like:
- define a Request (like RebalanceCache)
- pass it as an argument to a CommandFunction from the client to a server
using onServer
- execute it on the server
- return a Response
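The shape of that pattern can be sketched in plain Java as below. Everything here is hypothetical (Request, Response, SimpleResponse, CommandFunctionSketch), and the handle method is only a stand-in for a real function's execute method: in GemFire the argument would come from the FunctionContext and the Response would go back through the ResultSender.

```java
import java.io.Serializable;

// Hypothetical result type returned to the client.
interface Response extends Serializable {
    boolean isSuccess();
    String getMessage();
}

// Hypothetical command: created on the client, executed on the server.
interface Request extends Serializable {
    Response execute();
}

final class SimpleResponse implements Response {
    private static final long serialVersionUID = 1L;
    private final boolean success;
    private final String message;

    SimpleResponse(boolean success, String message) {
        this.success = success;
        this.message = message;
    }

    @Override public boolean isSuccess() { return success; }
    @Override public String getMessage() { return message; }
}

final class RebalanceCache implements Request {
    private static final long serialVersionUID = 1L;

    @Override public Response execute() {
        // A real implementation would start a rebalance here; this one only reports.
        return new SimpleResponse(true, "rebalance requested");
    }
}

final class CommandFunctionSketch {
    private CommandFunctionSketch() {}

    // Stand-in for the body of the server-side function.
    static Response handle(Object argument) {
        if (!(argument instanceof Request)) {
            return new SimpleResponse(false, "Unsupported argument: " + argument);
        }
        try {
            return ((Request) argument).execute();
        } catch (RuntimeException e) {
            // Report the failure to the caller instead of letting it escape.
            return new SimpleResponse(false, "Failed: " + e.getMessage());
        }
    }
}
```

The nice part of this design is that one generic function handles every command, so adding a new admin operation only means adding a new Request class on both client and server.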
One use case for invoking a function from another function is member
notification. This can be done with a CacheListener on a replicated region
too, but the basic idea is:
- invoke a function
- in the function, invoke another function on all the members notifying
them something is about to happen
- do the thing
- invoke another function on all the members notifying them something has
happened
You need to be careful when invoking one function from another. Depending
on what you're doing in the second function, you could get yourself into a
distributed deadlock situation.
I'm not sure this answers all the issues you were seeing, but hopefully it
helps.
Thanks,
Barry Oglesby
On Fri, Apr 15, 2016 at 1:36 PM, Matt Ross <[email protected]> wrote:
> Hi all,
>
> I'm involved in a sizable GemFire Project right now that is requiring me
> to execute Functions in a number of ways, and I wanted to poll the
> community for some best practices. So initially I would execute all
> functions like this.
>
> ResultCollector<?, ?> rc = FunctionService.onRegion(region)
> .withArgs(arguments).execute("my-awesome-function");
>
> And this worked reliably for quite some time, until I started mixing up
> functions that were executing on partition redundant data and replicated
> data. I initially started having problems with this method when I had this
> setup.
>
> 1 locator, 2 servers, and executing functions that would run queries on
> partition redundant and replicated regions. I started getting this problem
> where the function would execute on both servers, and the result collector
> would indeterminately choose a server to return results from. According to
> logging statements placed within my function I was able to confirm that the
> function was being executed twice, on both servers. We were able to fix this
> problem by switching from executing on region, to executing on Pool. The
> initial logic being since there was replicated data on both servers, the
> function would execute on both servers (hypothesis).
>
> Another issue was executing functions from within a function without a
> function context. Let's say I have one function that I execute with onPool,
> and therefore it is passed a FunctionContext. But when I'm actually in the
> function I need to execute other functions, some needing a
> RegionFunctionContext and some just needing a FunctionContext. Initially I
> was able to just use a Result Collector and FunctionService.onRegion to get a
> region context, and then pass my current function context to an instance of a
> new function
>
> MyAwesomeFunction myAwesomeFunction = new MyAwesomeFunction();
>
> myAwesomeFunction.execute(functionContext);
>
> This worked for a time but complexity started rising and more problems came
> up.
>
> So in short I wanted to throw out the blanket question of best practices on
> using (onRegion/onPool/onServer), calling other functions from within
> functions, what type of functions should be used on what type of regions, and
> general design patterns when executing functions. Thanks!
>
> *Matthew Ross | Data Engineer | Pivotal*
> *625 Avenue of the Americas NY, NY 10011*
> *516-941-7535 | [email protected]*
>
>