Just wow, thank you all for the detailed and well thought out responses. I'm going to try and compile these responses into a document for future reference and will share back with the community. Thanks again.
On Friday, April 15, 2016, Barry Oglesby <[email protected]> wrote:

> Executing queries in functions can be tricky.
>
> For executing queries in a function, do something like:
>
> - invoke the function with onRegion
> - have the function return true from optimizeForWrite so that it is
>   executed only on primary buckets
> - use the Query execute API with a RegionFunctionContext in the function.
>   Otherwise, you could easily end up executing the same query on more than
>   one member.
>
> If you set a filter, the function (and query) will execute on only the
> member containing the primary or primaries for that filter.
>
> Here is an example with trades.
>
> If you route all trades on a specific cusip to the same bucket using a
> PartitionResolver, then querying for all trades for a specific cusip can be
> done efficiently using a Function. The trades could be stored with a simple
> String key like cusip-id or a complex key containing both the cusip and id.
> Either way, the PartitionResolver will need to be able to return the cusip
> for the routing object.
>
> Invoke the function like:
>
> Execution execution =
>     FunctionService.onRegion(this.region).withFilter(Collections.singleton(cusip));
> ResultCollector collector = execution.execute("TradeQueryFunction");
> Object result = collector.getResult();
>
> In the TradeQueryFunction, execute the query like:
>
> RegionFunctionContext rfc = (RegionFunctionContext) context;
> String cusip = (String) rfc.getFilter().iterator().next();
> SelectResults results = (SelectResults) this.query.execute(rfc, new String[] {cusip});
>
> Where the query is:
>
> select * from /Trade where cusip = $1
>
> This will route the function request to the member whose primary bucket
> contains the cusip filter. Then it will execute the query on the
> RegionFunctionContext which will just be the data for that bucket.
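As a rough illustration of the routing idea quoted above, here is a minimal plain-Java sketch of how a routing object could be derived from a "cusip-id" String key. The class name CusipRouting, both method names, and the assumption that a cusip never contains a '-' are all hypothetical; in a real deployment this logic would presumably sit inside a PartitionResolver's getRoutingObject method, and the bucket calculation here is only a toy stand-in, not the product's actual hashing.

```java
import java.util.Objects;

/** Hypothetical helper: derives the routing object (the cusip) from a trade key or a filter value. */
final class CusipRouting {
    private CusipRouting() {}

    /**
     * Assumes keys look like "cusip-id" (for example "037833100-42") and that
     * the cusip itself never contains a '-'. A bare cusip is returned unchanged.
     */
    static String routingObject(String keyOrFilter) {
        Objects.requireNonNull(keyOrFilter, "keyOrFilter");
        int dash = keyOrFilter.indexOf('-');
        // No separator: the input is already the cusip (the withFilter case).
        return dash < 0 ? keyOrFilter : keyOrFilter.substring(0, dash);
    }

    /** Toy stand-in for bucket selection: equal routing objects map to the same bucket. */
    static int bucketFor(String keyOrFilter, int totalBuckets) {
        return Math.floorMod(routingObject(keyOrFilter).hashCode(), totalBuckets);
    }
}
```

With this sketch, CusipRouting.bucketFor("037833100-1", 113) and CusipRouting.bucketFor("037833100", 113) return the same bucket number, which is the property the filter-based routing relies on: a trade key and the bare cusip used as a filter must resolve to the same routing object.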
> Note: the PartitionResolver will also need to be able to return the cusip for
> that filter (which is just the input string itself).
>
> Here is some more general info on functions.
>
> If you're executing a function onRegion with a replicated region, then the
> function is executed on any member defining that region. Since the region
> is replicated, every server has the same data.
>
> If you're executing a function onRegion with a partitioned region, then
> where the function is invoked depends on the result of optimizeForWrite. If
> optimizeForWrite returns true, the function is invoked on all the members
> containing primary buckets for that region. If optimizeForWrite returns
> false, the function is invoked on as few members as it can that encompass
> all the buckets (so it mixes primary and secondary buckets). For example, if
> you have 2 members, and the primaries are split between them, then
> optimizeForWrite returning true means that the function will be invoked on
> both members. Returning false will cause the function to be invoked on only
> one member since each member has all the buckets. I almost always have
> optimizeForWrite return true.
>
> The onServer/onServers API is used for data-unaware calls (meaning no
> specific region involved). In the past, I've used it mainly for admin-type
> behavior like:
>
> - start/stop gateway senders
> - create regions
> - rebalance
> - assign buckets
>
> Now, gfsh does a lot of this behavior (maybe all of it), so I don't
> necessarily need functions to do it anymore.
>
> One of my favorite onServer use cases is the command pattern using a
> Request/Response API like:
>
> - define a Request (like RebalanceCache)
> - pass it as an argument to a CommandFunction from the client to a server
>   using onServer
> - execute it on the server
> - return a Response
>
> One use case for invoking a function from another function is member
> notification.
> This can be done with a CacheListener on a replicated region
> too, but the basic idea is:
>
> - invoke a function
> - in the function, invoke another function on all the members notifying
>   them something is about to happen
> - do the thing
> - invoke another function on all the members notifying them something has
>   happened
>
> You need to be careful when invoking one function from another. Depending
> on what you're doing in the second function, you could get yourself into a
> distributed deadlock situation.
>
> I'm not sure this answers all the issues you were seeing, but hopefully it
> helps.
>
> Thanks,
> Barry Oglesby
>
> On Fri, Apr 15, 2016 at 1:36 PM, Matt Ross <[email protected]> wrote:
>
>> Hi all,
>>
>> I'm involved in a sizable GemFire project right now that is requiring me
>> to execute Functions in a number of ways, and I wanted to poll the
>> community for some best practices. So initially I would execute all
>> functions like this.
>>
>> ResultCollector<?, ?> rc = FunctionService.onRegion(region)
>>     .withArgs(arguments).execute("my-awesome-function");
>>
>> And this worked reliably for quite some time, until I started mixing up
>> functions that were executing on partition redundant data and replicated
>> data. I initially started having problems with this method when I had this
>> setup.
>>
>> 1 locator, 2 servers, and executing functions that would run queries on
>> partition redundant and replicated regions. I started getting this problem
>> where the function would execute on both servers, and the result collector
>> would nondeterministically choose a server to return results from. According to
>> logging statements placed within my function I was able to confirm that the
>> function was being executed twice, on both servers. We were able to fix
>> this problem by switching from executing on region, to executing on Pool.
>> The initial logic being that, since there was replicated data on both servers,
>> the function would execute on both servers (hypothesis).
>>
>> Another issue was executing functions from within a function without a
>> function context. Let's say I have one function that I execute on Pool,
>> so it is passed a FunctionContext. But when I'm actually in
>> the function I need to execute other functions, some needing a
>> RegionFunctionContext and some just needing a FunctionContext. Initially I
>> was able to just use a ResultCollector and FunctionService.onRegion to get
>> a region context, and then pass my current function context to an instance
>> of a new function
>>
>> MyAwesomeFunction myAwesomeFunction = new MyAwesomeFunction();
>>
>> myAwesomeFunction.execute(functionContext);
>>
>> This worked for a time but complexity started rising and more problems came
>> up.
>>
>> So in short I wanted to throw out the blanket question of best practices on
>> using (onRegion/onPool/onServer), calling other functions from within
>> functions, what type of functions should be used on what type of regions,
>> and general design patterns when executing functions. Thanks!
>>
>> *Matthew Ross | Data Engineer | Pivotal*
>> *625 Avenue of the Americas NY, NY 10011*
>> *516-941-7535 | [email protected]*

--
*Matthew Ross | Data Engineer | Pivotal*
*625 Avenue of the Americas NY, NY 10011*
*516-941-7535 | [email protected]*
