Hello!

My team uses Geode as a makeshift analytics engine. We store a collection of 
massive raw data objects (200MB+ each) in Geode, but these objects are never 
directly returned to the client. Instead, we rely heavily on custom function 
execution to process these data sets inside Geode, and only return the analysis 
result set.

We have a new requirement to implement two tiers of data analytics precision. 
The high-precision analytics will require larger raw data sets and more CPU 
time. It is imperative that the high-precision analyses never degrade the 
performance of the low-precision analytics. As such, I'm looking for a 
solution that keeps these data sets isolated on different servers.

I built a POC that keeps each data set in its own region (both are 
PARTITIONED). These regions are configured to belong to separate Member Groups, 
then each server is configured to join one of the two groups. I'm able to stand 
up this cluster locally without issue, and gfsh indicates that everything looks 
correct: `describe member` shows each member hosting the expected regions.

My client code configures a ClientCache that points at the cluster's single 
locator. My function execution command generally looks like the following:

FunctionService
  .onRegion(highPrecisionRegion)
  .setArguments(inputObject)
  .withFilter(keySet)   // restrict execution to the members hosting these keys
  .execute(function);

When I only run the high-precision server, I'm able to execute the function 
against the high-precision region. When I only run the low-precision server, 
I'm able to execute the function against the low-precision region. However, 
when I run both servers and execute the functions one after the other, I 
invariably get an exception stating that *one* of the regions cannot be found. 
See the following Gist for a sample of my code and the exception.
https://gist.github.com/dLoewy/c9f695d67f77ec18a7e60a25c4e62b01

TLDR key points:
1) Using member groups, Region A is on Server 1 and Region B is on Server 2.
2) These regions must be PARTITIONED in Production.
3) I need to run a *data-dependent* function on one of these regions; the 
client code chooses which.
4) As-is, my client code always fails to find *one* of the regions.
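
To make the layout in the key points concrete, here is a toy model in plain 
Java. It does not use any Geode API: the class name, the group names, the 
server names, and the low-precision region name are all hypothetical. It only 
illustrates the topology, and the failure it shows (a region-targeted request 
landing on a server outside that region's group) is an assumption about what 
may be happening, not a confirmed diagnosis.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model only: none of these names are Geode APIs.
public class GroupRoutingModel {

    // Hypothetical layout mirroring key point 1: one server per member group.
    static final Map<String, String> SERVER_GROUP = Map.of(
        "server1", "highPrecisionGroup",
        "server2", "lowPrecisionGroup");

    // Each region is hosted only by members of its group.
    static final Map<String, String> REGION_GROUP = Map.of(
        "highPrecisionRegion", "highPrecisionGroup",
        "lowPrecisionRegion", "lowPrecisionGroup");

    // Servers that host the given region, in sorted order.
    static List<String> hostsFor(String region) {
        String group = REGION_GROUP.get(region);
        return SERVER_GROUP.entrySet().stream()
            .filter(e -> e.getValue().equals(group))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    // Simulates sending a region-targeted request to one specific server.
    static String execute(String server, String region) {
        if (!hostsFor(region).contains(server)) {
            return "region not found: " + region + " on " + server;
        }
        return "ok: " + region + " on " + server;
    }

    public static void main(String[] args) {
        // Try every server/region pairing to see which ones would succeed.
        for (String server : List.of("server1", "server2")) {
            for (String region : List.of("highPrecisionRegion", "lowPrecisionRegion")) {
                System.out.println(execute(server, region));
            }
        }
    }
}
```

In this model a request succeeds only when it reaches a server in the 
region's own group, which is the routing behavior I would need from the real 
client.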

Can someone please help me get on track? Is there an entirely different cluster 
architecture I should be considering? Happy to provide more detail upon request.

Thanks so much for your time!

David

FYI, the following docs pages mention function execution on Member Groups, but 
give very little detail. The first link describes running data-INdependent 
functions on member groups, but doesn't say how, and doesn't say anything about 
running data-DEpendent functions on member groups.
https://gemfire.docs.pivotal.io/99/geode/developing/function_exec/how_function_execution_works.html
https://gemfire.docs.pivotal.io/99/geode/developing/function_exec/function_execution.html
