I am looking to evaluate Ignite/GridGain as a way to turn an iterative batch computation job into a user-facing, hot request-response app. As a general question, has this type of thing been attempted before? More specifically, and this may still be too vague, which tunable parameters (cluster config, partitioning, data loading, eviction policy settings, etc.) do you expect to be most important to get right to enable this?
Here is a detailed toy example to illustrate. Imagine we currently have a two-phase recommender system. The first phase (typical ML recommender algorithms) pares down an entire repository (10e6 to 10e7 objects) of entities (movies, songs, readables, etc.) to a smaller list of likely candidates (10e4-10e5 objects) for each user or group of users. The second phase currently produces a list of 10 recommendations by iteratively assigning a score to every object in the candidate list and selecting the top score. To assign a score, some info about the user's behavior over the last week is gathered and fed as variables into the iterative algorithm. Both phases are Spark jobs, and the scoring and iteration algorithms are elegantly expressed with Spark's abstractions.

Now, however, we want the second phase to be an on-demand service that backs a user app. Instead of gleaning info about the user behind the scenes with no real time limit to complete tasks, the user can interact with the algorithm directly: "Here's my mood score, here's my last read book, I want a list of 5 books, go." We would need sub-second latency for the algorithm to score and select from the list of 100,000 or so items.

Thoughts so far: translate the Spark map transformations and map-reduce algorithm into fork-join tasks (use the cores!), and partition the cluster so that users are distributed evenly.

Thanks!
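
P.S. To make the fork-join idea concrete, here is a minimal sketch in plain Java (no Ignite/GridGain APIs) of roughly what I have in mind for a single node. It assumes each candidate's score can be computed independently from the request variables, so one parallel pass can keep the top k; if the real algorithm re-scores after each pick, this pass would have to be repeated. The class name `ForkJoinTopK`, the `THRESHOLD` value, and the toy "mood" scorer are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.IntToDoubleFunction;

public class ForkJoinTopK {

    /** A candidate's index in the candidate list, paired with its score. */
    static final class Scored {
        final int index;
        final double score;
        Scored(int index, double score) { this.index = index; this.score = score; }
    }

    /** Scores candidates in [lo, hi) and returns a min-heap holding the k best. */
    static final class TopKTask extends RecursiveTask<PriorityQueue<Scored>> {
        private static final int THRESHOLD = 4_096; // hypothetical leaf size, needs tuning
        private final int lo, hi, k;
        private final IntToDoubleFunction scorer;

        TopKTask(int lo, int hi, int k, IntToDoubleFunction scorer) {
            this.lo = lo; this.hi = hi; this.k = k; this.scorer = scorer;
        }

        @Override
        protected PriorityQueue<Scored> compute() {
            if (hi - lo <= THRESHOLD) {
                // "Map" step: score each candidate in this slice sequentially.
                PriorityQueue<Scored> heap = newHeap();
                for (int i = lo; i < hi; i++) {
                    offer(heap, new Scored(i, scorer.applyAsDouble(i)), k);
                }
                return heap;
            }
            int mid = (lo + hi) >>> 1;
            TopKTask left = new TopKTask(lo, mid, k, scorer);
            left.fork(); // run the left half asynchronously
            PriorityQueue<Scored> right = new TopKTask(mid, hi, k, scorer).compute();
            // "Reduce" step: merge the two partial top-k heaps.
            for (Scored s : left.join()) {
                offer(right, s, k);
            }
            return right;
        }
    }

    static PriorityQueue<Scored> newHeap() {
        // Min-heap on score, so the weakest of the current top k sits at the head.
        return new PriorityQueue<>(Comparator.comparingDouble((Scored s) -> s.score));
    }

    static void offer(PriorityQueue<Scored> heap, Scored s, int k) {
        if (k <= 0) return;
        if (heap.size() < k) {
            heap.add(s);
        } else if (s.score > heap.peek().score) {
            heap.poll();
            heap.add(s);
        }
    }

    /** Returns the k highest-scoring candidates out of n, best first. */
    static List<Scored> topK(int n, int k, IntToDoubleFunction scorer) {
        PriorityQueue<Scored> heap =
            ForkJoinPool.commonPool().invoke(new TopKTask(0, n, k, scorer));
        List<Scored> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
        return out;
    }

    public static void main(String[] args) {
        // Toy request: prefer items whose (made-up) feature is closest to the mood score.
        double mood = 0.7;
        double[] itemFeature = new double[100_000];
        for (int i = 0; i < itemFeature.length; i++) {
            itemFeature[i] = (i % 1000) / 1000.0;
        }
        List<Scored> top = topK(itemFeature.length, 5, i -> -Math.abs(itemFeature[i] - mood));
        for (Scored s : top) {
            System.out.println(s.index + " " + s.score);
        }
    }
}
```

The open question for me is how this maps onto the grid: whether each node would run something like this over its own partition of the candidate data, with a final merge of the per-node top-k lists.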
