Memory constraints on those machines prevent us from loading two models
at the same time.
On 11/8/11 10:10 PM, Ted Dunning wrote:
Yes. This definitely could be done with ZK.
See chapter 16 of Mahout in Action for an example of how to manage this for
a farm of classifiers which have very similar issues (although loading a
new model is much faster).
One trick that might work is to load the new model before dropping the old
one. You might be able to do a very fast handover that way.
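
The load-then-swap handover described above can be sketched with a plain atomic reference. This is only an illustration, not code from Mahout or from the book: ModelHolder, get, and reload are hypothetical names, and M stands in for whatever recommender or model object the service wraps. It assumes the machine has enough memory to hold the old and the new model at once while the reload runs.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical holder: serves the live model and swaps in a replacement only once it is fully loaded. */
final class ModelHolder<M> {
    private final AtomicReference<M> current;

    ModelHolder(M initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** Request threads call this; it never waits on a reload in progress. */
    M get() {
        return current.get();
    }

    /**
     * Builds the new model first (the slow part), then publishes it with one
     * atomic reference swap. Returns the previous model so the caller can release it.
     */
    M reload(Supplier<M> loader) {
        M fresh = loader.get();          // the old model keeps serving while this runs
        return current.getAndSet(fresh); // the handover itself is a single atomic write
    }
}
```

A request handler would call holder.get() on every request, and the refresh job would call holder.reload(...) with whatever code builds the new model. Both models are alive during loader.get(), so peak memory is roughly twice that of a single model.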
On Tue, Nov 8, 2011 at 12:18 PM, Mark <[email protected]> wrote:
I have a general design question regarding ZooKeeper.
Our use case: We currently have 3 RESTful recommendation servers that
simply wrap a Mahout GenericBooleanPrefItemBasedRecommender. We started
off using a JDBCDataModel, but for performance reasons we had to switch to a
FileDataModel so everything would be kept in memory. Our recommendation
service is now blazing fast, but the startup/reload time for each of these
services is measured in minutes. If we try to update all services at once,
all recommendation requests come to a halt. As a result, whenever we push a
new model we have to do it in stages, i.e. disable server1, update, wait,
re-enable, disable server2, and so on. We've "automated" this using cron by
simply updating one server, waiting 10 minutes, then updating the next, and
so on. We are trying to figure out whether this coordination would be
better managed via ZooKeeper.
I've read a bit about ZooKeeper, and it seems like it would be easy to set a
watch on a node that fires when the model has changed, thus triggering a
refresh of our recommender. Where I get lost is how I would coordinate this
so that only one server at a time goes down, with the next server updating
only once the previous one is back up. Can someone please explain how this
could be accomplished? Thanks
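
One common way to get "only one server at a time" is the standard ZooKeeper lock recipe. Each server creates an ephemeral sequential child under a shared lock znode (with the Java client, ZooKeeper.create with CreateMode.EPHEMERAL_SEQUENTIAL), lists the children, and proceeds only if its own child has the lowest sequence number. Otherwise it sets a watch on the child just ahead of it and waits for that node to disappear. After reloading, a server deletes its child, which wakes the next one. The sketch below shows only the ordering decision in plain Java, with no ZooKeeper connection. UpdateTurn, predecessor, and the "update-" node prefix are hypothetical names, and it assumes every server uses the same prefix so that the zero-padded sequence suffix sorts in creation order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: decides whose turn it is from the children of a lock znode. */
final class UpdateTurn {
    /**
     * Returns empty if this server holds the lock (its child sorts first),
     * otherwise the name of the child immediately ahead of it, which is the
     * one node this server should watch for deletion.
     */
    static Optional<String> predecessor(List<String> children, String mine) {
        List<String> sorted = new ArrayList<>(children);
        // Sequential znodes end in a zero-padded counter, so with a shared
        // prefix, lexicographic order matches creation order.
        Collections.sort(sorted);
        int index = sorted.indexOf(mine);
        if (index < 0) {
            throw new IllegalArgumentException("own node not among children: " + mine);
        }
        return index == 0 ? Optional.empty() : Optional.of(sorted.get(index - 1));
    }
}
```

In a real client, the children list would come from getChildren on the lock znode, and a non-empty result would be passed to exists with a Watcher. Watching only the predecessor, rather than the whole lock node, means a single server wakes up each time one finishes. Because the children are ephemeral, a server that crashes mid-reload releases its turn when its session expires.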