Hi, I worked with a small Hadoop cluster while developing a new scheduler. However, that cluster was used only for development and never in production, so I am wondering: what obstacles do you face in typical day-to-day cluster administration?
We have been in discussions with an ad company (which has its own development team) about building a platform with HBase, Hadoop, and maybe an in-memory database for caching. My part would be to set up a small cluster (~5 nodes) that satisfies their requirements and to monitor its behavior. Because of my current job, I will probably not be available at their site full-time, so I am wondering:
a) What tasks take most of your time in cluster administration?
b) How many hours should I plan for administering the cluster once the infrastructure and data are ready (this will probably be a long process)?
c) What tasks besides software updates, schema updates, monitoring, and additional provisioning should I plan for?
Thank you.