I'm seeing something unusual and wanted to check whether it has happened to any other HBase 0.90 users. I've read several emails on this list that recommend NOT using multi-threading in an MR job, so that's certainly under consideration. If anyone could share their experiences with multi-threading in an MR job, it would be very helpful. We are testing both implementations (with and without threading), but the threaded one is the one causing the problem.
We are processing log files, issuing PUTs in the Map and a follow-up incrementColumnValue() against a separate "counts" table in the Reducer. The reduce phase is multi-threaded. The Reducer initializes an HTablePool in setup(). In reduce() it starts threads (via a Java BlockingQueue/CompletionService) that call incrementColumnValue() and, depending on the value returned, create a PUT in the "counter" table. In cleanup() it calls completionService.take() (ignoring the result) and flushes the PUTs queued by the threads.

There are no issues for approximately the first 100GB of data inserted. After that, however, every subsequent job freezes during the Reduce phase. At some point the Reduce tasks (where the incrementColumnValue() calls take place) hang and are eventually killed with the reason: task client has not responded for 600 seconds. The counters in the reduce job grow briefly, then all the tasks' counters stop increasing and the tasks are eventually killed.

Oddly, the problem does not occur if compaction is completely disabled (not just major compaction; we also set hbase.hstore.compactionThreshold = 9999999 and hbase.hstore.blockingStoreFiles = 9999999). Could there be a bug in HTablePool involving large datasets and compaction? Again, this works as expected for approximately the first 100 jobs (1GB each) but consistently fails after that, and the problem does not occur with ALL compaction disabled.

It's a difficult problem to describe, but I'm hoping someone has feedback and/or similar experiences. I can provide code examples if anyone is curious.

Neil Yalowitz
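For readers trying to picture the threaded Reducer, here is a minimal, self-contained Java sketch of the setup()/reduce()/cleanup() lifecycle described above. It is not the actual job code. The names CounterStore, InMemoryCounterStore and ThreadedIncrementer are hypothetical: CounterStore stands in for calling incrementColumnValue() on a table taken from the HTablePool, and the in-memory version lets the sketch run without a cluster. The "queue a Put when the returned count is 1" rule is also a hypothetical stand-in for "depending on the value returned". One deliberate difference: instead of a single completionService.take(), cleanup() here drains every submitted task with a bounded poll(). A stuck increment then surfaces as a TimeoutException instead of a silent hang until the 600-second task timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for incrementColumnValue() on a table taken from an HTablePool. */
interface CounterStore {
    long increment(String row, long amount);
}

/** In-memory CounterStore so the sketch runs without an HBase cluster. */
class InMemoryCounterStore implements CounterStore {
    private final ConcurrentHashMap<String, AtomicLong> counts = new ConcurrentHashMap<>();

    @Override
    public long increment(String row, long amount) {
        return counts.computeIfAbsent(row, k -> new AtomicLong()).addAndGet(amount);
    }
}

/** Mirrors the reducer lifecycle: setup() -> reduce() calls -> cleanup(). */
class ThreadedIncrementer {
    private final CounterStore store;
    private final ExecutorService pool;
    private final CompletionService<Long> completion;
    private final Queue<String> pendingPuts = new ConcurrentLinkedQueue<>();
    private int submitted = 0; // only touched by the calling (reducer) thread

    // setup(): create the worker pool once per task
    ThreadedIncrementer(CounterStore store, int threads) {
        this.store = store;
        this.pool = Executors.newFixedThreadPool(threads);
        this.completion = new ExecutorCompletionService<>(pool);
    }

    // reduce(): hand each increment to a worker thread
    void reduce(String row, long amount) {
        completion.submit(() -> {
            long value = store.increment(row, amount);
            // Hypothetical rule: queue a Put the first time a row is counted
            if (value == 1) {
                pendingPuts.add(row);
            }
            return value;
        });
        submitted++;
    }

    // cleanup(): wait for ALL submitted tasks (at most `timeout` for each next
    // result), then return the queued Puts so the caller can flush them
    List<String> cleanup(long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        try {
            for (int i = 0; i < submitted; i++) {
                Future<Long> done = completion.poll(timeout, unit);
                if (done == null) {
                    throw new TimeoutException((submitted - i) + " increment task(s) still running");
                }
                done.get(); // already complete; rethrows any exception from the worker
            }
            return new ArrayList<>(pendingPuts);
        } finally {
            pool.shutdownNow(); // interrupts stuck workers instead of leaking threads
        }
    }
}
```

In a real Reducer, cleanup(Context) could also call context.progress() between polls so the framework does not kill a task that is still making progress, although that would only mask an increment call that is genuinely stuck.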