How to handle multiline records in an InputSplit?

2013-05-20 Thread Darpan R
Hi folks, I have a huge text file, terabytes in size, that contains multiline records. We are not told how many lines each record takes: one record may span 5 lines, another 6, another 4. The number of lines may vary for each record. Since we cannot use default TextI

Confusion related to NLineInputFormat

2013-05-06 Thread Darpan R
Hi guys, I'm confused about NLineInputFormat. I have written an MR job using NLineInputFormat, and the output I am getting is fine. But I am getting only 2 map tasks running. According to the documentation of NLineInputFormat: If you want your mappers to receive a fixed number of lines of input, then NLi
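
As an illustration of the behaviour the quoted documentation describes (each mapper receives a fixed number of input lines), here is a minimal sketch in plain Python. It is not Hadoop code: `split_into_n_lines` is a hypothetical helper that only mimics the grouping, and the sample lines are made up.

```python
from typing import Iterable, Iterator, List


def split_into_n_lines(lines: Iterable[str], n: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most n lines (hypothetical helper).

    Each yielded group stands in for the input of one map task.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    group: List[str] = []
    for line in lines:
        group.append(line)
        if len(group) == n:
            yield group
            group = []
    if group:
        # Final, shorter group when the line count is not a multiple of n.
        yield group


sample = [f"line {i}" for i in range(1, 8)]  # 7 made-up input lines
splits = list(split_into_n_lines(sample, 3))
print(len(splits))  # number of simulated map tasks
```

Under this simplified model the number of map tasks is the line count divided by N, rounded up, so a large N relative to the input yields few map tasks. In Hadoop's new API the value of N is set with `NLineInputFormat.setNumLinesPerSplit(job, n)`.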

Why should the number of reducers be less than the number of reduce slots?

2013-04-22 Thread Darpan R
Hi guys, I read somewhere that, for maximum performance, the number of reducers should be slightly less than the number of reduce slots in the cluster. This allows the reducers to finish in one wave and fully utilizes the cluster during the reduce phase. I don't quite under
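
The "one wave" claim quoted above can be checked with a little arithmetic. The sketch below assumes a deliberately simplified model: each reduce slot runs one reducer at a time and every reducer takes about the same time. The function name `reduce_waves` and the slot count of 10 are hypothetical, chosen only for illustration.

```python
import math


def reduce_waves(num_reducers: int, reduce_slots: int) -> int:
    """Number of waves needed to run all reducers (simplified model)."""
    if num_reducers < 0:
        raise ValueError("num_reducers must not be negative")
    if reduce_slots < 1:
        raise ValueError("reduce_slots must be at least 1")
    # Each wave can run at most reduce_slots reducers in parallel.
    return math.ceil(num_reducers / reduce_slots)


slots = 10  # hypothetical cluster-wide reduce slot count
for reducers in (9, 10, 11):
    print(reducers, "reducers ->", reduce_waves(reducers, slots), "wave(s)")
```

In this model, 11 reducers on 10 slots need a second wave in which a single reducer runs while the other nine slots sit idle, so the reduce phase takes roughly twice as long as it would with 10 or fewer reducers.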