Hi all, I want to build an index for a bunch of logs. Our logs are line-based text files. Each line contains a time, a uid, and some other data. I want to index by uid, so for each line the mapper will emit a <uid, offset> pair. The result is then sorted by uid. The reducer will combine all the offsets for a single uid, and the final result is put into a database. So sorting by uid is not necessary for me.
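To make the job concrete, here is a minimal sketch of the logic in plain Python (not the Hadoop API). It assumes each line is whitespace-separated with the uid as the second field, and that the offset is the byte offset of the start of the line; the names `map_lines` and `reduce_pairs` are just for illustration. The grouping uses a hash map, which is why sorted keys are not needed for the final result.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_lines(data: bytes) -> Iterator[Tuple[str, int]]:
    """Mapper step: emit a (uid, byte offset) pair for each log line."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        fields = raw.split()
        # Assumed layout: time, uid, other data (whitespace-separated).
        if len(fields) >= 2:
            yield fields[1].decode("utf-8"), offset
        # Advance by the full line length, including the line terminator.
        offset += len(raw)


def reduce_pairs(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Reducer step: collect all offsets per uid; no key ordering required."""
    index: Dict[str, List[int]] = defaultdict(list)
    for uid, offset in pairs:
        index[uid].append(offset)
    return dict(index)
```

Calling `reduce_pairs(map_lines(data))` on the raw bytes of one log file gives a uid-to-offsets mapping that could be written to the database in any order.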
Is there any way to turn off the sorting in Hadoop? Thanks.