Thanks Bejoy, that makes sense.
If I want to know which file each differing record came from, I need to
put an extra file id into the mapper's output value, then read it in the
reducer.
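Roughly what I have in mind, as a plain-Python simulation rather than real Hadoop code. The function names, the in-memory "shuffle", and the sample file names are all made up for illustration; a real job would get the file id from the input split instead of a dict key.

```python
from collections import defaultdict


def map_lines(file_id, lines):
    """Mapper: emit (line, file_id) so the reducer knows where each line came from."""
    for line in lines:
        yield line.rstrip("\n"), file_id


def reduce_line(line, file_ids, all_file_ids):
    """Reducer: report a line only if it is missing from at least one file."""
    present = set(file_ids)
    if present != all_file_ids:
        return line, sorted(present)
    return None


def compare_files(files):
    """Simulate the shuffle in memory: group mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for file_id, lines in files.items():
        for key, value in map_lines(file_id, lines):
            grouped[key].append(value)

    all_ids = set(files)
    results = []
    for line in sorted(grouped):
        out = reduce_line(line, grouped[line], all_ids)
        if out is not None:
            results.append(out)
    return results


# Hypothetical sample input: file id -> lines of that file.
files = {
    "a.txt": ["apple", "banana", "cherry"],
    "b.txt": ["banana", "cherry", "date"],
    "c.txt": ["cherry", "banana", "apple"],
}

for line, sources in compare_files(files):
    print(line, "only in", sources)
```

Each reported line comes with the list of files that contain it, so the reducer output already tells you the record's original file(s).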
Do you have any other ideas?
Thanks!
On Tue, Mar 20, 2012 at 6:09 PM, Bejoy Ks
Yes, if you have more than two files to compare, then the
file name/id is required from the mapper. If it is just two files and you
just want to know which lines are not unique, then just the line number would be
good, but if you are looking at more granular info like the exact changes in
Thanks a lot!
On Tue, Mar 20, 2012 at 7:13, Bejoy Ks bejoy.had...@gmail.com wrote:
Yes, if you have more than two files to compare, then the
file name/id is required from the mapper. If it is just two files and you
just want to know which lines are not unique, then just the line
You are right, Dieter. Linux diff treats a file as an ordered list of
lines, but I only want to treat it as a set. Sorry I didn't make that
clear at the beginning.
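To show the difference I mean, here is a small plain-Python sketch (the helper names and sample data are made up for illustration). Two files with the same lines in a different order, or with repeated lines, differ as lists but are identical as sets.

```python
def same_as_list(a, b):
    """Order-sensitive comparison, closer in spirit to how diff sees a file."""
    return list(a) == list(b)


def set_difference_report(a, b):
    """Set comparison: lines only in a, and lines only in b, ignoring order and repeats."""
    sa, sb = set(a), set(b)
    return sorted(sa - sb), sorted(sb - sa)


# Hypothetical file contents, one list entry per line.
file_a = ["x", "y", "z", "y"]
file_b = ["z", "y", "x"]
file_c = ["x", "y", "w"]

print(same_as_list(file_a, file_b))           # order and repeats matter here
print(set_difference_report(file_a, file_b))  # nothing differs as sets
print(set_difference_report(file_a, file_c))  # lines unique to each side
```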
On Tue, Mar 20, 2012 at 7:33 PM, Dieter Plaetinck die...@plaetinck.be
wrote:
The diff command on Linux (i.e. GNU diffutils) is way more involved than