RE: merging files

2009-03-18 Thread Nir Zohar
From: Enis Soztutar [mailto:enis@gmail.com]
Sent: Wednesday, March 18, 2009 3:07 PM
To: core-user@hadoop.apache.org
Subject: Re: merging files

Use MultipleInputs and use two different mappers for the inputs. map1 should be IdentityMapper, mapper 2 should output key, value pairs where value is

Re: merging files

2009-03-18 Thread Enis Soztutar
Use MultipleInputs with two different mappers, one per input. Mapper 1 should be IdentityMapper; mapper 2 should output key/value pairs where the value is a pseudo marker value (the same for all keys), which marks that the value is null/empty. In the reducer, just output the key/value pairs that do not
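A minimal sketch of the marker idea above, in plain Python rather than the Hadoop API. The message is cut off before it says exactly what the reducer emits, so the reducer here is an assumption based on the original question: keep file1 records whose key also appears in file2, and never emit the marker itself. All names (`MARKER`, `map_file1`, `map_file2`, `shuffle`, `reduce_group`, `semi_join`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical sentinel standing in for the "pseudo marker value".
MARKER = object()


def map_file1(records):
    """Identity mapper: pass (key, value) records from file1 through unchanged."""
    for key, value in records:
        yield key, value


def map_file2(keys):
    """Second mapper: tag every key from file2 with the same marker value."""
    for key in keys:
        yield key, MARKER


def shuffle(*mapped_streams):
    """Group all mapper output by key, as the framework's shuffle phase would."""
    groups = defaultdict(list)
    for stream in mapped_streams:
        for key, value in stream:
            groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Assumed reducer: emit file1 values only when the key carried a marker."""
    if any(v is MARKER for v in values):
        for v in values:
            if v is not MARKER:
                yield key, v


def semi_join(file1_records, file2_keys):
    """Run both mappers, group by key, and reduce each group in key order."""
    groups = shuffle(map_file1(file1_records), map_file2(file2_keys))
    output = []
    for key in sorted(groups):
        output.extend(reduce_group(key, groups[key]))
    return output
```

Because values are grouped per key, duplicate keys in file1 are handled naturally: every file1 value for a matching key is emitted, and keys that appear only in file2 produce no output.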

Re: merging files

2009-03-18 Thread Rasit OZDAS
I would use DistributedCache. Put file2 in the distributed cache; note that it then has to be read in every map task. If you find a better solution, please let me know, because I have a similar issue.

Rasit

2009/3/18 Nir Zohar
> Hi,
>
> I would like your help with the below question.
>
> I have 2 files: file
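The DistributedCache suggestion in this reply amounts to a map-side filter. Below is a rough plain-Python sketch of that logic, not Hadoop code. It assumes file2 is small enough to hold in memory in each map task, which the thread does not confirm. `load_keys` and `filter_map` are hypothetical names.

```python
def load_keys(lines):
    """Read the cached copy of file2 once per map task into an in-memory set."""
    return {line.strip() for line in lines if line.strip()}


def filter_map(records, allowed_keys):
    """Map-only filter: pass a file1 record through only if its key is in file2."""
    for key, value in records:
        if key in allowed_keys:
            yield key, value
```

With this approach no reduce phase is needed, at the cost of every map task reading file2, as the reply points out.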

merging files

2009-03-18 Thread Nir Zohar
Hi,

I would like your help with the below question.

I have 2 files: file1 (key, value) and file2 (only key), and I need to exclude all records from file1 whose keys are not in file2.

1. The output format is key-value, not only keys.
2. The key is not a primary key; hence it's not possible