Re: how to write this MapReduce

2009-10-26 Thread Anty
@Thomas Thanks. My input files are sorted. @Jingkei Thanks. I will have a look at the instructions for join. On Tue, Oct 27, 2009 at 12:39 AM, Thomas Thevis wrote: > Hey Anty, > > there exists a config key 'map.input.file' which should return the name of > the input file the mapper gets its input

Re: Map output compression leads to JVM crash (0.20.0)

2009-10-26 Thread Ed Mazur
Err, disregard that. $ cat /proc/version Linux version 2.6.9-89.0.9.plus.c4smp (mockbu...@builder10.centos.org) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-11)) #1 SMP Mon Aug 24 09:06:26 EDT 2009 Ed On Mon, Oct 26, 2009 at 3:23 PM, Ed Mazur wrote: > $ cat /etc/*-release > CentOS release 4.5 (Fi

Re: Map output compression leads to JVM crash (0.20.0)

2009-10-26 Thread Ed Mazur
$ cat /etc/*-release CentOS release 4.5 (Final) Rocks release 4.3 (Mars Hill) Ed On Mon, Oct 26, 2009 at 11:21 AM, Todd Lipcon wrote: > What Linux distro are you running? It seems vaguely possible that you're > using some incompatible library versions compared to what everyone else has > tested

Re: Question regarding wordCount example

2009-10-26 Thread felix gao
Thank you Jeff. Will try that later today. On Sun, Oct 25, 2009 at 10:56 PM, Jeff Zhang wrote: > Hi gao, > > You did not provide the type of key and value explicitly in your code, so > you have to write your map method as > > public void map(Object key, Object value, OutputCollector output, >

Re: DefaultCodec vs. LZO

2009-10-26 Thread Todd Lipcon
On Fri, Oct 23, 2009 at 2:32 PM, Hong Tang wrote: > You mean this: > http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=17? > > That and I think a couple others Kevin has found - I'll let him jump in with details. > Based on the description, the bug affects LzopCompressor, and sho

Re: how to write this MapReduce

2009-10-26 Thread Thomas Thevis
Hey Anty, there exists a config key 'map.input.file' which should return the name of the input file the mapper gets its input values from. In the pre-hadoop-0.20.0 era, one would have to implement the configure() method to have access to the configuration. Since then, it could be possible to u

Re: how to write this MapReduce

2009-10-26 Thread Anty
Thanks very much for your reply, Thomas. I searched in the Mapper.map() method, but I still can't find a way to retrieve the source file name of the input data. Can you describe it in more detail? As for your suggestion, I have some doubts: the names of the three files are random, so we couldn't so

Re: Map output compression leads to JVM crash (0.20.0)

2009-10-26 Thread Todd Lipcon
What Linux distro are you running? It seems vaguely possible that you're using some incompatible library versions compared to what everyone else has tested libhadoop with. -Todd On Sun, Oct 25, 2009 at 8:36 PM, Ed Mazur wrote: > I'm having problems on 0.20.0 when map output compression is enabl

Re: how to write this MapReduce

2009-10-26 Thread Jingkei Ly
Assuming your input files are sorted, you should be able to use the map-side join framework to do the job you describe (effectively an outer join) while avoiding going through the Reduce phase. There are instructions on how to use it here: http://hadoop.apache.org/common/docs/current/api/org/apach
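
To make the idea behind a map-side outer join concrete, here is a minimal sketch in plain Java. It does not use Hadoop's join framework or any of its classes; `SortedOuterJoin` and `outerJoin` are hypothetical names, and the in-memory lists stand in for the sorted input files. It assumes every input is already sorted by key and that a key appears at most once per input. Because the inputs are sorted, one forward pass over all of them is enough and no shuffle or reduce step is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedOuterJoin {

    // Merge inputs that are each already sorted by key. Each record is a
    // two-element array {key, value}. Emits one row per distinct key:
    // row[0] is the key, row[i + 1] is the value from input i, or null
    // when input i has no record for that key (outer-join semantics).
    // Assumption: a key occurs at most once within a single input.
    public static List<String[]> outerJoin(List<List<String[]>> inputs) {
        int n = inputs.size();
        int[] pos = new int[n]; // one read cursor per input
        List<String[]> rows = new ArrayList<>();
        while (true) {
            // Find the smallest key currently under any cursor.
            String minKey = null;
            for (int i = 0; i < n; i++) {
                if (pos[i] < inputs.get(i).size()) {
                    String k = inputs.get(i).get(pos[i])[0];
                    if (minKey == null || k.compareTo(minKey) < 0) {
                        minKey = k;
                    }
                }
            }
            if (minKey == null) {
                break; // every input is exhausted
            }
            // Collect the value from each input positioned at that key.
            String[] row = new String[n + 1];
            row[0] = minKey;
            for (int i = 0; i < n; i++) {
                if (pos[i] < inputs.get(i).size()
                        && inputs.get(i).get(pos[i])[0].equals(minKey)) {
                    row[i + 1] = inputs.get(i).get(pos[i])[1];
                    pos[i]++;
                }
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample data shaped like the files in the question.
        List<List<String[]>> inputs = Arrays.asList(
            Arrays.<String[]>asList(new String[]{"key1", "value1A"}, new String[]{"key2", "value2A"}),
            Arrays.<String[]>asList(new String[]{"key1", "value1B"}, new String[]{"key3", "value3B"}),
            Arrays.<String[]>asList(new String[]{"key2", "value2C"}, new String[]{"key3", "value3C"}));
        for (String[] row : outerJoin(inputs)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In a real job the framework would do the merging across input splits; the sketch only shows why sorted inputs make the reduce phase avoidable.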

Re: how to write this MapReduce

2009-10-26 Thread Thomas Thevis
Hi Anty, as far as I know, it is possible to retrieve the source file name of the input data within the Mapper's map() method. If so, you could use secondary sort on values (have a look at the Hadoop wiki pages) to propagate the values sorted first by key and second by filename to the Reducer
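
The ordering that a secondary sort gives the reducer can be sketched in plain Java, without any Hadoop classes. All names here (`SecondarySortSketch`, `Tagged`, `groupSorted`) are hypothetical, and the sketch assumes each record has already been tagged with the name of the file it came from. Records are ordered by key first and file name second, then grouped on the key alone, so each group sees its values in file-name order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    // A value tagged with its natural key and its (assumed known) source file.
    public static final class Tagged {
        final String key;
        final String file;
        final String value;

        public Tagged(String key, String file, String value) {
            this.key = key;
            this.file = file;
            this.value = value;
        }
    }

    // Composite sort order: natural key first, source file name second.
    static final Comparator<Tagged> SORT_ORDER =
        Comparator.comparing((Tagged t) -> t.key).thenComparing(t -> t.file);

    // Sort by the composite order, then group on the natural key only.
    // The insertion-ordered map keeps keys in sorted order, and each
    // value list is in file-name order.
    public static Map<String, List<String>> groupSorted(List<Tagged> records) {
        List<Tagged> sorted = new ArrayList<>(records);
        sorted.sort(SORT_ORDER);
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Tagged t : sorted) {
            groups.computeIfAbsent(t.key, k -> new ArrayList<>()).add(t.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical records, deliberately out of order.
        List<Tagged> records = new ArrayList<>();
        records.add(new Tagged("key2", "file3", "value2C"));
        records.add(new Tagged("key1", "file2", "value1B"));
        records.add(new Tagged("key1", "file1", "value1A"));
        System.out.println(groupSorted(records));
    }
}
```

In an actual Hadoop job the same effect would come from a composite key, a partitioner and a grouping comparator that look only at the natural key; the sketch shows just the resulting order.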

Re: how to write this MapReduce

2009-10-26 Thread Anty
Does MultipleInputs fit this situation? Does anyone have any ideas about this? On Mon, Oct 26, 2009 at 7:44 PM, Anty wrote: > Hi all, > I have the following use case: I have three files, each containing key-value pairs, > file1: file2: file3: > key1-value1A

how to write this MapReduce

2009-10-26 Thread Anty
Hi all, I have the following use case: I have three files, each containing key-value pairs, file1: file2: file3: key1-value1A key1-value1B key1-value1C key2-value2A key2-value2B key2-value2C key3-value3A key3-valu