Re: Does FileSplit respect the record boundary?

2012-02-10 Thread Harsh J
Hi, Please read the map section of http://wiki.apache.org/hadoop/HadoopMapReduce to understand how Hadoop ends up respecting record boundaries despite block-chops not taking that into consideration. I hope it helps clear things up for you. On Fri, Feb 10, 2012 at 10:26 PM, GUOJUN Zhu
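
A rough, self-contained sketch of the idea may help here. It is plain Java with no Hadoop dependency, and the class and method names (SplitBoundarySketch, readSplit, nextLineStart) are made up for illustration; it is not the LineRecordReader source. It assumes newline-terminated records and byte-offset splits, and applies two rules: a reader whose split does not start at offset 0 backs up one byte and discards everything through the first newline, and every reader keeps reading any line that begins before its split end, even if that line runs past the end.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: simulates line-oriented reading of one byte-range split.
final class SplitBoundarySketch {

    // Returns the complete lines "owned" by the split [start, start + length).
    static List<String> readSplit(byte[] data, int start, int length) {
        int end = Math.min(start + length, data.length);
        int pos = start;
        if (start != 0) {
            // Back up one byte and discard through the first newline. If the split
            // begins exactly at a line start, only the previous '\n' is discarded.
            pos = nextLineStart(data, start - 1);
        }
        List<String> records = new ArrayList<>();
        // A line belongs to this split if it begins before the split end,
        // even when its bytes continue into the next split.
        while (pos < end) {
            int next = nextLineStart(data, pos);
            boolean hasNewline = next > pos && data[next - 1] == '\n';
            int lineEnd = hasNewline ? next - 1 : next;
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return records;
    }

    // Index of the first byte after the next '\n' at or after 'from' (or data.length).
    static int nextLineStart(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.UTF_8);
        // Three 8-byte splits that cut through the middle of lines.
        for (int start = 0; start < data.length; start += 8) {
            System.out.println("split@" + start + " -> " + readSplit(data, start, 8));
        }
    }
}
```

With 8-byte splits over this sample input, each line is returned by exactly one split, although two of the three lines straddle a split boundary.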

Where Is DataJoinMapperBase?

2012-02-10 Thread Bing Li
Hi, all, I am starting to learn advanced Map/Reduce. However, I cannot find the class DataJoinMapperBase in my downloaded Hadoop 1.0.0 and 0.20.2. So I searched the Web and found the following link. http://www.java2s.com/Code/Jar/h/Downloadhadoop0201datajoinjar.htm From the link I got the

Re: Does FileSplit respect the record boundary?

2012-02-10 Thread GUOJUN Zhu
Thank you for the reply. That page helps a lot. I still have a more specific question. In LineRecordReader's constructor (Hadoop 1.0.0), public LineRecordReader(Configuration job, FileSplit split), does the call final Path file = split.getPath() return the logical file in HDFS or just the

Re: Where Is DataJoinMapperBase?

2012-02-10 Thread Harsh J
Bing, You should find the data_join contrib module under ./src/contrib/data_join, and you should be able to build from that if its jar does not already exist under ./contrib/data_join. On Sat, Feb 11, 2012 at 1:09 AM, Bing Li lbl...@gmail.com wrote: Hi, all, I am starting to learn advanced

Re: job taking input file, which is being written by its preceding job's map phase

2012-02-10 Thread Vamshi Krishna
Hi Harsh, I am trying to find all the row keys present in two tables. If userid is the row key for two different tables, I want to find all the row keys present in both tables. For that I need to read from two tables in one MapReduce job, i.e. I want to take multiple tables as input to
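
One common way to frame this kind of question is a reduce-side intersection: each mapper tags a row key with the table it came from, and the reducer keeps only the keys that carry both tags. The sketch below simulates that in plain Java with in-memory lists standing in for the two table scans. The class and method names (RowKeyIntersectionSketch, tag, commonRowKeys) and the table names are hypothetical, and no Hadoop or table-store API is used, so treat it as an outline of the data flow rather than a working job.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only: in-memory stand-in for a reduce-side key intersection.
final class RowKeyIntersectionSketch {

    // "Map" step: record that rowKey was seen in tableName (grouping by key).
    static void tag(Map<String, Set<String>> grouped, String tableName, List<String> rowKeys) {
        for (String rowKey : rowKeys) {
            grouped.computeIfAbsent(rowKey, k -> new TreeSet<>()).add(tableName);
        }
    }

    // "Reduce" step: keep the row keys whose tag set contains both table names.
    static List<String> commonRowKeys(String tableA, List<String> keysA,
                                      String tableB, List<String> keysB) {
        Map<String, Set<String>> grouped = new TreeMap<>();
        tag(grouped, tableA, keysA);
        tag(grouped, tableB, keysB);
        List<String> common = new java.util.ArrayList<>();
        for (Map.Entry<String, Set<String>> e : grouped.entrySet()) {
            if (e.getValue().contains(tableA) && e.getValue().contains(tableB)) {
                common.add(e.getKey());
            }
        }
        return common; // sorted, because the grouping map is a TreeMap
    }

    public static void main(String[] args) {
        List<String> users = List.of("u1", "u2", "u3");
        List<String> orders = List.of("u2", "u3", "u4");
        System.out.println(commonRowKeys("users", users, "orders", orders));
    }
}
```

In a real job the grouping map would be replaced by the shuffle, with the row key as the map output key and the table tag as the value.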

RE: Does FileSplit respect the record boundary?

2012-02-10 Thread Vinayakumar B
Hi Zhu, The LineRecordReader will get the path in HDFS itself, not on the LocalFileSystem. But it is the NameNode that gives the list of DataNodes for a particular block, sorted by distance from the client, i.e. the machine where the task is running. For the line which ends