I would say yes, make this a Jira. As Jay proposed, the actual change can go in one of two directions: put synchronization in all implementations, or take it out of all of them.
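To make the trade-off concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of what the synchronization buys when one writer is shared between threads, as it would be under MultithreadedMapper. SimpleLineWriter is a hypothetical stand-in for LineRecordWriter, not the Hadoop class. It assumes, as a line-oriented writer does, that each record goes out as several separate stream writes (key, separator, value, newline).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SyncLineWriterDemo {

    // Hypothetical stand-in for LineRecordWriter (not the Hadoop class):
    // one record = key, separator, value, newline, written as four separate calls.
    static class SimpleLineWriter {
        private static final byte[] SEPARATOR = "\t".getBytes(StandardCharsets.UTF_8);
        private static final byte[] NEWLINE = "\n".getBytes(StandardCharsets.UTF_8);
        private final DataOutputStream out;

        SimpleLineWriter(DataOutputStream out) {
            this.out = out;
        }

        // "synchronized" makes the four writes of one record atomic with respect
        // to other threads calling write() on the same instance.
        public synchronized void write(String key, String value) throws IOException {
            out.write(key.getBytes(StandardCharsets.UTF_8));
            out.write(SEPARATOR);
            out.write(value.getBytes(StandardCharsets.UTF_8));
            out.write(NEWLINE);
        }
    }

    // Shares one writer among `threads` threads, each writing `recordsPerThread` records.
    public static String runConcurrently(int threads, int recordsPerThread) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        SimpleLineWriter writer = new SimpleLineWriter(new DataOutputStream(buffer));
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    try {
                        writer.write("thread-" + id, "record-" + i);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // True if every line is a complete "thread-N<TAB>record-M" record, i.e. nothing was interleaved.
    public static boolean allLinesIntact(String output) {
        for (String line : output.split("\n")) {
            if (!line.matches("thread-\\d+\trecord-\\d+")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String output = runConcurrently(8, 1000);
        long lines = output.chars().filter(c -> c == '\n').count();
        System.out.println("lines written: " + lines);                       // prints "lines written: 8000"
        System.out.println("all records intact: " + allLinesIntact(output)); // prints "all records intact: true"
    }
}
```

Because write() is synchronized, the four stream writes of one record cannot be interleaved with another thread's; without the keyword, whether lines come out intact would depend on thread scheduling. Taking synchronization out of all implementations would move that responsibility to whichever caller shares a writer across threads.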
I think the first thing to determine is why the synchronization was put into the LineRecordWriter in the first place.
https://github.com/apache/hadoop-common/blame/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/TextOutputFormat.java
The oldest commit I have been able to find is from 2009-05-18, for HADOOP-4687, and it only moves code around (i.e. this code is even older than that).

Niels

On Thu, Aug 8, 2013 at 2:21 PM, Sathwik B P <sath...@apache.org> wrote:

> Hi Harsh,
>
> Do you want me to raise a Jira on this?
>
> regards,
> sathwik
>
>
> On Thu, Aug 8, 2013 at 5:23 PM, Jay Vyas <jayunit...@gmail.com> wrote:
>
>> Then is this a bug? Synchronization in the absence of any race condition is
>> normally considered "bad".
>>
>> In any case I'd like to know why this writer is synchronized whereas the
>> other ones are not. That is, I think, the point at issue: either the other
>> writers should be synchronized or else this one shouldn't be. Consistency
>> across the writer implementations is probably desirable so that changes to
>> output formats or record writers don't lead to bugs in multithreaded
>> environments.
>>
>> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> While we don't fork by default, we do provide a MultithreadedMapper
>> implementation that would require such synchronization. But if you are
>> asking whether it is necessary, then perhaps the answer is no.
>> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <azury...@gmail.com> wrote:
>>
>>> It's not about threads forked by Hadoop; we may create a line record
>>> writer ourselves and then call it concurrently.
>>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sathwik...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> Thanks for your reply.
>>>> May I know where Hadoop forks multiple threads to use a single
>>>> RecordWriter?
>>>>
>>>> regards,
>>>> sathwik
>>>>
>>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <azury...@gmail.com> wrote:
>>>>
>>>>> Because we may use multiple threads to write a single file.
>>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sath...@apache.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>>> RecordWriter implementation that defines write as synchronized.
>>>>>> Is there any specific reason for this?
>>>>>>
>>>>>> regards,
>>>>>> sathwik
>>>>>>
>>>>>
>>>>
>
--
Best regards,

Niels Basjes