I suppose I should have been clearer. There's no problem out of the box
if people stick to the libraries we offer :)

Yes, the LineRecordWriter (LRW) was marked synchronized over 8 years
ago [1] to support multi-threaded maps, but the framework has changed
a lot since then. The MultithreadedMapper/etc. API we offer now
automatically shields developers from having to think about output
thread safety [2].

The only case where I can imagine a problem is a user writing their
own unsafe multi-threaded task. I suppose we could document that in
the Mapper/MapRunner and Reducer APIs.
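
To illustrate that last case, here is a minimal sketch in plain Java of
what a hand-rolled multi-threaded task would need to do. It does not use
any Hadoop classes: LineWriter, SharedWriterSketch and runDemo are
hypothetical names made up for this example, and LineWriter is only a
stand-in for a record writer whose write() is not synchronized. The
point is simply that the caller locks around each write, which is the
same idea as synchronizing over the collector [2].

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SharedWriterSketch {

    /** Hypothetical stand-in for a line-oriented record writer; NOT a Hadoop class. */
    static final class LineWriter {
        private final OutputStream out;

        LineWriter(OutputStream out) {
            this.out = out;
        }

        // A record is emitted as four separate stream writes, so two threads
        // calling this without a lock could interleave their bytes.
        void write(String key, String value) throws IOException {
            out.write(key.getBytes(StandardCharsets.UTF_8));
            out.write('\t');
            out.write(value.getBytes(StandardCharsets.UTF_8));
            out.write('\n');
        }
    }

    /** Starts several threads that share one writer and returns everything written. */
    static String runDemo(int threads, int recordsPerThread) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        LineWriter writer = new LineWriter(sink);
        Thread[] workers = new Thread[threads];

        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    // The caller takes the lock, so each record is written atomically.
                    synchronized (writer) {
                        try {
                            writer.write("k" + id, "v" + i);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    }
                }
            });
            workers[t].start();
        }

        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String out = runDemo(4, 1000);
        System.out.println(out.split("\n").length + " records written");
    }
}
```

Dropping the synchronized block in runDemo is the "unsafe" variant: the
record count may still look right, but individual lines can end up with
another thread's key or value spliced into them.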

[1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
This commit added the synchronized keyword to the write call.
[2] - MultithreadedMapper/etc. synchronize over the collector -
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup

On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <azury...@gmail.com> wrote:
> The sequence writer is also synchronized; I don't think this is bad.
>
> If you call the HDFS API to write concurrently, then it's necessary.
>
> On Aug 8, 2013 7:53 PM, "Jay Vyas" <jayunit...@gmail.com> wrote:
>>
>> Then is this a bug?  Synchronization in the absence of any race
>> condition is normally considered "bad".
>>
>> In any case I'd like to know why this writer is synchronized whereas the
>> others are not. That is, I think, the point at issue: either the other
>> writers should be synchronized or else this one shouldn't be. Consistency
>> across the write implementations is probably desirable so that changes to
>> output formats or record writers don't lead to bugs in multithreaded
>> environments.
>>
>> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> While we don't fork by default, we do provide a MultithreadedMapper
>> implementation that would require such synchronization. But if you are
>> asking whether it is necessary, then perhaps the answer is no.
>>
>> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <azury...@gmail.com> wrote:
>>>
>>> It's not about Hadoop-forked threads; we may create a line record
>>> writer, then call this writer concurrently.
>>>
>>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sathwik...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> Thanks for your reply.
>>>> May I know where Hadoop forks multiple threads to use a single
>>>> RecordWriter?
>>>>
>>>> regards,
>>>> sathwik
>>>>
>>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <azury...@gmail.com> wrote:
>>>>>
>>>>> Because we may use multiple threads to write a single file.
>>>>>
>>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sath...@apache.org> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>>> RecordWriter implementation that defines write as synchronized.
>>>>>> Is there any specific reason for this?
>>>>>>
>>>>>> regards,
>>>>>> sathwik
>>>>
>>>>
>



-- 
Harsh J
