> Hi Folks,
>
> I have been writing a map-reduce application whose input file contains records, where every field in a record is separated by a delimiter.
>
> In addition to this, the user provides a list of columns that he wants to look up in a master properties file (stored in HDFS). If a column value (let's call it a key) is present in the master properties file, the code gets the corresponding value and replaces the key with that value in the record. If the key is not present in the master properties file, the code creates a new value for the key, writes it to the properties file, and also updates the record.
>
> I have written and tested this application, and everything worked fine until now.
>
> *e.g.:* *I/P Record:* This | is | the | test | record
>
> *Columns:* 2,4 (that means the code will look up only the fields *"is"* and *"test"* in the master properties file.)
>
> Here I have a question.
>
> *Q 1:* When my input file is huge and is split across multiple mappers, I get the exception mentioned below, where all the other mapper tasks fail. *Also, my master properties file is empty when I initially start the job.* In my code I check whether this file (the master properties file) exists and, if it doesn't, create a new empty one before submitting the job itself (a rough sketch of that check follows, just before the stack trace).
>
> e.g., if I have 4 splits of data, then 3 map tasks fail. But after this, all the failed map tasks are restarted and the job eventually becomes successful.
>
> So, *here is the question: is it possible to make sure that when one mapper task is writing to a file, the others wait until the first one is finished?* I have read that mapper tasks don't interact with each other.
>
> Also, what will happen in the scenario where I start multiple parallel map-reduce jobs that all work on the same properties file? *Is there any way to have synchronization between two independent map-reduce jobs?*
>
> I have also read that ZooKeeper can be used in such scenarios. Is that correct? (After the stack trace I have included a sketch of the kind of locking I have in mind, in case that clarifies the question.)
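> Here is a rough sketch of that pre-submit check (class and method names are simplified placeholders, not my exact code):
>
>     import java.io.IOException;
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>
>     public class MasterFileBootstrap {
>
>         // Create the empty master properties file up front, before the job
>         // is submitted, so that every mapper finds it already in place.
>         public static void ensureMasterFile(Configuration conf, String location)
>                 throws IOException {
>             FileSystem fs = FileSystem.get(conf);
>             Path master = new Path(location);
>             if (!fs.exists(master)) {
>                 // createNewFile() returns false instead of failing if the
>                 // file appears between the exists() check and the create.
>                 fs.createNewFile(master);
>             }
>         }
>     }
>
> I call this from the driver, e.g. ensureMasterFile(conf, "/user/cloudera/lob/master/bank.properties"), before calling job.waitForCompletion(true).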
> Error: com.techidiocy.hadoop.filesystem.api.exceptions.HDFSFileSystemException:
> IOException - failed while appending data to the file -> Failed to create file
> [/user/cloudera/lob/master/bank.properties] for
> [DFSClient_attempt_1407778869492_0032_m_000002_0_1618418105_1] on client
> [10.X.X.17], because this file is already being created by
> [DFSClient_attempt_1407778869492_0032_m_000005_0_-949968337_1] on [10.X.X.17]
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2548)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2377)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2612)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2575)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:522)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
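> In case it clarifies what I am asking for: if ZooKeeper is the right tool here, I imagine each task wrapping its append in a cluster-wide lock, something like the sketch below using Apache Curator's InterProcessMutex. This is untested, and the quorum string and lock path are made-up placeholders:
>
>     import org.apache.curator.framework.CuratorFramework;
>     import org.apache.curator.framework.CuratorFrameworkFactory;
>     import org.apache.curator.framework.recipes.locks.InterProcessMutex;
>     import org.apache.curator.retry.ExponentialBackoffRetry;
>
>     public class PropertiesFileLock {
>
>         // Run the given piece of work (open, append, close the HDFS file)
>         // while holding a ZooKeeper-backed lock, so that only one task at a
>         // time, across all mappers and all jobs, touches the file.
>         public static void withLock(String zkQuorum, Runnable appendWork)
>                 throws Exception {
>             CuratorFramework client = CuratorFrameworkFactory.newClient(
>                     zkQuorum, new ExponentialBackoffRetry(1000, 3));
>             client.start();
>             try {
>                 // A single well-known znode acts as the lock for the shared
>                 // properties file.
>                 InterProcessMutex lock = new InterProcessMutex(client, "/locks/bank-properties");
>                 lock.acquire();
>                 try {
>                     appendWork.run();
>                 } finally {
>                     lock.release();
>                 }
>             } finally {
>                 client.close();
>             }
>         }
>     }
>
> Would this kind of approach work, or is there a more standard way to serialize appends to a single HDFS file across tasks and jobs?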