Can you share your entire log file? Why did your operator get killed the first time? The error mentioned above seems to occur at recovery time.
-Priyanka

On Tue, Jun 6, 2017 at 12:44 AM, Guilherme Hott <[email protected]> wrote:
> Hi,
>
> I have this error and I don't know why it's happening. The operator that is failing processes a tuple, does a dedup check, saves it to HBase if it's new or updated, and emits it to the stream. Because of this failure, only a few tuples are processed.
>
> 2017-06-04 06:43:45,265 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #6] INFO impl.ContainerManagementProtocolProxy newProxy - Opening proxy : localhost:8052
>
> 2017-06-04 06:43:47,573 [IPC Server handler 0 on 38848] INFO stram.StreamingContainerParent log - child msg: [container_1496564390452_0002_01_000010] Entering heartbeat loop.. context: PTContainer[id=1(container_1496564390452_0002_01_000010),state=ALLOCATED,operators=[PTOperator[id=6,name=ConsoleNew], PTOperator[id=7,name=ConsoleBad], PTOperator[id=1,name=cloutApiBanksInput], PTOperator[id=5,name=banksDeduplicator], PTOperator[id=10,name=ConsoleNewJDBC], PTOperator[id=11,name=ConsoleErrorJDBC], PTOperator[id=9,name=cloutApiBanksOutput], PTOperator[id=8,name=banksOpLoadObject], PTOperator[id=3,name=Deduper], PTOperator[id=4,name=cloutApiBanksInput.outputPort#unifier], PTOperator[id=2,name=cloutApiBanksInput]]]
>
> 2017-06-04 06:43:48,587 [IPC Server handler 1 on 38848] INFO stram.StreamingContainerManager processHeartbeat - Container container_1496564390452_0002_01_000010 buffer server: datatorrent-sandbox:35304
>
> 2017-06-04 06:43:56,262 [IPC Server handler 16 on 38848] INFO stram.StreamingContainerParent log - child msg: Stopped running due to an exception.
> java.lang.NullPointerException
>     at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
>     at org.apache.apex.malhar.lib.wal.FSWindowDataManager.retrieve(FSWindowDataManager.java:487)
>     at org.apache.apex.malhar.lib.wal.FSWindowDataManager.retrieve(FSWindowDataManager.java:448)
>     at com.datatorrent.lib.db.jdbc.AbstractJdbcPollInputOperator.replay(AbstractJdbcPollInputOperator.java:316)
>     at com.datatorrent.lib.db.jdbc.AbstractJdbcPollInputOperator.beginWindow(AbstractJdbcPollInputOperator.java:203)
>     at com.datatorrent.stram.engine.InputNode.run(InputNode.java:122)
>     at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1441)
> context: PTContainer[id=1(container_1496564390452_0002_01_000010),state=ACTIVE,operators=[PTOperator[id=6,name=ConsoleNew], PTOperator[id=7,name=ConsoleBad], PTOperator[id=1,name=cloutApiBanksInput], PTOperator[id=5,name=banksDeduplicator], PTOperator[id=10,name=ConsoleNewJDBC], PTOperator[id=11,name=ConsoleErrorJDBC], PTOperator[id=9,name=cloutApiBanksOutput], PTOperator[id=8,name=banksOpLoadObject], PTOperator[id=3,name=Deduper], PTOperator[id=4,name=cloutApiBanksInput.outputPort#unifier], PTOperator[id=2,name=cloutApiBanksInput]]]
>
> 2017-06-04 06:43:56,683 [IPC Server handler 5 on 38848] WARN stram.StreamingContainerManager processOperatorFailure - Operator failure: PTOperator[id=2,name=cloutApiBanksInput] count: 6
>
> 2017-06-04 06:43:56,683 [IPC Server handler 5 on 38848] ERROR stram.StreamingContainerManager processOperatorFailure - Initiating container restart after operator failure PTOperator[id=2,name=cloutApiBanksInput]
>
> 2017-06-04 06:43:57,292 [main] INFO stram.StreamingAppMasterService sendContainerAskToRM - Requested stop container container_1496564390452_0002_01_000010
>
> 2017-06-04 06:43:57,292 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #7] INFO impl.NMClientAsyncImpl run - Processing Event EventType: STOP_CONTAINER for Container container_1496564390452_0002_01_000010
>
> 2017-06-04 06:43:57,294 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #7] INFO impl.ContainerManagementProtocolProxy newProxy - Opening proxy : localhost:8052
>
> 2017-06-04 06:43:59,301 [main] INFO stram.StreamingAppMasterService execute - Completed containerId=container_1496564390452_0002_01_000010, state=COMPLETE, exitStatus=-105, diagnostics=Container killed by the ApplicationMaster.
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
>
> 2017-06-04 06:43:59,301 [main] INFO stram.StreamingContainerManager scheduleContainerRestart - Initiating recovery for container_1496564390452_0002_01_000010@localhost:8052
>
> 2017-06-04 06:43:59,302 [main] INFO stram.StreamingContainerManager scheduleContainerRestart - Affected operators [PTOperator[id=6,name=ConsoleNew], PTOperator[id=7,name=ConsoleBad], PTOperator[id=1,name=cloutApiBanksInput], PTOperator[id=4,name=cloutApiBanksInput.outputPort#unifier], PTOperator[id=3,name=Deduper], PTOperator[id=5,name=banksDeduplicator], PTOperator[id=8,name=banksOpLoadObject], PTOperator[id=9,name=cloutApiBanksOutput], PTOperator[id=11,name=ConsoleErrorJDBC], PTOperator[id=10,name=ConsoleNewJDBC], PTOperator[id=2,name=cloutApiBanksInput]]
>
> 2017-06-04 06:44:00,334 [main] INFO stram.ResourceRequestHandler getHost - Strict anti-affinity = [] for container with operators PTOperator[id=6,name=ConsoleNew],PTOperator[id=7,name=ConsoleBad],PTOperator[id=1,name=cloutApiBanksInput],PTOperator[id=5,name=banksDeduplicator],PTOperator[id=10,name=ConsoleNewJDBC],PTOperator[id=11,name=ConsoleErrorJDBC],PTOperator[id=9,name=cloutApiBanksOutput],PTOperator[id=8,name=banksOpLoadObject],PTOperator[id=3,name=Deduper],PTOperator[id=4,name=cloutApiBanksInput.outputPort#unifier],PTOperator[id=2,name=cloutApiBanksInput]
>
> 2017-06-04 06:44:00,334 [main] INFO stram.ResourceRequestHandler getHost - Found host null
>
> 2017-06-04 06:44:01,341 [main] INFO stram.StreamingAppMasterService execute - Got new container., containerId=container_1496564390452_0002_01_000011, containerNode=localhost:8052, containerNodeURI=localhost:8042, containerResourceMemory6144, priority9
>
> 2017-06-04 06:44:01,341 [main] INFO stram.StreamingContainerManager assignContainer - Removing container agent container_1496564390452_0002_01_000010
>
> 2017-06-04 06:44:01,342 [main] INFO stram.LaunchContainerRunnable run - Setting up container launch context for containerid=container_1496564390452_0002_01_000011
>
> --
> *Guilherme Hott*
> *Software Engineer*
> Skype: guilhermehott
> @guilhermehott
> https://www.linkedin.com/in/guilhermehott
