[ https://issues.apache.org/jira/browse/AIRAVATA-2736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dimuthu Upeksha resolved AIRAVATA-2736.
---------------------------------------
    Resolution: Fixed

> Job submitted and running in HPC while the experiment is tagged as FAILED
> -------------------------------------------------------------------------
>
>                 Key: AIRAVATA-2736
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2736
>             Project: Airavata
>          Issue Type: Bug
>          Components: helix implementation
>    Affects Versions: 0.18
>         Environment: http://149.165.168.248:8008/ - Helix test env
>            Reporter: Eroma
>            Assignee: Dimuthu Upeksha
>            Priority: Major
>             Fix For: 0.18
>
>
> # Submitted an experiment which then submitted the job.
> # Job ID is returned and the status is ACTIVE.
> # Due to zookeeper connection issue the experiment is FAILED.
> # The job is still running in HPC
> # Airavata is not waiting for job monitoring as the task status is not updated in the zookeeper.
> # error in log [1]
> # SLM001-AmberSander-BR2_5ed5a19f-ab44-4eba-afb7-1feafaf0bbdd - exp ID
>
> [1]
> |org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /monitoring/2159926/lock
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:778)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:696)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:679)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:676)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
>     at org.apache.airavata.helix.impl.task.submission.JobSubmissionTask.createMonitoringNode(JobSubmissionTask.java:83)
>     at org.apache.airavata.helix.impl.task.submission.DefaultJobSubmissionTask.onRun(DefaultJobSubmissionTask.java:144)
>     at org.apache.airavata.helix.impl.task.AiravataTask.onRun(AiravataTask.java:264)
>     at org.apache.airavata.helix.core.AbstractTask.run(AbstractTask.java:74)
>     at org.apache.helix.task.TaskRunner.run(TaskRunner.java:70)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)|

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)