richardlawley opened a new issue #3829: Unable to schedule async job - dangling 
vm_work_job rows
URL: https://github.com/apache/cloudstack/issues/3829
 
 
   <!--
   Verify first that your issue/request is not already reported on GitHub.
   Also test if the latest release and master branch are affected too.
   Always add information AFTER of these HTML comments, but no need to delete 
the comments.
   -->
   
   ##### ISSUE TYPE
   <!-- Pick one below and delete the rest -->
    * Bug Report
   
   ##### COMPONENT NAME
   <!--
   Categorize the issue, e.g. API, VR, VPN, UI, etc.
   -->
   ~~~
   Management Server
   ~~~
   
   ##### CLOUDSTACK VERSION
   <!--
   New line separated list of affected versions, commit ID for issues on master 
branch.
   -->
   
   ~~~
   4.11.2, appears unfixed in master
   ~~~
   
   ##### CONFIGURATION
   <!--
   Information about the configuration if relevant, e.g. basic network, 
advanced networking, etc.  N/A otherwise
   -->
   N/A
   
   ##### OS / ENVIRONMENT
   <!--
   Information about the environment if relevant, N/A otherwise
   -->
   N/A
   
   ##### SUMMARY
   <!-- Explain the problem/feature briefly -->
   AsyncJobManager contains code to expunge async job records after a 
configurable time (24h default).  Some async jobs have corresponding 
vm_work_job entries, which are created using the generated key of async_job.  
The expunge code only removes the async_job record, leaving dangling 
vm_work_job rows.
   
   Because the supported version of mysql recalculates autonumber sequences 
based on the last current row, this can cause async jobs to be created with IDs 
referencing existing rows in vm_work_job, which causes a primary key violation 
if this clashes, showing the following log message:
   
   ```
   Unable to schedule async job for command com.cloud.vm.VmWorkStart, 
unexpected exception. 
   javax.persistence.EntityExistsException: Entity already exists
   ```
   
   The set of circumstances to trigger this is a bit obscure, which is probably 
why this has been mentioned on the mailing list a few times but never with any 
fix or explanation.  I believe the only fix required is to ensure that when 
jobs are purged from async_jobs, they are also purged from vm_work_job.
   
   For anyone else encountering this bug (hopefully the error text will be 
enough!), the workaround is to remove all vm_work_job rows above the current 
max async_job, i.e. 
   
   ```
   DELETE FROM vm_work_job WHERE id > (SELECT MAX(id) FROM async_job)
   ```
   
   ##### STEPS TO REPRODUCE
   <!--
   For bugs, show exactly how to reproduce the problem, using a minimal 
test-case. Use Screenshots if accurate.
   
   For new features, show how the feature would be used.
   -->
   
   * On a clean management server, run some VM jobs to ensure that vm_work_job 
rows are present.
   * Wait for `job.expire.minutes` so that mgmt server can clean out the 
async_jobs table
   * Observe that vm_work_job rows are still present
   * Restart mysql server (perhaps mgmt server too, unsure)
   * Try to start/stop VMs until the async_job id clashes with a vm_work_job 
row.
   
   <!-- You can also paste gist.github.com links for larger files -->
   
   ##### EXPECTED RESULTS
   <!-- What did you expect to happen when running the steps above? -->
   
   VM should start/stop correctly
   
   ##### ACTUAL RESULTS
   <!-- What actually happened? -->
   
   Error Starting/Stopping VM, with log message as below:
   
   ```
   2020-01-22 09:40:28,492 WARN  [o.a.c.f.j.i.AsyncJobManagerImpl] 
(API-Job-Executor-13:ctx-e8081448 job-4920 ctx-ad7b892d) (logid:08136108) 
Unable to schedule async job for command com.cloud.vm.VmWorkStop, unexpected 
exception.
   javax.persistence.EntityExistsException: Entity already exists:
           at 
com.cloud.utils.db.GenericDaoBase.persist(GenericDaoBase.java:1434)
           at 
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$1.doInTransaction(AsyncJobManagerImpl.java:235)
           at 
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$1.doInTransaction(AsyncJobManagerImpl.java:231)
           at 
com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:50)
           at com.cloud.utils.db.Transaction.execute(Transaction.java:40)
           at com.cloud.utils.db.Transaction.execute(Transaction.java:47)
           at 
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl.submitAsyncJob(AsyncJobManagerImpl.java:231)
           at 
com.cloud.vm.VirtualMachineManagerImpl.stopVmThroughJobQueue(VirtualMachineManagerImpl.java:4505)
           at 
com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1600)
           at 
com.cloud.vm.VirtualMachineManagerImpl.destroy(VirtualMachineManagerImpl.java:1882)
           at 
org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.destroyVirtualMachine(VMEntityManagerImpl.java:271)
           at 
org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.destroy(VirtualMachineEntityImpl.java:233)
           at 
com.cloud.vm.UserVmManagerImpl.destroyVm(UserVmManagerImpl.java:4534)
           at 
com.cloud.vm.UserVmManagerImpl.destroyVm(UserVmManagerImpl.java:2758)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at 
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:338)
           at 
org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:197)
           at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
           at 
org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:107)
           at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:174)
           at 
com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:51)
           at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:174)
           at 
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
           at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185)
           at 
org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
           at com.sun.proxy.$Proxy164.destroyVm(Unknown Source)
           at 
org.apache.cloudstack.api.command.admin.vm.DestroyVMCmdByAdmin.execute(DestroyVMCmdByAdmin.java:46)
           at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:150)
           at 
com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
           at 
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:581)
           at 
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
           at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
           at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
           at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
           at 
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
           at 
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:529)
           at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: 
com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: 
Duplicate entry '4921' for key 'PRIMARY'
           at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
           at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
           at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
           ... 59 more
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to