Hi!
we made some SQL level investigations about time consumption of
synchronization task processes. We started with an empty repository
and processed the same external user table (of 5 000 users) twice.
First run was processed more quickly and the time used for SQL
operations was divided for handling of ACT_GE_BYTEARRAY,
UAttrValue, ACT_HI_ACTINST, Policy, SyncopeUser, UAttr and USchema
tables. The next run was 150%-200% slower and DB operations took
about 50% of the time. We noticed that almost 75% of DB operations
was used in SELECT query to table ACT_RU_TASK. The exact SQL clause
was "select * from ACT_RU_TASK where PARENT_TASK_ID_ = 'XXXXX'".
There is no index for PARENT_TASK_ID_ column and in our runs the
column value is always null. Is this some kind of bug as it makes
updates very slow?
Hi Timo,
thanks for your SQL analysis.
BTW, which DBMS are you using? If MySQL, with InnoDB as default
engine or not?
MySQL with InnoDB.
The table mentioned above involved in the critical queries
(ACT_RU_TASK) is actually an Activiti ([1] the default user workflow
engine) table, not under Syncope direct control: you can try to
manually define some indexes and check if there is any significant
improvement.
Alternatively, we can see if usage of API call(s) triggering such
query can be lowered from Syncope code - mainly the
ActivitiUserWorkflowAdapter [2], but I'm dubious.
I am aware that this can also be an activiti issue. But I think that
it is a little bit strange that there is a given PARENT_TASK_ID_ in
the query. Is this hierarchy maintained purely in activiti? No
possibility that you are calling some activiti operations with wrong
parameters? If you think so, I could define a new index and repeat the
test.
Regards,
Timo