On 18/11/2013 13:49, Timo-V Hatakka wrote:
On 15/11/2013 09:32, Timo-V Hatakka wrote:
Hi!
we made some SQL level investigations about time consumption of
synchronization task processes. We started with an empty
repository and processed the same external user table (of 5 000
users) twice. First run was processed more quickly and the time
used for SQL operations was divided for handling of
ACT_GE_BYTEARRAY, UAttrValue, ACT_HI_ACTINST, Policy, SyncopeUser,
UAttr and USchema tables. The next run was 150%-200% slower and DB
operations took about 50% of the time. We noticed that almost 75%
of DB operations was used in SELECT query to table ACT_RU_TASK.
The exact SQL clause was "select * from ACT_RU_TASK where
PARENT_TASK_ID_ = 'XXXXX'". There is no index for PARENT_TASK_ID_
column and in our runs the column value is always null. Is this
some kind of bug as it makes updates very slow?
Hi Timo,
thanks for your SQL analysis.
BTW, which DBMS are you using? If MySQL, with InnoDB as default
engine or not?
MySQL with InnoDB.
Yeah, I'd figured: if you have time, give a try to PostgreSQL; it is
*really* open source and works much better IMHO.
Anyway...
The table mentioned above involved in the critical queries
(ACT_RU_TASK) is actually an Activiti ([1] the default user
workflow engine) table, not under Syncope direct control: you can
try to manually define some indexes and check if there is any
significant improvement.
Alternatively, we can see if usage of API call(s) triggering such
query can be lowered from Syncope code - mainly the
ActivitiUserWorkflowAdapter [2], but I'm dubious.
I am aware that this can also be an activiti issue. But I think that
it is a little bit strange that there is a given PARENT_TASK_ID_ in
the query. Is this hierarchy maintained purely in activiti? No
possibility that you are calling some activiti operations with wrong
parameters? If you think so, I could define a new index and repeat
the test.
I don't think there is any explicit call in the code ([2], please
don't remove links when you reply or text might be left without
references, as happening above) where something related to
PARENT_TASK_ID is set.
I might be wrong, of course, so please take a look at the source code.
As said above, I still think that adding new indexes (or any other
simple tuning task) affecting Activiti tables is the first thing to
attempt.
I did it: "CREATE INDEX RU_TEST ON ACT_RU_TASK (PARENT_TASK_ID_)". It
helped!
The task was processed much faster and now ACT_RU_TASK queries took
about 13% of the time (75% earlier).
Thank you very much!
You're welcome: I think we need to share somehow your findings, so I
have opened SYNCOPE-441 [3] in order to create such index by default at
every deployment.
Thanks for reporting.
Regards.
[1] http://www.activiti.org
[2]
http://svn.apache.org/repos/asf/syncope/branches/1_1_X/core/src/main/java/org/apache/syncope/core/workflow/user/activiti/ActivitiUserWorkflowAdapter.java
[3] https://issues.apache.org/jira/browse/SYNCOPE-441
--
Francesco Chicchiriccò
Tirasa - Open Source Excellence
http://www.tirasa.net/
ASF Member, Apache Syncope PMC chair, Apache Cocoon PMC Member
http://people.apache.org/~ilgrosso/