[
https://issues.apache.org/jira/browse/STANBOL-669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404013#comment-13404013
]
Rupert Westenthaler commented on STANBOL-669:
---------------------------------------------
Found the reason for this issue:
It has nothing to do with the EventJobManager (I will change the title of this
issue). The cause was two LRU caches, implemented using LinkedHashMap, that the
Apache Entityhub SolrYard uses to store mappings from RDF properties <-> Solr
field names.
These LRU caches were not synchronized, on the assumption that only put and
read requests are used. Even removes (if the cache gets too big) were not
expected to be a problem, because there is no possibility of dirty reads (even
a removed entry would still be correct); at worst the same mapping would be
calculated twice.
However, RTFM of LinkedHashMap [1] would have saved a lot of time, as it states,
even in bold letters: "In access-ordered linked hash maps, merely querying the
map with get is a structural modification."
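The quoted sentence can be checked with a few lines of Java. This is only a minimal sketch, not code from the SolrYard; the class and method names are made up for illustration. It shows that a plain get reorders an access-ordered LinkedHashMap, which is why unsynchronized reads are not safe there.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AccessOrderDemo {

    /** Returns the key iteration order after a single get("a"). */
    static List<String> keysAfterGet(boolean accessOrder) {
        // The third constructor argument switches between insertion order
        // (false) and access order (true).
        Map<String, String> map = new LinkedHashMap<>(16, 0.75f, accessOrder);
        map.put("a", "1");
        map.put("b", "2");
        map.put("c", "3");
        map.get("a"); // in access-order mode this relinks "a" to the end
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterGet(false)); // prints [a, b, c]
        System.out.println(keysAfterGet(true));  // prints [b, c, a]
    }
}
```

In insertion-order mode the get leaves the internal linked list untouched; in access-order mode it rewrites the list pointers, so two concurrent gets are in effect two concurrent writes.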
Based on debugging, this is what happened: Two threads access the LinkedHashMap
at nearly the same time. Both try to change the order based on access time. They
do not block each other, but instead they start to consume 100% processing power
without coming to an agreement.
This caused the method to never return, which at first looked as if some
EnhancementJobs never completed with the EventJobManager (hence the original
title of this issue).
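One common way to make such a cache safe is to wrap the access-ordered LinkedHashMap with Collections.synchronizedMap, so that gets and puts can no longer interleave. The following is a sketch under that assumption and not necessarily the change made in the SolrYard; the class name, the cache size and the example property and field names are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class SynchronizedLruCache {

    /** Creates a thread-safe LRU map holding at most maxSize entries. */
    static <K, V> Map<K, V> create(final int maxSize) {
        LinkedHashMap<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            private static final long serialVersionUID = 1L;

            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after every put; evicts the least recently used entry.
                return size() > maxSize;
            }
        };
        // Every call, including get, now runs under the same monitor.
        return Collections.synchronizedMap(lru);
    }

    public static void main(String[] args) {
        // Hypothetical property -> field name mappings, for illustration only.
        Map<String, String> cache = create(2);
        cache.put("rdfs:label", "label_field");
        cache.put("rdf:type", "type_field");
        cache.get("rdfs:label");              // marks rdfs:label as recently used
        cache.put("foaf:name", "name_field"); // evicts rdf:type
        System.out.println(cache.keySet());   // prints [rdfs:label, foaf:name]
    }
}
```

The wrapper serializes all access, which costs some throughput under heavy load. Iterating over the map still requires manual synchronization on the returned map, as the Collections.synchronizedMap documentation notes.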
As a bonus, the Stanbol Enhancer will have an integration test that can
simulate concurrent enhancement requests. Stanbol users will also be able to
run this test against their Stanbol servers by using
mvn -o test -Dtest.server.url={stanbol-server} -Dtest=MultiThreadedTest
Currently this test uses English long abstracts from DBpedia (10k of those
will be included in the integration test) and runs 20 concurrent threads over
1000 documents using the default chain. I plan to extend this test so that it
can be configured by additional system properties.
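The general shape of such a test can be sketched with java.util.concurrent alone. This is not the actual MultiThreadedTest: the Request interface below is a hypothetical stand-in for sending one document to the enhancer, and the timeout is an assumed parameter. The point is that a timeout makes a request that never returns show up as a failed count instead of a hanging test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ConcurrentRequestSketch {

    /** Hypothetical stand-in for sending one document to the enhancer. */
    interface Request {
        boolean send(String content) throws Exception;
    }

    /** Sends all documents with the given number of threads; returns the success count. */
    static int run(List<String> documents, int threads, Request request,
                   long timeoutSeconds) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (String doc : documents) {
                tasks.add(() -> request.send(doc));
            }
            // Tasks that are not done when the timeout expires are cancelled.
            List<Future<Boolean>> results =
                    executor.invokeAll(tasks, timeoutSeconds, TimeUnit.SECONDS);
            int succeeded = 0;
            for (Future<Boolean> result : results) {
                if (result.isCancelled()) {
                    continue; // timed out, counted as a failure
                }
                try {
                    if (result.get()) {
                        succeeded++;
                    }
                } catch (ExecutionException e) {
                    // the request threw an exception, counted as a failure
                }
            }
            return succeeded;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("document " + i);
        }
        // Dummy request that always succeeds; a real test would issue HTTP calls.
        int ok = run(docs, 20, content -> !content.isEmpty(), 60);
        System.out.println(ok + " of " + docs.size() + " requests completed");
    }
}
```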
[1]
http://download.java.net/jdk7/archive/b123/docs/api/java/util/LinkedHashMap.html
> EventJobManager does not correctly use Read/Write Locks on EnhancementJobs
> --------------------------------------------------------------------------
>
> Key: STANBOL-669
> URL: https://issues.apache.org/jira/browse/STANBOL-669
> Project: Stanbol
> Issue Type: Bug
> Components: Enhancer
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler
>
> When bombarding the enhancer with multiple concurrent EnhancementJobs, the
> EventJobManager might not correctly process all requests due to changes that
> do not correctly apply a writeLock on the EnhancementJob.
> As fixing this is not easy, I have already implemented a new
> integration test that sends long abstracts from DBpedia as content
> to the enhancer. The integration test includes enough data for 10,000
> requests. It uses "java.util.concurrent.ExecutorService" for async requests
> and the "PoolingClientConnectionManager" of Apache HttpClient for sending
> multiple parallel requests.
> Setting this to 1000 requests with 10 threads makes it easy to reproduce the
> problem.