Hi,

We had to solve an immediate problem: identical queries being executed
several times within a narrow time frame.

At other times a user would run the same query but choose a different
format to see whether something more "understandable" could be gained
from it. Furthermore, whenever a page is refreshed, its inline queries
are executed again. In both cases the query itself hasn't changed (the
data might have, but we are using a 10 min cache timeout); at most the
presentation layer (format) has.

What we needed was an approach that determines whether a query is
identical to or different from another one, so that an identical query
uses a cache object instead of running the same database select again.
We also needed an approach that uses "pure" MW infrastructure without
relying on extra tables or classes to retrieve or store cache objects.

We first tried to solve this issue by caching within the result format
itself, but since this is one step too late (the database select has
already occurred), we went a bit further and looked for the part of the
SMW core that delivers results right before the PrinterResult format is
called.

## Proposal 1 (SMW Core)

### Hash key generation

In order to identify a query as identical, we added a new method [1.1]
to the class SMWQuery (SMW_Query.php), which does nothing more than
generate a hash key from the available incoming parameters
(m_querystring, m_limit, m_extraprintouts, etc.). This hash key allows
us to reliably determine whether or not a query is identical. (Note: the
format is not needed to determine the hash, since the database query is
the same for different formats.) There might, however, be a more
elegant or time-efficient method available.

SMW_Query.php

[1.1]
+ public function getHash() {
+     return md5( $this->m_querystring . $this->m_limit . serialize( $this->m_extraprintouts ) );
+ }

Measurements of this method (see below) showed that generating the hash
takes some time because serialize( m_extraprintouts ) is used, but for
us it seemed the most reliable solution at hand to determine whether or
not a query is identical (serialize() also turns the array into a string
so that a hash key can be generated from it).
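To illustrate why the format can be left out of the key, here is a minimal standalone sketch (not SMW code). The class `QueryKeySketch` is hypothetical: it only mirrors the three SMWQuery members hashed by [1.1] and stores a format next to them that is deliberately not part of the hash. The query strings and printouts are made-up examples.

```php
<?php
// Hypothetical stand-in for SMWQuery: only the members that [1.1] hashes,
// plus a result format that is deliberately NOT part of the key.
class QueryKeySketch {
    public $m_querystring;
    public $m_limit;
    public $m_extraprintouts;
    public $format;

    public function __construct( $querystring, $limit, array $extraprintouts, $format ) {
        $this->m_querystring = $querystring;
        $this->m_limit = $limit;
        $this->m_extraprintouts = $extraprintouts;
        $this->format = $format;
    }

    // Same recipe as [1.1]: serialize() turns the printout array into a
    // string so it can be concatenated and hashed with the other members.
    public function getHash() {
        return md5( $this->m_querystring . $this->m_limit . serialize( $this->m_extraprintouts ) );
    }
}

$asTable = new QueryKeySketch( '[[Category:City]]', 50, array( '?Population' ), 'table' );
$asList  = new QueryKeySketch( '[[Category:City]]', 50, array( '?Population' ), 'ul' );
$limited = new QueryKeySketch( '[[Category:City]]', 10, array( '?Population' ), 'table' );

echo $asTable->getHash() === $asList->getHash() ? "same key\n" : "different key\n";  // same key
echo $asTable->getHash() === $limited->getHash() ? "same key\n" : "different key\n"; // different key
```

Two things worth checking with this recipe: any parameter that changes the select but is not hashed (presumably an offset, as used when paging through results) would let different selects share one key, and plain concatenation without a separator can in principle map different (query string, limit) pairs to the same input string.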

### Fetching the cache object

The cache object is stored via the $wgMemc object cache. Together with
a small change to the SMWQueryProcessor class (SMWQueryProcessor.php)
and to SMW_SpecialAsk.php (which uses the same getQueryResult( $queryobj )
call), we are able to use query caching for all formats and all existing
queries without further changes.

### SMWQueryProcessor.php (add the same to SMW_SpecialAsk.php)
380: -> static public function getResultFromQuery ( ... )

// $wgMemc: global object cache instance
// $smwgQueryCache: new global variable to switch query caching on/off
// $smwgQueryCacheTimeOut: cache lifetime, set to 60 * 10 = 10 min
386: + global $smwgQueryCache, $smwgQueryCacheTimeOut, $wgMemc;

// Determine store settings
398: if ( array_key_exists( 'source', $params ) && array_key_exists( $params['source'], $smwgQuerySources ) ) {
         $store = new $smwgQuerySources[$params['source']]();
+        $cachestore = $params['source'];
         $query->params = $params; // this is a hack
     } else {
         $store = smwfGetStore(); // default store
+        $cachestore = 'store2';
     }

// If activated, check for and fetch the cache object
// $params['cache'] can be used to decide on a per-query basis whether caching is used
// $smwgQueryCache is the global switch to turn query caching on/off
+ if ( ( array_key_exists( 'cache', $params ) && ( $params['cache'] == false ) ) || $smwgQueryCache == false ) {
+     $res = $store->getQueryResult( $query );
+ } else {
+     $hash = wfMemcKey( 'smw-' . $cachestore, $query->getHash() );
+     $res = unserialize( $wgMemc->get( $hash ) );
+     if ( empty( $res ) ) {
          $res = $store->getQueryResult( $query );
+         $serialized = serialize( $res );
+         $wgMemc->set( $hash, $serialized, $smwgQueryCacheTimeOut );
+     }
+ }

* Caching is skipped if $params['cache'] == false or
$smwgQueryCache == false
* The cache lifetime is defined by $smwgQueryCacheTimeOut = 60 * 10; (10 min)
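The control flow of the patch (bypass, cache hit, cache miss) can be exercised outside MediaWiki with a small sketch. All names here are hypothetical: `ArrayCacheSketch` is an in-memory stand-in for $wgMemc that only mimics get()/set() with a lifetime (it is not BagOStuff), `buildKeySketch()` imitates the `aris:smw-store2:<hash>` key shape seen in the log below (the patch itself uses wfMemcKey()), and a closure stands in for $store->getQueryResult( $query ).

```php
<?php
// Hypothetical in-memory stand-in for $wgMemc (not MediaWiki's BagOStuff):
// get() returns false on a miss or after the lifetime has expired.
class ArrayCacheSketch {
    private $data = array();

    public function get( $key ) {
        if ( !isset( $this->data[$key] ) ) {
            return false;
        }
        list( $value, $expires ) = $this->data[$key];
        if ( $expires !== 0 && $expires < time() ) {
            unset( $this->data[$key] );
            return false;
        }
        return $value;
    }

    public function set( $key, $value, $exptime = 0 ) {
        // A lifetime of 0 means "never expires".
        $this->data[$key] = array( $value, $exptime > 0 ? time() + $exptime : 0 );
        return true;
    }
}

// Imitates the "<wiki id>:smw-<store>:<hash>" shape of the keys in the log.
function buildKeySketch( $wikiId, $cachestore, $hash ) {
    return $wikiId . ':smw-' . $cachestore . ':' . $hash;
}

// Mirrors the patched branch of getResultFromQuery(): bypass, hit or miss.
function getResultSketch( ArrayCacheSketch $cache, array $params, $cacheEnabled, $timeout,
        $key, callable $runQuery ) {
    if ( ( array_key_exists( 'cache', $params ) && $params['cache'] == false ) || $cacheEnabled == false ) {
        return $runQuery(); // caching switched off globally or for this query
    }
    $cached = $cache->get( $key );
    if ( $cached !== false ) {
        return unserialize( $cached ); // hit: no database select
    }
    $res = $runQuery(); // miss: run the select and store the serialized result
    $cache->set( $key, serialize( $res ), $timeout );
    return $res;
}

$calls = 0;
$runQuery = function () use ( &$calls ) {
    $calls++;
    return array( 'Berlin', 'Paris' );
};
$cache = new ArrayCacheSketch();
$key = buildKeySketch( 'aris', 'store2', md5( '[[Category:City]]' . 50 . serialize( array() ) ) );

getResultSketch( $cache, array(), true, 60 * 10, $key, $runQuery );                   // miss
getResultSketch( $cache, array(), true, 60 * 10, $key, $runQuery );                   // hit
getResultSketch( $cache, array( 'cache' => false ), true, 60 * 10, $key, $runQuery ); // bypass
echo $calls, "\n"; // 2
```

The sketch detects a hit with `!== false` instead of empty(), so an empty result set would be cached too. Note also that `$params['cache'] == false` only matches a real boolean false (or e.g. "0" or ""); the string "false" compares as true in PHP, so it may be worth checking how the cache parameter is parsed before it reaches this code.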

### Query performance
We were able to decrease our database load (from SMW queries) by at
least 30% by relying on cache objects instead of database selects
(tested with Memcache and/or APC). For some of our most frequently
visited pages (containing several inline queries) the loading time has
dropped from 8 sec. to 1 sec. We also noticed a performance improvement
for queries containing "~* ...*" on repeated execution.

Some rough findings are:

Initial query (no cache object exists yet)
--> Generate a hash key, look for the corresponding cache object and, if
none is found, execute the query (getQueryResult( $query ))

SMWQuery::getHash hash ->3afbae5cf5658173f168faaf2c1b488f generate hash ->0.0000930
SMWQueryProcessor::getResultFromQuery cache key ->aris:smw-store2:3afbae5cf5658173f168faaf2c1b488f check cache ->0.0000381
SMWQueryProcessor::getResultFromQuery cache key ->aris:smw-store2:3afbae5cf5658173f168faaf2c1b488f execute query ->0.0047388

Repeated query
--> Generate hash key, retrieve cache object

SMWQuery::getHash hash ->3afbae5cf5658173f168faaf2c1b488f generate hash ->0.0001030
SMWQueryProcessor::getResultFromQuery cache key ->aris:smw-store2:3afbae5cf5658173f168faaf2c1b488f check cache ->0.0000491
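Timings like the ones above can be collected by wrapping each step in microtime( true ). The following is a hypothetical reconstruction, since we do not know exactly how the figures above were logged: `timedLookup()` times hash generation, the cache check and, on a miss, the query execution, using a plain array as the cache and usleep() as a stand-in for the database select.

```php
<?php
// Runs one query through hash -> cache check -> (execute) and returns the
// steps actually taken with their durations in seconds.
function timedLookup( array &$cache, $querystring, $limit, array $printouts, callable $runQuery ) {
    $steps = array();

    $t = microtime( true );
    $hash = md5( $querystring . $limit . serialize( $printouts ) );
    $steps['generate hash'] = microtime( true ) - $t;

    $t = microtime( true );
    $hit = isset( $cache[$hash] );
    $steps['check cache'] = microtime( true ) - $t;

    if ( !$hit ) {
        $t = microtime( true );
        $cache[$hash] = $runQuery();
        $steps['execute query'] = microtime( true ) - $t;
    }
    return $steps;
}

$cache = array();
$slowQuery = function () {
    usleep( 5000 ); // pretend the database select takes about 5 ms
    return array( 'Berlin', 'Paris' );
};

foreach ( array( 'Initial query', 'Repeated query' ) as $label ) {
    echo $label, "\n";
    foreach ( timedLookup( $cache, '[[Category:City]]', 50, array( '?Population' ), $slowQuery ) as $step => $seconds ) {
        printf( "  %s -> %.7f\n", $step, $seconds );
    }
}
```

Only the first pass should print an "execute query" line; the repeated pass consists of hashing and the cache check alone, matching the shape of the findings above.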

## Proposal 2 (Hook)

We can imagine that proposal 1 might not fulfill all requirements for
the SMW core, therefore we would appreciate it if a hook [2.1] were
added to SMW_SpecialAsk.php and SMWQueryProcessor.php before [2.2], so
that we can apply our approach without having to change the SMW core
every time a new version is deployed.

[2.1]   // Enable third parties to determine query results
        // $store -> either smwfGetStore() or defined by $smwgQuerySources/$params['source']
        // $query -> to be able to determine the query hash
        // &$res  -> referenced return object (initialised to null so [2.2] can tell whether a hook set it)
        $res = null;
        wfRunHooks( 'SMWQuery::fetchQueryResults', array( $store, $query, &$res ) );

[2.2]   // Use $res if a hook has provided it, otherwise run the default
        if ( is_null( $res ) ) {
                $res = $store->getQueryResult( $query );
        }
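How [2.1] and [2.2] are meant to interact can be shown without MediaWiki. In this sketch `HookRegistrySketch` is a hypothetical stand-in for $wgHooks/wfRunHooks() (it just calls each registered handler with the argument array, keeping &$res as a reference), the query is represented by its hash string for simplicity, and the cached data is made up. A handler fills $res from its cache; the default select runs only when no handler has filled $res.

```php
<?php
// Hypothetical stand-in for $wgHooks / wfRunHooks(): handlers registered per
// event name are called in order with the given argument array.
final class HookRegistrySketch {
    private static $handlers = array();

    public static function register( $event, callable $handler ) {
        self::$handlers[$event][] = $handler;
    }

    public static function run( $event, array $args ) {
        if ( !isset( self::$handlers[$event] ) ) {
            return true;
        }
        foreach ( self::$handlers[$event] as $handler ) {
            // References inside $args (such as &$res) are passed on as references.
            if ( call_user_func_array( $handler, $args ) === false ) {
                return false; // a handler may stop further handlers from running
            }
        }
        return true;
    }
}

// Third-party side: answer from a cache when the query hash is known,
// otherwise leave $res untouched so the core falls back to the database.
$cachedResults = array( 'hash-of-known-query' => array( 'Berlin', 'Paris' ) );
HookRegistrySketch::register( 'SMWQuery::fetchQueryResults',
    function ( $store, $queryHash, &$res ) use ( $cachedResults ) {
        if ( isset( $cachedResults[$queryHash] ) ) {
            $res = $cachedResults[$queryHash];
        }
        return true;
    }
);

// Core side: the hook call followed by the null check.
function fetchResultsSketch( $store, $queryHash, callable $defaultSelect ) {
    $res = null; // lets the core see whether any handler provided a result
    HookRegistrySketch::run( 'SMWQuery::fetchQueryResults', array( $store, $queryHash, &$res ) );
    if ( is_null( $res ) ) {
        $res = $defaultSelect();
    }
    return $res;
}

$selects = 0;
$defaultSelect = function () use ( &$selects ) {
    $selects++;
    return array( 'from database' );
};

echo implode( ', ', fetchResultsSketch( 'store2', 'hash-of-known-query', $defaultSelect ) ), "\n"; // Berlin, Paris
echo implode( ', ', fetchResultsSketch( 'store2', 'some-other-hash', $defaultSelect ) ), "\n";     // from database
echo $selects, "\n"; // 1
```

Initialising $res to null before the hook call matters: without it, the core cannot distinguish "no handler answered" from a leftover value.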

Add a hash function [2.3] to SMW_Query.php to determine whether or not a
query is identical (a better method to capture the content of a query
and make it comparable via a hash key is more than welcome; for now we
are relying on [2.3]).

[2.3]
public function getHash() {
    return md5( $this->m_querystring . $this->m_limit . serialize( $this->m_extraprintouts ) );
}

It would be nice if someone could verify our findings/changes and point
out shortcomings that are inherent in our current approach.

PS: The above changes only apply to SMW. Semantic Drilldown uses its
own database selection scheme (it doesn't use the SMWQueryProcessor
class), therefore the above changes have limited reach for queries
executed through SD.

Cheers,

mwjames

_______________________________________________
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel
