Hi,

We had to solve an immediate problem: identical queries being executed repeatedly within a narrow time frame.
Other times a user would run the same query but choose a different format to see whether something "understandable" could be gained from a different presentation. Furthermore, if a page is refreshed repeatedly it re-executes its inline queries. In both cases the query hasn't changed (the data might have, but we are using a 10 min cache timeout); only the presentation layer (format) has.

What we needed was an approach that determines whether one query is identical to another, so that an identical query uses a cache object instead of running the same database select again. We also needed an approach that uses "pure" MW infrastructure, without relying on extra tables or classes to retrieve or store cache objects.

We first tried to solve this by caching within the result format itself, but that is a step too late (the database select has already occurred), so we went a bit further and looked for the part of the SMW core that delivers results right before the PrinterResult format is called.

## Proposal 1 (SMW Core)

### Hash key generation

In order to identify a query as identical we added a new method [1.1] to class SMWQuery (SMW_Query.php) which does nothing more than generate a hash key based on the available incoming parameters (m_querystring, m_limit, m_extraprintouts, etc.). The hash key allows us to reliably determine whether a query is identical or not. (Note: we don't need the format to determine the hash, as the database query is identical for different formats.) There might be a more elegant or time-efficient method available.

SMW_Query.php [1.1]

    + public function getHash() {
    +     return md5( $this->m_querystring . $this->m_limit . serialize( $this->m_extraprintouts ) );
    + }

Measurements (see below) showed that this method takes some time to generate the hash because serialize( m_extraprintouts ) is used, but for us it seemed the most reliable solution at hand to determine whether a query is identical or not (it also serves to stringify the array so that a hash key can be generated).

### Fetching the cache object

The cache object is stored using the $wgMemc global. Together with a small change to the SMWQueryProcessor class (SMWQueryProcessor.php) and to SMW_SpecialAsk.php (which uses the same getQueryResult( $queryobj ) call) we are able to use query caching for all formats and all existing queries without further changes.

### SMWQueryProcessor.php (add the same to SMW_SpecialAsk.php)

    380: -> static public function getResultFromQuery ( ... )

            // $wgMemc global instance for object cache,
            // $smwgQueryCache new global variable to switch on/off query caching
            // $smwgQueryCacheTimeOut determine lifetime of cache set to 60 * 10 = 10 min
    386: +  global $smwgQueryCache, $smwgQueryCacheTimeOut, $wgMemc;

            // Determine store settings
    398:    if ( array_key_exists( 'source', $params ) && array_key_exists( $params['source'], $smwgQuerySources ) ) {
                $store = new $smwgQuerySources[$params['source']]();
         +      $cachestore = $params['source'];
                $query->params = $params; // this is a hack
            } else {
                $store = smwfGetStore(); // default store
         +      $cachestore = 'store2';
            }

            // If activated, check for and fetch the cache object
            // "$params['cache']" can be used to determine on a per query basis if caching should be used or not
            // $smwgQueryCache global variable to turn on/off query caching
         +  if ( ( array_key_exists( 'cache', $params ) && ( $params['cache'] == false ) ) || $smwgQueryCache == false ) {
         +      $res = $store->getQueryResult( $query );
         +  } else {
         +      $hash = wfMemcKey( 'smw-' . $cachestore, $query->getHash() );
         +      $res = unserialize( $wgMemc->get( $hash ) );
         +      if ( empty( $res ) ) {
         +          $res = $store->getQueryResult( $query );
         +          $serialized = serialize( $res );
         +          $wgMemc->set( $hash, $serialized, $smwgQueryCacheTimeOut );
         +      }
         +  }

* Caching is skipped when $params['cache'] == false or $smwgQueryCache == false
* The cache lifetime is defined by $smwgQueryCacheTimeOut = 60 * 10;

### Query performance

We were able to decrease our database load (for SMW-related queries) by at least 30% by relying on cache objects instead of database selects (tested with Memcache and/or APC). For some of our frequently visited pages (containing several inline queries) the loading time has dropped from 8 sec. to 1 sec. We also noticed a performance improvement for queries containing "~* ...*" after repeated execution.

Some rough findings are:

Initial query where no cache object exists --> generate a hash key, look for the corresponding cache object, and if it is not found execute the query (getQueryResult( $query )):

    SMWQuery::getHash hash ->3afbae5cf5658173f168faaf2c1b488f generate hash ->0.0000930
    SMWQueryProcessor::getResultFromQuery cache key ->aris:smw-store2:3afbae5cf5658173f168faaf2c1b488f check cache ->0.0000381
    SMWQueryProcessor::getResultFromQuery cache key ->aris:smw-store2:3afbae5cf5658173f168faaf2c1b488f execute query ->0.0047388

Repeated query --> generate hash key, retrieve cache object:

    SMWQuery::getHash hash ->3afbae5cf5658173f168faaf2c1b488f generate hash ->0.0001030
    SMWQueryProcessor::getResultFromQuery cache key ->aris:smw-store2:3afbae5cf5658173f168faaf2c1b488f check cache ->0.0000491

## Proposal 2 (Hook)

We can imagine that proposal 1 might not fulfill all requirements for the SMW core. We would therefore appreciate it if a hook [2.1] were added to SMW_SpecialAsk.php and SMWQueryProcessor.php before [2.2], so that we can apply our approach without having to change the SMW core every time a new version is deployed.
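To illustrate how a third party could apply the caching approach through such a hook, here is a minimal standalone sketch. It is not SMW or MediaWiki code: SketchCache, SketchQuery, SketchStore and sketchFetchQueryResults are hypothetical stand-ins (for $wgMemc, SMWQuery, the SMW store and a hook handler respectively), and the cache is passed in explicitly only to keep the example self-contained. The key prefix and the 10 min lifetime mirror the values from proposal 1.

```php
<?php
// Standalone sketch; none of these classes exist in SMW or MediaWiki.

// Hypothetical stand-in for an object cache such as $wgMemc.
class SketchCache {
    private $items = array();

    // Returns the stored value, or false on a miss or after expiry.
    public function get( $key ) {
        if ( !isset( $this->items[$key] ) || $this->items[$key]['expires'] < time() ) {
            return false;
        }
        return $this->items[$key]['value'];
    }

    public function set( $key, $value, $ttl ) {
        $this->items[$key] = array( 'value' => $value, 'expires' => time() + $ttl );
    }
}

// Hypothetical stand-in for SMWQuery, reduced to the fields used by getHash().
class SketchQuery {
    public $querystring;
    public $limit;
    public $extraprintouts;

    public function __construct( $querystring, $limit, array $extraprintouts ) {
        $this->querystring = $querystring;
        $this->limit = $limit;
        $this->extraprintouts = $extraprintouts;
    }

    public function getHash() {
        return md5( $this->querystring . $this->limit . serialize( $this->extraprintouts ) );
    }
}

// Hypothetical store that counts how often the "database" is hit.
class SketchStore {
    public $selects = 0;

    public function getQueryResult( SketchQuery $query ) {
        $this->selects++;
        return array( 'rows for ' . $query->querystring );
    }
}

// Sketch of a handler for the proposed hook: fill $res from the cache,
// or run the query once and store the serialized result.
function sketchFetchQueryResults( SketchCache $cache, $store, $query, &$res ) {
    $key = 'smw-store2:' . $query->getHash();
    $cached = $cache->get( $key );
    if ( $cached !== false ) {
        $res = unserialize( $cached );
        return true;
    }
    $res = $store->getQueryResult( $query );
    $cache->set( $key, serialize( $res ), 60 * 10 );
    return true;
}

$cache = new SketchCache();
$store = new SketchStore();
$query = new SketchQuery( '[[Category:City]]', 50, array( '?Population' ) );

$first = null;
sketchFetchQueryResults( $cache, $store, $query, $first );
$second = null;
sketchFetchQueryResults( $cache, $store, $query, $second );

echo $store->selects, "\n";
```

Running this prints 1: the second call is answered from the cache, so the store is only queried once.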
[2.1]

    // Enable third parties to determine query results
    // $store -> which is either smwfGetStore() or defined by $smwgQuerySources/$params['source']
    // $query -> to be able to determine the query hash
    // &$res  -> referenced return object
    wfRunHooks( 'SMWQuery::fetchQueryResults', array( $store, $query, &$res ) );

[2.2]

    // Keep $res if a hook handler has set it, otherwise run the default
    if ( is_null( $res ) ) {
        $res = $store->getQueryResult( $query );
    }

Add a hash function [2.3] to SMW_Query.php to determine whether or not a query is identical (a better method to determine the content of a query and make it comparable via a hash key is more than welcome; for now we are relying on [2.3]).

[2.3]

    public function getHash() {
        return md5( $this->m_querystring . $this->m_limit . serialize( $this->m_extraprintouts ) );
    }

It would be nice if someone could verify our findings/changes and point out shortcomings that are inherent in our current approach.

PS: The above changes only apply to SMW. Semantic Drilldown uses its own database selection schema (it doesn't use any SMWQueryProcessor class), so the above changes have limited reach for queries executed through SD.

Cheers,

mwjames