On Friday, 14 December 2007, Sergey Chernyshev wrote:
> Got it - if it'll speed up the process, that'll be great. Currently SMW on
> top of MW runs significantly slower than just MW, which is not very good
> because it means that SMW+MW can't scale as well as MW alone.
>
> Can you describe in a couple of paragraphs how SMW data and queries are
> getting cached and how that cache is being invalidated, what works on the
> fly and what is served from the parser cache?
>
> I understand it's a lot to describe, but for projects with massive amounts of
> data and traffic, performance can be a big show-stopper - we picked MW for
> one of our projects because of Wikipedia's performance example and
> predictability, and I hope that it's not too distant for SMW to inherit
> these qualities, but I'd like to understand the overall picture.

Yes, agreed. Of course we have always designed the basic algorithms with regard to performance and scalability, and in particular tried to pick features based on this aspect. On the other hand, caching is significantly under-developed in SMW as it is, since it mainly uses the existing MW caches where applicable.

There are various types of operations that are relevant to performance, and each can probably be optimised/cached independently:

(1) Basic page display -- by far the most common operation.
(2) Query answering, inline and on Special:Ask.
(3) Annotation parsing and page formatting.
(4) Maintenance specials such as Special:Properties.
(5) OWL/RDF export.
(6) The browsing special, Special:Browse.

I will sketch performance issues for each of those. For actual numbers, see http://ontoworld.org/profileinfo.php to find out how severe each operation is on ontoworld.org.

(1) is clearly the main operation, and for existing pages SMW merely uses MW's parser/page caches. No mechanism for cache invalidation exists, but MW regularly updates page caches. This allows outdated inline queries but gives us good hope for basic scalability in large environments. In particular, SMW does not hook into any operations that happen when reproducing parser-cached pages. Even the Factbox comes from the parser cache (which is why we cannot readily translate it to the user's language as MW does for categories).

(2) Query answering is done without any caching of its own, and this is clearly a problem. While inline queries are computed only once and stored in the parser cache afterwards, Special:Ask has no caching facility at all. This needs to change in the future. Targeted cache invalidation might still be difficult, and it is not clear whether the effort is needed (one could enable manual cache clearing like for pages). A new query cache -- design, architecture and implementation -- is needed here.

(3) Page formatting uses very few additional DB calls, and mainly works on the wiki source code that was already retrieved anyway. It has no major performance impact (see smwfParserHook in the profile).

(4) Maintenance specials can be slow, but have been designed to allow the caching mechanism that MW uses for its own maintenance specials. This is not implemented, but it would be possible. One design decision, which probably arises in other cases as well, is whether to have transparent caching in the storage implementation, or whether to trigger caching explicitly in the caller (which may help to not make the storage implementation even bigger than it is now).

(5) OWL/RDF export takes time, but how much depends mostly on the export settings of your site. The result could be cached internally in a similar way to how page content is cached. External caches could be configured to cache RDF as well. Yet this is not to be neglected, since a number of Semantic Web crawlers and misguided RSS spiders regularly visit the RDF.

(6) Special:Browse is not inefficient, but as it is a specialised form of "What links here" it also faces similar performance issues.

Finally, SMW needs practically no time to load if it is not strictly needed. So enabling it hardly slows down the wiki for services that need no SMW.

Summing up, the required caching facilities in order of relevance would probably be: (2) [Queries], (4) [Specials], (5) [OWL/RDF]. I do not think that the other parts need too much care, but analysing the current profileinfo may yield more insights.

Concerning (2), which is by far the most severe performance problem, we have included many ways of restricting queries, so that large sites can always switch off features until it works again (SMW is still useful without very complex queries). At the moment this is the suggested procedure for large sites, and we can also offer some support for helping such sites to not experience major problems (things of course also depend a lot on the wiki's actual structure).

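To make the idea under (2) a bit more concrete, here is a minimal sketch in Python of a query cache with time-based expiry and manual clearing. It is only an illustration under assumptions: `QueryCache`, `get`, `purge` and the `compute` callback are hypothetical names, none of this exists in SMW, and a real implementation would live in PHP next to the storage layer.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class QueryCache:
    """Hypothetical query-result cache (illustration only, not SMW code)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        # Maps the query string to (time stored, result list).
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, query: str,
            compute: Callable[[str], List[str]]) -> List[str]:
        """Return a cached result if still fresh, otherwise recompute it."""
        now = self.clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        result = compute(query)
        self._entries[query] = (now, result)
        return result

    def purge(self, query: Optional[str] = None) -> None:
        """Manual cache clearing: one query, or everything if none is given."""
        if query is None:
            self._entries.clear()
        else:
            self._entries.pop(query, None)


if __name__ == "__main__":
    calls = []

    def run_query(q: str) -> List[str]:
        # Stand-in for the expensive query evaluation.
        calls.append(q)
        return ["Page A", "Page B"]

    cache = QueryCache(ttl_seconds=300)
    cache.get("[[Category:City]]", run_query)  # computed
    cache.get("[[Category:City]]", run_query)  # served from the cache
    cache.purge("[[Category:City]]")           # manual clearing
    cache.get("[[Category:City]]", run_query)  # computed again
    print(len(calls))  # prints 2
```

This deliberately avoids targeted invalidation: entries simply expire or are cleared by hand, which matches the "manual cache clearing like for pages" option and leaves the question of explicit versus transparent caching open.
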
Best regards,

Markus

>
> Thank you,
>
>         Sergey.
>
> On Dec 14, 2007 1:12 PM, Markus Krötzsch <[EMAIL PROTECTED]> wrote:
> > On Friday, 14 December 2007, Sergey Chernyshev wrote:
> > > Markus, can you elaborate on three values - what's the difference
> > > between SOME and FULL?
> >
> > FULL is what used to be "true" in 1.0 (default)
> > NONE is what used to be "false" in all versions
> > SOME is new, but does basically what 0.7 did earlier.
> >
> > So SOME only considers redirects for pages that appear directly in the
> > query. For example, assume "r1" and "r2" are redirects to "p". Then asking
> > for "[[property::r1]]" yields the same results as asking
> > for "[[property::p]]" or "[[property::r2]]".
> >
> > This is not too hard to do. Now FULL evaluates redirects even when
> > joining subqueries or asking for categories. As an example, assume that
> > in addition to the above there is a page "q" with annotation
> > "[[property::r1]]", and assume further that r2 is in Category2 and that
> > p is in Category3. Then each of the following queries contains "q" in
> > its result list:
> >
> > * <ask>[[property::<q>[[Category:Category3]]</q>]]</ask>
> > * <ask>[[property::<q>[[Category:Category2]]</q>]]</ask>
> >
> > Neither would work with SOME only. But as you can imagine, doing these
> > additional considerations about redirects at query time consumes a lot of
> > additional time (in particular since we use MW's redirect table, which is
> > not even optimised for these kinds of games).
> >
> > If you make sure that properties do not point to redirects, and that
> > redirects have no categories or properties, then SOME should always
> > suffice (I think it was discussed earlier to have a Special page for that
> > kind of maintenance).
> >
> > -- Markus
> >
> > > Sergey
> > >
> > > On Dec 14, 2007 7:27 AM, Markus Krötzsch <[EMAIL PROTECTED]> wrote:
> > > > On Friday, 14 December 2007, cnit wrote:
> > > > > > Indeed. This was fixed now in SVN.
> > > > >
> > > > > Thank you! It works!!
> > > > > And with #ask SMW has become much more powerful!
> > > > >
> > > > > One thought - I wonder if $smwgQEqualitySupport should be true by
> > > > > default. It seems to speed things up a little when set to false.
> > > > > But of course that's a matter of tuning and can be done by the
> > > > > users. Maybe just a notice in the INSTALL file would be helpful.
> > > > > But anyway - not much of a problem even if left "as is".
> > > >
> > > > I agree, it should not be "true". I think I will make it
> > > > three-valued, since only part of the feature really slows down the
> > > > querying. So the values would be something like NONE, SOME, and
> > > > FULL. This should happen before 1.0.
> > > >
> > > > Markus

-- 
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362          fax +49 (0)721 608 5998
[EMAIL PROTECTED]          www  http://korrekt.org
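
For reference, the difference between NONE, SOME and FULL described in the quoted thread can be modelled with a small Python sketch that uses the same example pages (r1, r2, p, q). This is a toy model under assumptions, not SMW's query code: the `Equality` enum, the dictionaries and both `ask_*` functions are hypothetical names invented for illustration, and the real setting is the PHP variable $smwgQEqualitySupport.

```python
from enum import Enum
from typing import Dict, List, Set


class Equality(Enum):
    """Hypothetical stand-in for the three proposed setting values."""
    NONE = 0
    SOME = 1
    FULL = 2


# Example data from the thread: r1 and r2 redirect to p; q is annotated
# with [[property::r1]]; r2 is in Category2 and p is in Category3.
REDIRECTS: Dict[str, str] = {"r1": "p", "r2": "p"}
PROPERTY_VALUES: Dict[str, Set[str]] = {"q": {"r1"}}
CATEGORIES: Dict[str, Set[str]] = {"Category2": {"r2"}, "Category3": {"p"}}


def equivalents(page: str) -> Set[str]:
    """The redirect target of a page plus every page redirecting to it."""
    target = REDIRECTS.get(page, page)
    return {target} | {src for src, dst in REDIRECTS.items() if dst == target}


def matching_subjects(wanted: Set[str]) -> List[str]:
    """Pages whose stored property values literally intersect `wanted`."""
    return sorted(s for s, vals in PROPERTY_VALUES.items() if vals & wanted)


def ask_value(page: str, mode: Equality) -> List[str]:
    """Model of [[property::page]], where the page is named in the query."""
    wanted = {page} if mode is Equality.NONE else equivalents(page)
    return matching_subjects(wanted)


def ask_category_subquery(category: str, mode: Equality) -> List[str]:
    """Model of [[property::<q>[[Category:category]]</q>]]."""
    wanted = set(CATEGORIES.get(category, set()))
    if mode is Equality.FULL:
        # Only FULL follows redirects for pages found by the subquery.
        expanded: Set[str] = set()
        for member in wanted:
            expanded |= equivalents(member)
        wanted = expanded
    return matching_subjects(wanted)


if __name__ == "__main__":
    print(ask_value("p", Equality.NONE))                      # []
    print(ask_value("p", Equality.SOME))                      # ['q']
    print(ask_category_subquery("Category3", Equality.SOME))  # []
    print(ask_category_subquery("Category3", Equality.FULL))  # ['q']
    print(ask_category_subquery("Category2", Equality.FULL))  # ['q']
```

The extra work in FULL is the expansion step inside the subquery, which in this model has to be repeated for every page the subquery returns; that is the part that makes it more expensive than SOME.
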
_______________________________________________ Semediawiki-devel mailing list Semediawiki-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/semediawiki-devel