On Friday, 14 December 2007, Sergey Chernyshev wrote:
> Got it - if it'll speed up the process, that'll be great. Currently SMW on
> top of MW runs significantly slower than just MW, which is not very good
> because it means that SMW+MW can't scale as well as MW alone.
>
> Can you describe in a couple of paragraphs how SMW data and queries are
> getting cached and how that cache is being invalidated, what works on the
> fly and what is served from parser cache?
>
> I understand it's a lot to describe, but for projects with massive amounts
> of data and traffic, performance can be a big show-stopper - we picked MW
> for one of our projects because of Wikipedia's example of performance and
> predictability, and I hope that it's not too distant for SMW to inherit
> these qualities, but I'd like to understand the overall picture.

Yes, agreed. Of course we have always designed the basic algorithms with
regard to performance and scalability, and have tried to pick features with
this in mind. On the other hand, caching is significantly under-developed in
SMW as it is, since it mainly uses the existing MW caches where applicable.
There are various types of operations that are relevant to performance, and
each can probably be optimised/cached independently:

(1) Basic page display -- by far the most common operation.
(2) Query answering, inline and on Special:Ask.
(3) Annotation parsing and page formatting.
(4) Maintenance specials such as Special:Properties.
(5) OWL/RDF export.
(6) Browsing with Special:Browse.

I will sketch the performance issues for each of those. For actual numbers,
see http://ontoworld.org/profileinfo.php to find out how expensive each
operation is on ontoworld.org.

(1) is clearly the main operation, and for existing pages SMW merely uses MW's
parser/page caches. No mechanism for cache invalidation exists, but MW
regularly updates page caches. This means inline query results can be
outdated, but it gives us good hope for basic scalability in large
environments. In particular, SMW does not hook into any operations that
happen when serving pages from the parser cache. Even the Factbox comes from
the parser cache (which is why we cannot readily translate it to the user's
language as MW does for categories).

(2) Query answering is done without any dedicated caching, and this is
clearly a problem. While inline queries are computed only once and stored in
the parser cache afterwards, Special:Ask has no caching facility at all. This
needs to change in the future. Targeted cache invalidation might still be
difficult, and it is not clear whether the effort is needed (one could enable
manual cache clearing as for pages). A new query cache -- design,
architecture and implementation -- is needed here.

(3) Page formatting uses very few additional DB calls, and mainly works on the 
wiki source code that was already retrieved anyway. It has no major 
performance impact (see smwfParserHook in the profile).

(4) Maintenance specials can be slow, but they have been designed to allow
the caching mechanism that MW uses for its own maintenance specials. This is
not implemented, but it would be possible. One design decision, here and
probably in other cases, is whether to have transparent caching in the
storage implementation, or whether to trigger caching explicitly in the
caller (which may help to not make the storage implementation even bigger
than it is now).

(5) OWL/RDF export takes time, but how much depends mostly on the export
settings of your site. The result could be cached internally in a similar way
to how page content is cached. External caches could be configured to cache
RDF as well. This is not to be neglected, since a number of Semantic Web
crawlers and misguided RSS spiders regularly visit the RDF.

(6) Special:Browse is not inefficient, but as it is a specialised form 
of "What links here" it also faces similar performance issues.

Finally, SMW needs practically no time to load when it is not strictly needed.
So enabling it hardly slows down the wiki for services that do not need SMW.
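
To make the design decision raised under (4) more concrete, here is a rough
sketch of the two options in Python (chosen only for brevity; SMW itself is
PHP). All class and method names below are hypothetical stand-ins and not
SMW's actual storage API; the TTL value and the injected clock are likewise
assumptions for illustration.

```python
import time
from typing import Callable, List, Optional, Tuple


class Store:
    """Hypothetical storage backend; not SMW's real store interface."""

    def __init__(self) -> None:
        self.calls = 0  # counts how often the expensive query really runs

    def get_properties(self) -> List[str]:
        # Stands in for an expensive aggregate query over all annotations.
        self.calls += 1
        return ["Has capital", "Located in", "Population"]


class CachingStore(Store):
    """Option A: transparent caching inside the storage implementation."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        super().__init__()
        self._ttl = ttl_seconds
        self._clock = clock
        self._entry: Optional[Tuple[float, List[str]]] = None

    def get_properties(self) -> List[str]:
        now = self._clock()
        # Serve the stored result while it is younger than the TTL.
        if self._entry is not None and now - self._entry[0] < self._ttl:
            return self._entry[1]
        result = super().get_properties()
        self._entry = (now, result)
        return result


class PropertiesSpecial:
    """Option B: the caller triggers caching explicitly; the store stays plain."""

    def __init__(self, store: Store) -> None:
        self._store = store
        self._snapshot: Optional[List[str]] = None

    def refresh(self) -> None:
        # Could be called by a periodic maintenance job.
        self._snapshot = self._store.get_properties()

    def render(self) -> List[str]:
        if self._snapshot is None:
            self.refresh()
        return self._snapshot
```

With option A every caller benefits without changes, but the storage class
grows. With option B the storage class stays small, and each special page
decides for itself when its snapshot is refreshed.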


Summing up, the required caching facilities in order of relevance would
probably be: (2) [Queries], (4) [Specials], (5) [OWL/RDF]. I do not think
that the other parts need too much care, but analysing the current profileinfo
may yield more insights. Concerning (2), which is by far the most severe
performance problem, we have included many ways of restricting queries, so
that large sites can always switch off features until performance is
acceptable again (SMW is still useful without very complex queries). At the
moment this is the suggested procedure for large sites, and we can also offer
some support to help such sites avoid major problems (things of course also
depend a lot on the wiki's actual structure).
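
As a starting point for the query cache mentioned under (2), the following is
a minimal sketch only, written in Python for brevity and not an existing SMW
component. It assumes results may simply expire after a fixed time and can be
cleared manually, as for pages, instead of being invalidated in a targeted
way. The names QueryCache, run_query, ask and purge, the default limit, and
the key scheme are all hypothetical.

```python
import hashlib
import time
from typing import Callable, Dict, List, Optional, Tuple


class QueryCache:
    """Time-limited cache for query results with manual purging (sketch)."""

    def __init__(self, run_query: Callable[[str, int], List[str]],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._run_query = run_query  # the uncached query engine (assumed)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    @staticmethod
    def _key(query: str, limit: int) -> str:
        # The key covers the query text and every parameter that changes
        # the result; here only a limit is modelled.
        raw = f"{limit}|{query.strip()}"
        return hashlib.sha1(raw.encode("utf-8")).hexdigest()

    def ask(self, query: str, limit: int = 50) -> List[str]:
        key = self._key(query, limit)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the expensive evaluation
        result = self._run_query(query, limit)
        self._entries[key] = (now, result)
        return result

    def purge(self, query: Optional[str] = None, limit: int = 50) -> None:
        # Manual cache clearing: one query, or everything if none is given.
        if query is None:
            self._entries.clear()
        else:
            self._entries.pop(self._key(query, limit), None)
```

Such a cache accepts outdated results for up to the chosen lifetime, which is
the same trade-off that inline queries in the parser cache already make.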

Best regards,

Markus

>
> Thank you,
>
>               Sergey.
>
> On Dec 14, 2007 1:12 PM, Markus Krötzsch <[EMAIL PROTECTED]> wrote:
> > On Friday, 14 December 2007, Sergey Chernyshev wrote:
> > > Markus, can you elaborate on three values - what's the difference
> > > between SOME and FULL?
> >
> > FULL is what used to be "true" in 1.0 (default)
> > NONE is what used to be "false" in all versions
> > SOME is new, but does basically what 0.7 did earlier.
> >
> > So SOME only considers redirects for pages that appear directly in the
> > query. For example, assume "r1" and "r2" are redirects to "p". Then
> > asking for "[[property::r1]]" yields the same results as asking
> > for "[[property::p]]" or "[[property::r2]]".
> >
> > This is not too hard to do. Now FULL evaluates redirects even when
> > joining subqueries or asking for categories. As an example, assume that
> > in addition to the above there is a page "q" with annotation
> > "[[property::r1]]", and assume further that r2 is in Category2 and that
> > p is in Category3. Then each of the following queries contains "q" in
> > its result list:
> >
> > * <ask>[[property::<q>[[Category:Category3]]</q>]]</ask>
> > * <ask>[[property::<q>[[Category:Category2]]</q>]]</ask>
> >
> > Neither would work with SOME only. But as you can imagine, doing these
> > additional considerations about redirects at query time consumes a lot of
> > additional time (in particular since we use MW's redirect table, which
> > is not even optimised for this kind of game).
> >
> > If you make sure that properties do not point to redirects, and that
> > redirects have no categories or properties, then SOME should always
> > suffice (I think it was discussed earlier to have a Special page for
> > that kind of maintenance).
> >
> > -- Markus
> >
> > >            Sergey
> > >
> > > On Dec 14, 2007 7:27 AM, Markus Krötzsch <[EMAIL PROTECTED]> wrote:
> > > > On Friday, 14 December 2007, cnit wrote:
> > > > > > Indeed. This was fixed now in SVN.
> > > > >
> > > > > Thank you! It works!!
> > > > > And with #ask SMW has become much more powerful!
> > > > >
> > > > > One thought - I wonder if $smwgQEqualitySupport should be true by
> > > > > default. It seems to speed things up a little when set to false.
> > > > > But of course that's a matter of tuning and can be done by the
> > > > > users. Maybe just a notice in the INSTALL file would be helpful.
> > > > > But anyway - not much of a problem even if left "as is".
> > > >
> > > > I agree, it should not be "true". I think I will make it
> > > > three-valued, since only part of the feature really slows down the
> > > > querying. So the values would be something like NONE, SOME, and
> > > > FULL. This should happen before 1.0.
> > > >
> > > > Markus



-- 
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362        fax +49 (0)721 608 5998
[EMAIL PROTECTED]        www  http://korrekt.org

_______________________________________________
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel
