On Tuesday, 6 November 2007, Sergey Chernyshev wrote:
> It seems that SMW_refreshData gets slower with growing size of the dataset.
>
> I didn't do much of troubleshooting of the issue, but first 50000 pages
> from my dataset were processed faster than second 50000 pages.

I noticed the same on our servers, and I suspect that a memory leak accounts for it. MediaWiki may be part of the reason: we had a similar problem some time ago, and it turned out that MediaWiki's link cache had no size limit (so batch-processing 1 million pages really generated a large array in memory). Similar caches may be the reason for the renewed slowdown, but we were unable to analyse this issue in detail. In any case, the MW version is an important piece of information for debugging here.

> I'm going to start upgrade over for RC2 and will try to look at it in terms
> of speed of the process, but I think there might be a reason for it in some
> indexes getting bigger with more data (which can be avoided by dropping
> indexes prior to refresh and rebuilding them right after) or MySQL not
> liking that many temporary tables created so rapidly.

I would rather suspect the PHP side to be the reason, but one never knows. I do not expect changes between the SMW 1.0 release candidates. The refresh process has not changed much for a long time, yet the speed issues only occurred recently (again suggesting that some change in MW may be the reason). SMW also has some unbounded caches, but these are for properties and should hardly get large enough on current wikis to be relevant here.

> Also, I'm wondering if parts of the dataset can be processed in parallel?
> it seems that single run of the script doesn't load CPU that much and
> alternates between PHP and MySQL processes which is not optimal for
> multi-processor boxes where these loads can be spread across all the CPUs.

Possibly, but refreshing is often a low-priority task, since the wiki should remain usable while it runs. So it might be an advantage if it works in the background without eating too many resources at a time (which, by the above observation, is probably not really the case either ;-).
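
To make the unbounded-cache suspicion above concrete, here is a minimal sketch of a size-limited cache. It is written in Python for brevity and is not MediaWiki's or SMW's actual code; `BoundedCache`, `refresh_pages`, and the limit of 1000 entries are hypothetical names and values chosen only for illustration. The idea is that a cache which evicts its least recently used entry keeps memory flat no matter how many pages a batch run touches.

```python
from collections import OrderedDict


class BoundedCache:
    """Least-recently-used cache that never holds more than max_entries items."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, key, default=None):
        """Return the cached value and mark it as recently used."""
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        """Store a value, evicting the oldest entries once the limit is exceeded."""
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

    def __len__(self):
        return len(self._entries)


def refresh_pages(page_ids, cache):
    """Hypothetical stand-in for a batch refresh that caches one entry per page."""
    for page_id in page_ids:
        cache.put(page_id, "link data for page %d" % page_id)
    return len(cache)
```

With a plain dictionary in place of `BoundedCache`, the same loop would hold one entry per processed page, which is the kind of growth described for the link cache. Whether a comparable limit exists or is configurable in a given MW version would have to be checked against that version's source.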

Markus

>
> Sergey

--
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362    fax +49 (0)721 608 5998
[EMAIL PROTECTED]    www http://korrekt.org
_______________________________________________ Semediawiki-devel mailing list Semediawiki-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/semediawiki-devel