On Tuesday, 6 November 2007, Sergey Chernyshev wrote:
> It seems that SMW_refreshData gets slower with growing size of the dataset.
>
> I didn't do much troubleshooting of the issue, but the first 50000 pages
> from my dataset were processed faster than the second 50000 pages.

I noticed the same on our servers, and I suspect that a memory leak accounts
for it. MediaWiki may be part of the reason -- we had a similar problem some
time ago, and it turned out that MediaWiki's link-cache had no size limit (so
batch-processing 1 million pages really generated a large array in memory).
Similar caches may be the reason for the renewed slowdown, but we were unable
to analyse this issue in detail. In any case, the MW version is an important
piece of information for debugging here.
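
To illustrate the kind of fix I mean, here is a minimal sketch in Python (not
MediaWiki's actual PHP code; the class name, page titles and size limit are
made up for the example). It shows a cache that evicts its least recently
used entry once a fixed limit is reached, so a long batch run cannot grow it
without bound:

```python
from collections import OrderedDict


class BoundedCache:
    """Least-recently-used cache that never holds more than max_size entries."""

    def __init__(self, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self._entries = OrderedDict()

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used

    def __len__(self):
        return len(self._entries)


# Simulate a batch run over many pages (hypothetical titles and IDs).
cache = BoundedCache(max_size=1000)
for page_id in range(50000):
    cache.put("Page_%d" % page_id, page_id)
```

After the loop the cache still holds only the 1000 most recent entries,
whereas a plain array would hold all 50000.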

>
> I'm going to start the upgrade over for RC2 and will try to look at the
> speed of the process, but I think the reason might be some indexes getting
> bigger with more data (which can be avoided by dropping the indexes prior
> to the refresh and rebuilding them right after) or MySQL not liking that
> many temporary tables being created so rapidly.

I would rather suspect the PHP side to be the reason, but one never knows. I
do not expect changes between the SMW 1.0 RCs. The refresh process has
basically not changed much for a long time, but the speed issues only
occurred recently (again suggesting that some change in MW may be the
reason). SMW also has some unbounded caches, but these are for properties and
should hardly get large enough on current wikis to be relevant here.
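
One way to narrow this down would be to record time and memory per batch of
pages: if memory climbs steadily along with the time per batch, that points
to the scripting side rather than the database. A rough sketch of the idea in
Python follows (the refresh script itself is PHP, where memory_get_usage()
would play the role of tracemalloc; the function names and the leaky stand-in
for the per-page work are invented for the example):

```python
import time
import tracemalloc


def refresh_in_batches(page_ids, refresh_page, batch_size=1000):
    """Call refresh_page for every ID and record time and memory per batch."""
    stats = []
    tracemalloc.start()
    try:
        for start in range(0, len(page_ids), batch_size):
            batch = page_ids[start:start + batch_size]
            began = time.perf_counter()
            for page_id in batch:
                refresh_page(page_id)
            elapsed = time.perf_counter() - began
            current_bytes, _peak = tracemalloc.get_traced_memory()
            stats.append((start, len(batch), elapsed, current_bytes))
    finally:
        tracemalloc.stop()
    return stats


# Hypothetical stand-in for the real per-page work: it keeps one entry per
# page forever, imitating a cache without a size limit.
leaky_cache = {}


def fake_refresh(page_id):
    leaky_cache[page_id] = "x" * 100 + str(page_id)


stats = refresh_in_batches(list(range(5000)), fake_refresh, batch_size=1000)
for start, count, elapsed, mem in stats:
    print("pages %d-%d: %.4fs, %d bytes traced"
          % (start, start + count - 1, elapsed, mem))
```

With the leaky stand-in, the traced memory grows from batch to batch; a
refresh without such a leak should stay roughly flat.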

>
> Also, I'm wondering if parts of the dataset can be processed in parallel?
> It seems that a single run of the script doesn't load the CPU that much and
> alternates between PHP and MySQL processes, which is not optimal for
> multi-processor boxes where these loads can be spread across all the CPUs.

Possibly, but refreshing is often a low-priority task, since the wiki should
be usable during refreshing. So it might be an advantage if it works in the
background without eating too many resources at a time (which, by the above
observation, is probably not really the case either ;-).
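
If one wanted to try it anyway, the page IDs could be split into ranges and
handed to a small, fixed number of workers, so the refresh uses more than one
CPU but still leaves room for normal wiki traffic. This is only a sketch in
Python under the assumption that the refresh can be restricted to an ID range
at all; refresh_range is a placeholder that does no real work, and the range
and worker numbers are arbitrary:

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(first_id, last_id, chunk_size):
    """Return inclusive (start, end) ID ranges covering first_id..last_id."""
    ranges = []
    start = first_id
    while start <= last_id:
        end = min(start + chunk_size - 1, last_id)
        ranges.append((start, end))
        start = end + 1
    return ranges


def refresh_range(id_range):
    """Hypothetical placeholder: a real worker would refresh these pages."""
    start, end = id_range
    return end - start + 1  # number of pages "processed"


ranges = split_ranges(1, 100000, chunk_size=25000)
# max_workers caps how much of the machine the refresh may use at once.
with ThreadPoolExecutor(max_workers=2) as pool:
    processed = list(pool.map(refresh_range, ranges))
```

Keeping max_workers below the number of CPUs is the knob that trades refresh
speed against keeping the wiki responsive.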


Markus

>
>          Sergey



-- 
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362        fax +49 (0)721 608 5998
[EMAIL PROTECTED]        www  http://korrekt.org


_______________________________________________
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel
