Petr Bena wrote:
> There are a number of services running on our wikis, such as the English
> Wikipedia, which generate some information and frequently update a
> certain page with those data (statistics, etc.). That creates tons of
> revisions that will never be needed in the future and that will sit in
> the database with no way to remove them.
> 
> For this reason it would be nice to have an option to create a special
> page on a wiki which would have only one revision whose data get
> overwritten, so that it wouldn't eat any space in the DB. I know it
> doesn't look like a big issue, but there are a number of such pages, and
> the bots that update them generate a lot of megabytes every month. Bot
> operators could use, for example, an external web server to display these
> data, but those can't be transcluded using templates, like here:
> http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Articles_for_creation/Submissions
> 
> What do you think about this feature? Is it worth working on, or is this
> not a big issue? Keep in mind that even if we have a lot of storage, these
> unneeded data make the dumps larger and slow down the processes that parse
> the XML files.

Hi.

I think this is the wrong approach to take. I agree that adding these
revisions to the database is a silly thing to do.
<https://commons.wikimedia.org/wiki/Template:Toolserver> is usually the
example I use (currently at 44,816 revisions).

For pages like this, there are a few reasons you might put the data on a wiki
as opposed to serving it from a dynamic tool on an external service:

* historical data can be stored and tracked;
* users can see the updated information in their watchlists; and
* the data can be transcluded elsewhere on the wiki.

Depending on the bot and the page, points one and two can have more or less
importance. For the third point (transclusion), I've always thought it would
be neat to be able to transclude content from the Toolserver. I don't think
it's been explored too much, but imagine something like:

{{#toolserver:~user/my_dynamic_tool.py}}

MediaWiki could take this parser function, fetch the remote content, run it
through the parser/sanitizer, and then cache the page. A purge of the
MediaWiki page would update the data.
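
To make that concrete, here's a rough sketch of how such a parser function
might be wired up as a MediaWiki extension. This is only illustrative: the
function names, the toolserver.org base URL, and the one-hour cache lifetime
are made up, and the magic word registration for "toolserver" is omitted.

$wgHooks['ParserFirstCallInit'][] = 'efToolserverSetup';

function efToolserverSetup( Parser $parser ) {
    // Register the {{#toolserver:...}} parser function
    // (magic word definition not shown).
    $parser->setFunctionHook( 'toolserver', 'efToolserverRender' );
    return true;
}

function efToolserverRender( Parser $parser, $path = '' ) {
    // Fetch the remote content; fail soft so a dead service
    // doesn't break page rendering.
    $text = Http::get( 'https://toolserver.org/' . $path );
    if ( $text === false ) {
        return '';
    }

    // Strip unsafe HTML and let the parser cache hold the result;
    // purging the page re-runs this function and refreshes the data.
    $parser->getOutput()->updateCacheExpiry( 3600 );
    return array(
        Sanitizer::removeHTMLtags( $text ),
        'noparse' => true,
        'isHTML' => true
    );
}

The key design point is failing soft when the remote service is unreachable,
which is what keeps this from being a problem for page loading (see below).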

Or nowadays, I guess you could do something similar with jQuery.

This creates a reliance on the external tool being up and running and might
make page loads a bit slower, but most of this data is non-critical. As long
as it doesn't block page loading when the external service is down, it isn't
a big deal.

I'm very hesitant to introduce page history truncation or manipulation. We
generally work hard to preserve and protect the integrity of the page
history, and we use Special pages for dynamic data. For a lot of these bots
and pages, I agree that we need to re-evaluate them and make smarter
decisions about how to present and update the data they provide, but I don't
think changing MediaWiki's page history infrastructure is the best approach,
given the other options available.

MZMcBride


