Eric,

We don't produce dumps of the revision table in SQL format because some of
those revisions may be hidden from public view, and even metadata about
them should not be released. We do, however, publish so-called Adds/Changes
dumps once a day for each wiki, providing stubs and content files in XML for
just the new pages and revisions since the last such dump. They lag about 12
hours behind to allow vandalism and such to be filtered out by wiki admins,
but hopefully that's good enough for your needs. You can find them here:
https://dumps.wikimedia.org/other/incr/
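
For what it's worth, filtering one of those files for newly created pages can
be done with a streaming parser, so the whole document never has to sit in
memory. Below is a minimal Python sketch, assuming the standard MediaWiki XML
export schema (page/title/revision/timestamp elements); the function name is
just illustrative:

```python
import io
import xml.etree.ElementTree as ET

def _local(tag):
    """Strip the XML namespace, if any, from a tag name."""
    return tag.rsplit("}", 1)[-1]

def pages_created_between(xml_file, start, end):
    """Titles of pages whose earliest revision timestamp lies in [start, end).

    Timestamps are compared as ISO-8601 strings (e.g. "2023-01-17T06:22:00Z"),
    which sort chronologically, so no date parsing is needed.
    """
    titles = []
    for _event, elem in ET.iterparse(xml_file):
        if _local(elem.tag) != "page":
            continue
        title, first_ts = None, None
        for child in elem.iter():
            name = _local(child.tag)
            if name == "title" and title is None:
                title = child.text
            elif name == "timestamp" and child.text:
                if first_ts is None or child.text < first_ts:
                    first_ts = child.text
        if first_ts is not None and start <= first_ts < end:
            titles.append(title)
        elem.clear()  # keep memory bounded on large dumps
    return titles

# Tiny self-contained example (a real dump would be opened from disk,
# after decompression):
sample = io.StringIO("""<mediawiki>
<page><title>New page</title>
  <revision><timestamp>2023-01-17T06:00:00Z</timestamp></revision>
</page>
<page><title>Old page</title>
  <revision><timestamp>2001-05-01T00:00:00Z</timestamp></revision>
  <revision><timestamp>2023-01-17T07:00:00Z</timestamp></revision>
</page>
</mediawiki>""")
print(pages_created_between(sample, "2023-01-01T00:00:00Z", "2023-02-01T00:00:00Z"))
# -> ['New page']
```

A page whose oldest revision falls inside the window was created in that
window, which is exactly the filter described below.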

Ariel Glenn
[email protected]

On Tue, Jan 17, 2023 at 6:22 AM Eric Andrew Lewis <
[email protected]> wrote:

> Hi,
>
> I am interested in performing analysis on recently created pages on
> English Wikipedia.
>
> One way to find recently created pages is to download a meta-history file
> for English Wikipedia and filter through the XML, looking for pages
> whose oldest revision is within the desired timespan.
>
> Since this requires a library to parse XML string data, I would
> imagine this is much slower than a database query. Is page revision data
> available in one of the SQL dumps which I could query for this use case?
> Looking at the exported tables list
> <https://meta.wikimedia.org/wiki/Data_dumps/What%27s_available_for_download#Database_tables>,
> it does not look like it is. Maybe this is intentional?
>
> Thanks,
> Eric Andrew Lewis
> ericandrewlewis.com
> +1 610 715 8560
> _______________________________________________
> Xmldatadumps-l mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>
