Dear Ariel,

I added a function to WP-MIRROR 0.7 that cleans up <title>. It removes the following namespace words from page titles:
Category:  24575
Template:  15082
Wikipedia:  4072
MediaWiki:   520
Help:        108
Module:       27

The number beside each namespace word is the number of <title>s in the
dump file `simplewiki-20140220-pages-articles.xml.bz2' that were cleaned up.

MediaWiki now does a much better job of rendering the mirror. Still, it
would be nice if the dump files could be fixed.

Sincerely Yours,
Kent

On 2/21/14, wp mirror <wpmirror...@gmail.com> wrote:
> Dear Ariel,
>
> 0) Problem
>
> The dump files contain a great number of pages where the page_title
> contains the namespace. These page_titles are imported (via mwxml2sql
> and wp-mirror) into my database. One consequence: most templates are
> not expanded by MediaWiki; rather, they are rendered as red-links.
>
> 1) Example
>
> (shell)$ rsync ftpmirror.your.org::wikimedia-dumps/simplewiki/20140220/simplewiki-20140220-pages-articles.xml.bz2 .
> (shell)$ bunzip2 simplewiki-20140220-pages-articles.xml.bz2
> (shell)$ cat simplewiki-20140220-pages-articles.xml | grep "Template:" | head
>     <title>Template:Stub</title>
>     <title>Template:NPOV</title>
>     <title>Template:Disputed</title>
>     <title>Template:Disambiguation</title>
>     <title>Template:TOC</title>
>     <title>Template:Uw-test1</title>
>     <title>Template:1911</title>
>     <title>Template:Please do not change this line</title>
>     <title>Template:Solar System</title>
>     <title>Template:Months</title>
> (shell)$ cat simplewiki-20140220-pages-articles.xml | grep "<title>Category:" | head
>     <title>Category:Computer science</title>
>     <title>Category:Sports</title>
>     <title>Category:Athletics</title>
>     <title>Category:Body parts</title>
>     <title>Category:Tools</title>
>     <title>Category:Movies</title>
>     <title>Category:Grammar</title>
>     <title>Category:Mathematics</title>
>     <title>Category:Alphabet</title>
>     <title>Category:Countries</title>
> (shell)$ cat simplewiki-20140220-pages-articles.xml | grep "<title>Help:" | head
>     <title>Help:How to change pages</title>
>     <title>Help:Minor change</title>
>     <title>Help:User settings</title>
>     <title>Help:Writing articles for Wikipedia</title>
>     <title>Help:Contents</title>
>     <title>Help:Revert a page</title>
>     <title>Help:Editing</title>
>     <title>Help:How to use images</title>
>     <title>Help:How to write simple English articles</title>
>     <title>Help:User preferences help</title>
>
> 2) Solution
>
> I would like your advice as to where the solution should be attempted:
>
> a) Should the dump file generating process be fixed?
> b) Should `mwxml2sql' be altered to edit the <title> content?
> c) Should `wp-mirror' be altered to edit the <title> content?
> d) Should `wp-mirror' be able to detect and correct such `page_title'
> content in the underlying database?
>
> Sincerely Yours,
> Kent
>
> On 2/21/14, gnosygnu <gnosy...@gmail.com> wrote:
>> Hi. I believe the problem is with the import of the [[Template]] pages
>> into the page table.
>>
>> Your SQL output shows the following:
>>
>> page_title: Template:Ndash
>>
>> Instead, the page_title should just be "Ndash", not "Template:Ndash".
>> Note that the page is already marked as page_namespace = 10. Also,
>> note that no other namespace (Category, Help, Project, etc.) will have
>> a "page_title" with the namespace name in front of it. For example,
>> Category "Earth" will be in the page table with a page_title of
>> "Earth", not "Category:Earth".
>>
>> MediaWiki has code that takes {{Template:A}} and makes it effectively
>> the same as {{A}}. Note that this is just regular page transclusion
>> via namespace. You can do "{{Category:Earth}}" and it will transclude
>> the contents of the page "Category:Earth".
>>
>> Hope this helps.
>>
>>
>> On Fri, Feb 21, 2014 at 5:21 PM, wp mirror <wpmirror...@gmail.com> wrote:
>>> Dear Sir or Madam,
>>>
>>> I am not sure to which person or list I should address this question.
>>>
>>> 0) Objective
>>>
>>> I am in the process of building DEB packages for: WP-MIRROR 0.7, the
>>> latest development version of MediaWiki 1.23, and a set of MediaWiki
>>> extensions.
>>>
>>> The objective is this: a page rendered by a mirror should look the
>>> same as that page rendered by the WMF site.
>>>
>>> 1) Problem
>>>
>>> In the process of testing mirrors, I noticed that many templates were
>>> not expanding, and were instead being rendered as red-links.
>>>
>>> 2) Example
>>>
>>> To illustrate, consider the Ndash template, which appears on many
>>> pages such as <http://simple.wikipedia.org/wiki/August>. It appears
>>> in the underlying database:
>>>
>>> mysql> select page_id,page_title,rev_len,old_text from
>>>     simplewiki.page,simplewiki.revision,simplewiki.text where
>>>     page_id=rev_page and rev_text_id=old_id and page_title like
>>>     'Template:Ndash' limit 10\G
>>> *************************** 1. row ***************************
>>>    page_id: 132985
>>> page_title: Template:Ndash
>>>    rev_len: 65
>>>   old_text: –<noinclude>
>>> [[Category:Formatting templates]]
>>> </noinclude>
>>> 1 row in set (0.25 sec)
>>>
>>> 3) Special:ExpandTemplates
>>>
>>> To test the above example ``Template:Ndash'', I use
>>> Special:ExpandTemplates.
>>>
>>> 3.1) Input text
>>>
>>> Today is the {{CURRENTDAY}} day.</br>
>>> This server is {{SERVER}}, script path {{SCRIPTPATH}}, current MW version {{CURRENTVERSION}}.</br>
>>> This site is {{SITENAME}}. Full page name is {{FULLPAGENAME}}.</br>
>>> <table>
>>> <tr><th>Template</th><th>Expanded</th><th>page_id</th><th>rev_len</th></tr>
>>> <tr><td>Ndash</td><td>{{Ndash}}</td><td>{{PAGEID: Ndash}}</td><td>{{PAGESIZE: Ndash}}</td></tr>
>>> <tr><td>Template:Ndash</td><td>{{Template:Ndash}}</td>
>>> <td>{{PAGEID: Template:Ndash}}</td><td>{{PAGESIZE: Template:Ndash}}</td></tr>
>>> <tr><td>Template:Template:Ndash</td><td>{{Template:Template:Ndash}}</td>
>>> <td>{{PAGEID: Template:Template:Ndash}}</td><td>{{PAGESIZE: Template:Template:Ndash}}</td></tr>
>>> </table>
>>>
>>> 3.2) <http://simple.wikipedia.site/wiki/Special:ExpandTemplates> Preview
>>>
>>> Here is the result from the WMF site:
>>>
>>> Today is the 21 day.
>>> This server is //simple.wikipedia.org, script path /w, current MW
>>> version 1.23wmf14 (f8b9201).
>>> This site is Wikipedia. Full page name is My template.
>>>
>>> Template                 Expanded                 page_id  rev_len
>>> Ndash                    -                        0        0
>>> Template:Ndash           -                        132985   65
>>> Template:Template:Ndash  Template:Template:Ndash  0        0
>>>
>>> Both {{Ndash}} and {{Template:Ndash}} expand as expected.
>>>
>>> 3.3) <http://simple.wikipedia.site/wiki/Special:ExpandTemplates> Preview
>>>
>>> Here is the result from the mirrored site:
>>>
>>> Today is the 21 day.
>>> This server is http://simple.wikipedia.site, script path /w, current
>>> MW version 1.23alpha.
>>> This site is simplewiki. Full page name is My template.
>>>
>>> Template                 Expanded                 page_id  rev_len
>>> Ndash                    Template:Ndash           0        0
>>> Template:Ndash           Template:Ndash           0        0
>>> Template:Template:Ndash  -                        132985   65
>>>
>>> Only {{Template:Template:Ndash}} expands!
>>>
>>> 4) Question
>>>
>>> Why do I need to prepend an extra ``Template:'' to make the templates
>>> work for the mirror?
>>>
>>> Better yet: Could someone tell me where in the MediaWiki core I can
>>> find the code that takes the template (e.g. {{Ndash}} or
>>> {{Template:Ndash}}) and converts it into an SQL query that SELECTs the
>>> template expansion from the underlying database?
>>>
>>> Sincerely Yours,
>>> Kent
>>>
>>> _______________________________________________
>>> Xmldatadumps-l mailing list
>>> Xmldatadumps-l@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
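
For readers following the thread, here is a minimal sketch of the kind of
<title> cleanup described in the newest message at the top. It is not the
actual WP-MIRROR function; the names NAMESPACES and split_title are
hypothetical. It assumes only the six namespace words listed there, paired
with their standard MediaWiki namespace numbers, and it treats any other
title as namespace 0, which is a simplification.

```python
# Hypothetical sketch, not the WP-MIRROR implementation.
# The six namespace words come from the message above; the numbers are
# the standard MediaWiki namespace IDs (e.g. Template = 10).
NAMESPACES = {
    "Wikipedia": 4,
    "MediaWiki": 8,
    "Template": 10,
    "Help": 12,
    "Category": 14,
    "Module": 828,
}


def split_title(title):
    """Split a dump <title> into (namespace_id, bare_title).

    Only the first "Prefix:" is examined, so "Template:Template:Ndash"
    keeps its inner "Template:" as part of the bare title.
    """
    prefix, sep, rest = title.partition(":")
    if sep and rest and prefix in NAMESPACES:
        return NAMESPACES[prefix], rest
    # Assumption: anything else is left untouched in namespace 0.
    return 0, title


if __name__ == "__main__":
    for t in ("Template:Ndash", "Category:Earth", "August"):
        print(t, "->", split_title(t))
```

With a split like this, the value stored as page_title would be "Ndash"
with page_namespace = 10, which matches what gnosygnu describes as the
expected content of the page table.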