Re: [Wikitech-l] WikiXMLArticleIndexer
On Tue, Sep 14, 2010 at 1:04 AM, Jamie Morken jmor...@shaw.ca wrote: Hi all, We have a beta version of the code for reading the XML dump and extracting the article names with their associated images. It is easier to download the imagelinks.sql and page.sql dumps. Imagelinks already contains a mapping of the images used on each page, and page can be used to map page_id to page_namespace and page_title. Bryan ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
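The join Bryan describes can be sketched as follows. This is a minimal illustration, not the WikiXMLArticleIndexer code: it assumes the two dumps have already been loaded into a database, uses an in-memory SQLite database as a stand-in for MySQL, keeps only the columns needed for the join (page_id, page_namespace, page_title, il_from, il_to), and the sample rows are invented.

```python
import sqlite3
from collections import defaultdict


def images_by_page(conn):
    """Map (page_namespace, page_title) to the images used on that page.

    imagelinks.il_from holds the page_id of the page using the image,
    and imagelinks.il_to holds the image name.
    """
    rows = conn.execute(
        "SELECT p.page_namespace, p.page_title, il.il_to "
        "FROM imagelinks AS il "
        "JOIN page AS p ON p.page_id = il.il_from "
        "ORDER BY p.page_id, il.il_to"
    )
    result = defaultdict(list)
    for namespace, title, image in rows:
        result[(namespace, title)].append(image)
    return dict(result)


def demo_connection():
    """Build a tiny stand-in database; the rows are made up for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title TEXT
        );
        CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [(1, 0, "South_Africa"), (2, 0, "Paw")],
    )
    conn.executemany(
        "INSERT INTO imagelinks VALUES (?, ?)",
        [
            (1, "Flag_of_South_Africa.svg"),
            (1, "South_Africa_(orthographic_projection).svg"),
            (2, "Paw.jpg"),
        ],
    )
    return conn


if __name__ == "__main__":
    for key, images in images_by_page(demo_connection()).items():
        print(key, images)
```

Against the real dumps the same SELECT should work in MySQL once imagelinks.sql and page.sql have been imported.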
Re: [Wikitech-l] Get full-protecting article list monthly
2010/9/14 zh...@york.ac.uk: Dear All, May I ask how I can get a list of fully protected articles each month? I can get the current one by searching for the lock link. Are there any tools for this? http://en.wikipedia.org/w/api.php?action=query&list=allpages&apprtype=edit&apprlevel=sysop&aplimit=max returns the first 500 (or 5,000 if you're a bot or sysop) pages in the main namespace that only sysops can edit (i.e. that are fully protected). If you're not a privileged user and only get 500 entries, you can use the information in the query-continue tag to get the next 500. Roan Kattouw (Catrope)
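The paging Roan describes can be sketched as below. This is only an outline under stated assumptions: the responses are requested with format=json (not mentioned in the message), the continuation data is assumed to sit under a "query-continue" key holding the parameters for the next request, and the helper names are hypothetical. Newer MediaWiki releases may report continuation differently, so check the API output of the wiki you query. No network request is made here; fetching and decoding the URL is left to the caller.

```python
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"


def build_query_url(continuation=None):
    """Build the allpages query for fully protected main-namespace pages.

    `continuation` is the dict taken from a previous response (if any);
    its entries are added to the request to fetch the next batch.
    """
    params = {
        "action": "query",
        "list": "allpages",
        "apprtype": "edit",
        "apprlevel": "sysop",
        "aplimit": "max",
        "format": "json",  # assumption: JSON output is wanted
    }
    params.update(continuation or {})
    return API_URL + "?" + urlencode(params)


def titles(response):
    """Extract page titles from one decoded response."""
    return [page["title"] for page in response.get("query", {}).get("allpages", [])]


def next_continuation(response):
    """Return the parameters for the next request, or None when done."""
    return response.get("query-continue", {}).get("allpages")
```

A caller would loop: request build_query_url(), collect titles(response), then request build_query_url(next_continuation(response)) until next_continuation returns None.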
Re: [Wikitech-l] Get full-protecting article list monthly
Thanks for this! What I am looking for is how the set of fully protected articles changes over time. Can I get the data with a time or date attached? Is that possible? Thanks, Zeyi
Re: [Wikitech-l] Get full-protecting article list monthly
If you want, we have a Toolserver database query service, and generating such data should be easy. If you file a request at https://jira.toolserver.org/browse/DBQ you should be able to get the data you need. Δ
[Wikitech-l] pagesize function problem
Hi, We are having a problem with the front page of our Wikipedia mirror. The expression
{{#ifexpr:{{formatnum:{{PAGESIZE:Wikipedia:Today's featured article/{{#time:F j, Y}}}}|R}}>150|{{Wikipedia:Today's featured article/{{#time:F j, Y}}}}|{{Wikipedia:Today's featured article/{{#time:F j, Y|-1 days}}}}}}
causes the error "Expression error: Unexpected > operator" to appear on the main page. {{#time:F j, Y}} correctly returns the date in the format 'June 22, 2010'. However, {{PAGESIZE:Wikipedia:Today's featured article/{{#time:F j, Y}}}} doesn't seem to return anything. Trying PAGESIZE with a normal page, {{PAGESIZE:Paw}}, does not return anything either. Because PAGESIZE fails, the ifexpr becomes {{#ifexpr:>150|...}}, which I believe causes the "Unexpected > operator" error. Does anybody have any idea what is going on here? Thanks, Brent
MediaWiki 1.16.0, PHP 5.2.8 (apache2handler), MySQL 5.1.48-community
Re: [Wikitech-l] Get full-protecting article list monthly
Better yet, see http://toolserver.org/~betacommand/reports/sysopprotecton.txt which is updated daily. Δ
[Wikitech-l] template rendering problems
Hi, We are creating an offline version of Wikipedia, but we continue to have problems getting templates to render correctly. So far, we've tracked down two sources of the problem. One is when tags are not closed properly. Here is an example from http://en.wikipedia.org/wiki/South_Africa. This is a portion of the page near the top that is part of the Infobox parameters:
<snip>
|symbol_type=Coat of arms
|image_map=South_Africa_(orthographic_projection).svg
|national_motto=''{{unicode|!ke e: ǀxarra ǁke}}''{{spaces|2|}}<small>([[ǀXam language|ǀXam]])<br/>Unity In Diversity
|national_anthem=[[National anthem of South Africa]]
</snip>
Notice that the <small> tag is not closed. On our version of the page many of the subsequent <tr> and <td> tags are rendered as HTML entities, and this ruins the layout of the page. I'm not sure why this works in the current online version but not in ours. (Another example is the Refimprove template; check the latest change by Plastikspork, which breaks any page that uses Refimprove.) We have the same extensions installed as Wikipedia. We also have $wgUseTidy set to true. If you have any ideas about how to troubleshoot this one, I'd appreciate it. Brent
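One way to narrow down which pages are affected by unclosed tags is to scan the wikitext for inline HTML tags that are opened more often than they are closed. The sketch below is a rough troubleshooting aid, not part of MediaWiki or Tidy: the function name and the list of void tags are assumptions, and it deliberately ignores nowiki sections, comments and tags produced by template expansion.

```python
import re
from collections import Counter

# Matches <tag ...>, </tag> and <tag ... />; captures "/" prefix, name, "/" suffix.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*?(/?)>")

# Assumption: tags that never need a closing counterpart.
VOID_TAGS = {"br", "hr"}


def unclosed_tags(wikitext):
    """Return {tag name: surplus opening tags} for a piece of wikitext."""
    balance = Counter()
    for closing, name, self_closing in TAG_RE.findall(wikitext):
        name = name.lower()
        if self_closing or name in VOID_TAGS:
            continue
        balance[name] += -1 if closing else 1
    return {name: count for name, count in balance.items() if count > 0}


if __name__ == "__main__":
    motto = (
        "''{{unicode|!ke e: ǀxarra ǁke}}''{{spaces|2|}}"
        "<small>([[ǀXam language|ǀXam]])<br/>Unity In Diversity"
    )
    print(unclosed_tags(motto))
```

Running this over each infobox parameter value, rather than over the whole page, should make it easier to see which field carries the stray tag.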
[Wikitech-l] template rendering problems-#switch
Hi, We are creating an offline version of Wikipedia, but we continue to have problems getting templates to render correctly. So far, we've tracked down several sources of the problem. Here is another example. In several templates I've traced the problem to a call to the #switch function. The function appears to work, in that it outputs the correct text, but several of the tags following it are converted to HTML entities. Here is a snippet of the Historical_populations template:
<includeonly>{| class="toccolours {{#ifeq: {{{state|}}} | collapsed | collapsible collapsed | }}" style="clear: {{{align|right}}}; width: {{{width|15em}}}; text-align: center; border-spacing: 0; float: {{{align|right}}}; margin: {{ #switch: {{{align|}}} | left = 0 1em 1em 0 | #default = 0 0 1em 1em }};"
|-
! colspan={{#ifeq:{{{percentages}}}|off|2|3}} class="navbox-title" | <span style="font-size:110%;">{{{title|Historical populations}}}</span>
|- style="font-size: 95%;"
If I remove the #switch and hard-code the default text (0 0 1em 1em), it works fine. We have the same extensions installed as Wikipedia. We also have $wgUseTidy set to true. If you have any ideas about how to troubleshoot this one, I'd appreciate it. Brent
Re: [Wikitech-l] Get full-protecting article list monthly
Hi, the link does not seem to work. Best, Zeyi
Re: [Wikitech-l] Keeping record of imported licensed text
On Fri, 10 Sep 2010 23:11:27 +, Dan Nessett wrote: We are currently attempting to refactor some specific modifications to the standard MW code we use (1.13.2) into an extension so we can upgrade to a more recent maintained version. One modification we have keeps a flag in the revisions table specifying that article text was imported from WP. This flag generates an attribution statement at the bottom of the article that acknowledges the import. I don't want to start a discussion about the various legal issues surrounding text licensing. However, assuming we must acknowledge use of licensed text, a legitimate technical issue is how to associate state with an article in a way that records the import of licensed text. I bring this up here because I assume we are not the only site that faces this issue. Some of our users want to encode the attribution information in a template. The problem with this approach is that anyone can come along and remove it. That would mean the organization legally responsible for the site would entrust the integrity of site content to any arbitrary author. We may go this route, but for the sake of this discussion I assume such a strategy is not viable. So, the remainder of this post assumes we need to keep such licensing state in the db. After asking around, one suggestion was to keep the licensing state in the page_props table. This seems very reasonable and I would be interested in comments by this community on the idea. Of course, there has to be a way to get this state set, but it seems likely that could be achieved using an extension triggered when an article is edited. Since this post is already getting long, let me close by asking whether support for associating licensing information with articles might be useful to a large number of sites. If so, then perhaps it belongs in the core. The discussion about whether to support license data in the database has settled down. There seems to be some support.
So, I think the next step is to determine the best technical approach. Below I provide a strawman proposal. Note that this is only to foster discussion on technical requirements and approaches. I have nothing invested in the strawman.
Implementation location: In an extension.
Permissions: Include two new permissions - 1) addlicensedata, and 2) modifylicensedata. These are pretty self-explanatory. Sites that wish to give all users the ability to provide and modify licensing data would assign these permissions to everyone. Sites that wish to allow all users to add licensing data, but restrict those who are allowed to modify it, would give the first permission to everyone and the second to a limited group.
Database schema: Add a licensing table to the db with the following columns - 1) revision_or_image, 2) revision_id, 3) image_id, 4) content_source, 5) license_id, 6) user_id. The first three columns identify the revision or image with which the licensing data is associated. I am not particularly adept with SQL, so there may be a better way to do this. The content_source column is a string holding a URL or other reference that specifies the source of the content under license. The license_id identifies the specific license for the content. The user_id identifies the user who added the licensing information. The user_id may be useful if a site wishes to allow someone who added the licensing information to delete or modify it. However, there are complications with this. Since IP addresses are easily spoofed, this entry should only be valid for logged-in users.
Add a license table with the following columns - 1) license_id, 2) license_text, 3) license_name and 4) license_version. The license_id in the licensing table references rows in this table. One complication is that when a page or image is reverted, the licensing table must be modified to reflect the current state.
Data manipulation: The extension would use suitable hooks to insert, modify and render licensing data. Insertion and modification would probably use a relevant Edit Page or Article Management hook. Rendering would probably use a Page Rendering hook.
Page rendering: You probably don't want to dump licensing data directly onto a page. Instead, it is preferable to output a short licensing statement like: "Content on this page uses licensed content. For details, see licensing data." The phrase "licensing data" would be a link to a special page that accesses the licensing table and displays the license data associated with the page.
-- Dan Nessett
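The strawman schema can be sketched as two tables. This is only an illustration of the proposal, not an actual MediaWiki extension: it uses SQLite in place of MySQL, the column types are guesses, and the CHECK constraints and the nullable user_id (for edits by users who are not logged in) are assumptions added here rather than part of Dan's message. The sample rows and the helper names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE license (
    license_id      INTEGER PRIMARY KEY,
    license_text    TEXT NOT NULL,
    license_name    TEXT NOT NULL,
    license_version TEXT NOT NULL
);
CREATE TABLE licensing (
    revision_or_image TEXT NOT NULL
        CHECK (revision_or_image IN ('revision', 'image')),
    revision_id       INTEGER,
    image_id          INTEGER,
    content_source    TEXT NOT NULL,
    license_id        INTEGER NOT NULL REFERENCES license (license_id),
    user_id           INTEGER,  -- assumption: NULL when the user was not logged in
    -- Assumption: exactly one of revision_id / image_id is set,
    -- and it matches the revision_or_image discriminator.
    CHECK (
        (revision_or_image = 'revision'
            AND revision_id IS NOT NULL AND image_id IS NULL)
        OR
        (revision_or_image = 'image'
            AND image_id IS NOT NULL AND revision_id IS NULL)
    )
);
"""


def open_demo_db():
    """Create the tables in memory and add one invented example row each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO license VALUES (1, 'full license text goes here', 'CC BY-SA', '3.0')"
    )
    conn.execute(
        "INSERT INTO licensing VALUES "
        "('revision', 42, NULL, 'http://en.wikipedia.org/wiki/South_Africa', 1, 7)"
    )
    return conn


def licenses_for_revision(conn, revision_id):
    """Return (license_name, license_version, content_source) rows for a revision."""
    return conn.execute(
        "SELECT l.license_name, l.license_version, g.content_source "
        "FROM licensing AS g JOIN license AS l ON l.license_id = g.license_id "
        "WHERE g.revision_or_image = 'revision' AND g.revision_id = ?",
        (revision_id,),
    ).fetchall()
```

A special page of the kind described under "Page rendering" could run a query like licenses_for_revision for the revision being displayed. Handling reverts would still need separate logic, as the proposal notes.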
Re: [Wikitech-l] template rendering problems
On Tue, Sep 14, 2010 at 10:52 AM, Brent Palmer b...@brentopalmer.com wrote: We have the same extensions applied as Wikipedia. We also have the wgUseTidy set to true. If you have any ideas about how to troubleshoot this one, I'd appreciate it. Did you install Tidy or just turn the setting on?
Re: [Wikitech-l] Get full-protecting article list monthly
http://toolserver.org/~betacommand/reports/sysopprotecton.txt try that
Re: [Wikitech-l] template rendering problems
Ahh, good question. But yes, we did install Tidy. It's the Windows version, though, if that makes a difference. We are reasonably sure that it's working (it definitely works independently of MW) because a few small rendering problems were fixed, but the major ones described here weren't affected. We aren't exactly sure where or when Tidy is invoked (after each template is parsed?). We haven't yet tried to determine whether there is something it is supposed to be cleaning up but isn't. Thanks! Brent
[Wikitech-l] [Announce]: Mark Bergsma promotion to Operations Engineer Programs Manager
Please join me in congratulating Mark Bergsma on his promotion last week to Operations EPM. Mark has been a volunteer since 2004, and a paid Network Engineer on our team since August 2006. He's been helping us with our extreme scaling issues (by debugging and tuning our Squid setup, creating our Netherlands caching center, and generally developing our network strategy) since the very beginning. For some time now Mark has been unofficially in charge of managing the entire Ops Team's deliverables including designing and implementing our new Primary Data Center in Ashburn, VA, and the other Ops activities mentioned at http://www.mediawiki.org/wiki/WMF_Projects. Mark has expressed an interest in gaining some experience with people management skills as a logical next step in his career, and to that end we will gradually add direct reports under Mark over the next year, starting with the Data Center Ops crew. He will continue to report to me until we hire a Director of Technical Operations. I know you will do all you can to support Mark in his new role. Danese Cooper CTO, Wikimedia Foundation
Re: [Wikitech-l] [Announce]: Mark Bergsma promotion to Operations Engineer Programs Manager
Congrats Mark!!! -aude PS - let me/us know when you are visiting Ashburn... we should have you and other ops staff to a DC meetup and can treat you to beer :)
Re: [Wikitech-l] Proposal: reversion collapsing in edit history
Roan Kattouw roan.kattouw at gmail.com writes: The only common factor between collapsing reversions and hiding minor and/or bot edits is the fact that you're hiding things from the history view. Yes, it is the UI which could be reused. the other requires the minor/bot flags to be added to that table. Just checking for bot user group would be, while not ideal, still acceptable. Or is the table join required for that too costly?