Re: [Wikitech-l] Extensions in SVN looking for a maintainer
2010/2/15 Ævar Arnfjörð Bjarmason ava...@gmail.com:
> Domas has also complained that it eats up resources. Is this something
> that can conceivably be fixed in it or is it just inherent in anything
> that calls the parser from an extension tag and will thus need parser
> fixups to get anywhere?

IIRC Domas was complaining about {{cite}} *templates* and their complexity. Their inefficiency was unrelated to the Cite extension AFAIK.

Roan Kattouw (Catrope)

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Extensions in SVN looking for a maintainer
Thanks, Avar.

- Cite: no action taken in Bugzilla.
- Newuserlog: has been removed. Unfortunately, components cannot be closed.
- CrossNamespaceLinks: added Avar as maintainer.
- Desysop: has been removed. See above.
- Espionage: had one closed issue; reassigned to CheckUser and component deleted.
- Eval: added Avar as maintainer.
- PageCSS: added Avar as maintainer.

Thanks!

Siebrand

-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Ævar Arnfjörð Bjarmason
Sent: Monday, February 15, 2010 12:24 AM
To: Wikimedia developers
Cc: MediaWiki announcements and site admin list
Subject: Re: [Wikitech-l] Extensions in SVN looking for a maintainer

On Fri, Feb 12, 2010 at 19:00, Siebrand Mazeland s.mazel...@xs4all.nl wrote:
> List of extensions used by Wikimedia without a Bugzilla maintainer:
[..]
Re: [Wikitech-l] importing enwiki into local database
On Sun, Feb 14, 2010 at 7:34 PM, Marco Schuster ma...@harddisk.is-a-geek.org wrote:
> What about turning wgUseTidy off for some time?

The doctype that we serve is XHTML, and various AJAX tools rely on being able to parse the DOM tree as an XML document. But there are certain valid wikitext constructions that are ''guaranteed'' to generate invalid XML without Tidy, because of MediaWiki bugs. For example, putting a list inside a table cell (bug 17486). So Tidy seems to be a requirement for the time being.

I hope that, before the doctype is changed to HTML5, a substantial grace period is given for people to change to an HTML5 parser in their JavaScript code. One high-profile use case here is the Twinkle script.

- Carl
Re: [Wikitech-l] importing enwiki into local database
On Mon, Feb 15, 2010 at 2:19 PM, Carl (CBM) cbm.wikipe...@gmail.com wrote:
> I hope that, before the doctype is changed to HTML5, a substantial
> grace period is given for people to change to an HTML5 parser in
> their JavaScript code.

We will continue with well-formed XML output for the foreseeable future for exactly this reason. We don't need to stop emitting well-formed XML to use HTML5. I have tested the trunk defaults ($wgHtml5 = true; $wgWellFormedXml = true;) with the Python SAX parser and it parses pages correctly, so no tools should be broken.
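The test above isn't shown, but the kind of check described can be sketched with Python's stdlib SAX parser. The markup snippets below are purely illustrative, not actual MediaWiki output:

```python
import xml.sax

class CountingHandler(xml.sax.ContentHandler):
    """Counts start tags; the default error handler raises
    SAXParseException on ill-formed input."""
    def __init__(self):
        super().__init__()
        self.elements = 0

    def startElement(self, name, attrs):
        self.elements += 1

def is_well_formed(markup: bytes) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        xml.sax.parseString(markup, CountingHandler())
        return True
    except xml.sax.SAXParseException:
        return False

# A list inside a table cell, with every tag closed (well-formed):
good = b"<td><ul><li>item</li></ul></td>"
# The same construct with unclosed tags is rejected:
bad = b"<td><ul><li>item</td>"
```

If served pages pass a check like this, XML-based AJAX tools should keep working regardless of the declared doctype.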
[Wikitech-l] User-Agent:
Hi!

From now on, a specific per-bot/per-software/per-client User-Agent header is mandatory for contacting Wikimedia sites.

Domas
Re: [Wikitech-l] User-Agent:
On Mon, Feb 15, 2010 at 8:54 PM, Domas Mituzas midom.li...@gmail.com wrote:
> From now on, a specific per-bot/per-software/per-client User-Agent
> header is mandatory for contacting Wikimedia sites.

In that case, should we tweak the MediaWiki user agent to serve something more unique than MediaWiki/version?

-Chad
[Wikitech-l] More dump problems?
Hi guys,

Just wanted to be sure this got checked out. From #mediawiki:

<Manny> hi. out of curiosity, are there any known issues with the recent batch of dumps? I noticed that some supposedly completed dumps seem to have ended with "Please provide a User-Agent header"

Domas banned all UA-less requests earlier tonight, so it didn't seem random or nonsensical.

Thanks,
-Mike
Re: [Wikitech-l] User-Agent:
Hello,

On Tuesday, 16 February 2010 at 03:06:49, Steve Summit wrote:
> Is it permissible to send User-Agent: x

Why is it so hard to set "User-Agent: mytoolname/version mym...@mail.invalid"? (You can forgo the mail if you are paranoid.) It's clean, fast and good.

Sincerely,
DaB.

-- 
wp-blog.de
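The suggested header is easy to set in practice. A minimal sketch using Python's stdlib urllib, with a purely illustrative tool name, version, and contact address (substitute your own):

```python
import urllib.request

# Illustrative values only -- use your own tool name, version, and contact.
USER_AGENT = "mytoolname/1.0 (mym...@mail.invalid)"

def fetch(url: str) -> bytes:
    """Fetch a URL while clearly identifying the client software."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Most HTTP libraries offer an equivalent one-line way to set the header; the point is simply to send something descriptive rather than the library default or nothing at all.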
Re: [Wikitech-l] User-Agent:
Hi Steve,

> But why?

Because we need to identify malicious behavior.

> (This just broke one of my bots.) Are the details of this policy
> discussed anywhere?

I don't know. Probably. We always told people to specify a User-Agent; the check was just broken.

> Is it permissible to send User-Agent: x, thus providing precisely the
> same amount of information as if not supplying the header at all?

No, you clearly miss the very simple idea that with such a user-agent you clearly identify yourself as malicious, whereas when you don't specify one, you're either malicious or ignorant. Do note, we're good at detecting spoofed user-agents too, so if your bots disguise themselves as MSIE or Firefox or any other regular browser, your behavior is seen as malicious. We do not like malicious behavior.

Domas
Re: [Wikitech-l] User-Agent:
Domas wrote:
> From now on, a specific per-bot/per-software/per-client User-Agent
> header is mandatory for contacting Wikimedia sites.

Oh, my. And not just to be a bot, or to edit the site manually, but even to view it. You can't even fetch a single, simple page now without supplying that header.

If this has been discussed to death elsewhere and represents some bizarrely-informed consensus, I'll try to spare this list my belated rantings, but this is a terrible, terrible idea. Relying on User-Agent represents the very antithesis of [[Postel's Law]], a rock-solid principle on which the Internet (used to be) based.
Re: [Wikitech-l] User-Agent:
Steve,

> If this has been discussed to death elsewhere and represents some
> bizarrely-informed consensus, I'll try to spare this list my belated
> rantings, but this is a terrible, terrible idea. Relying on User-Agent
> represents the very antithesis of [[Postel's Law]], a rock-solid
> principle on which the Internet (used to be) based.

RFC 2616, section 14.43 (User-Agent):

   The User-Agent request-header field contains information about the
   user agent originating the request. This is for statistical purposes,
   the tracing of protocol violations, and automated recognition of user
   agents for the sake of tailoring responses to avoid particular user
   agent limitations. User agents SHOULD include this field with
   requests. The field can contain multiple product tokens (section 3.8)
   and comments identifying the agent and any subproducts which form a
   significant part of the user agent. By convention, the product tokens
   are listed in order of their significance for identifying the
   application.

       User-Agent = "User-Agent" ":" 1*( product | comment )

       Example:
           User-Agent: CERN-LineMode/2.15 libwww/2.17b3

RFC 2119, section 3 (SHOULD):

   This word, or the adjective "RECOMMENDED", mean that there may exist
   valid reasons in particular circumstances to ignore a particular
   item, but the full implications must be understood and carefully
   weighed before choosing a different course.

I guess you just found one more implication to carefully weigh before not specifying a U-A.

Domas
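For illustration only, here is a deliberately simplified Python check of whether a string fits the `1*( product | comment )` shape from RFC 2616 (it ignores nested comments, quoted pairs, and other subtleties). Notably, it accepts even the minimal "User-Agent: x" from the thread, which is exactly the distinction being argued: syntactic validity is not the same as being identifiable.

```python
import re

# Simplified RFC 2616 token: one or more characters legal in a token.
TOKEN = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
# product = token ["/" product-version]
PRODUCT = rf"{TOKEN}(?:/{TOKEN})?"
# comment = "(" ... ")"; nesting and quoted pairs are not handled here.
COMMENT = r"\([^()]*\)"
# 1*( product | comment ), separated by whitespace.
UA_RE = re.compile(rf"^(?:{PRODUCT}|{COMMENT})(?:\s+(?:{PRODUCT}|{COMMENT}))*$")

def looks_like_valid_ua(ua: str) -> bool:
    """Rough check that a User-Agent value fits the RFC 2616 grammar."""
    return bool(UA_RE.match(ua))
```

The RFC's own example, "CERN-LineMode/2.15 libwww/2.17b3", passes this check, as does a bare "x"; an empty header value does not.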
Re: [Wikitech-l] User-Agent:
Domas wrote:
>> But why?
> Because we need to identify malicious behavior.

You're trying to detect / guard against malicious behavior using *User-Agent*?? Good grief. Have fun with the whack-a-mole game, then.
Re: [Wikitech-l] User-Agent:
Hi!

> You're trying to detect / guard against malicious behavior using
> *User-Agent*?? Good grief. Have fun with the whack-a-mole game, then.

Thanks! I'm relatively new to this whole operations game, so I'm obsessed with graphs and whack-a-mole :)

Cheers,
Domas
Re: [Wikitech-l] User-Agent:
On 02/15/2010 05:54 PM, Domas Mituzas wrote:
> From now on, a specific per-bot/per-software/per-client User-Agent
> header is mandatory for contacting Wikimedia sites.

Two questions: Was there some urgent production impact that required doing this with no notice? Was any impact analysis done on this?

Given Wikipedia's mission, we can't be as casual about rejecting traffic as a commercial site would be. If a commercial site accidentally gets rid of some third-world traffic running behind a shoddy ISP, it's no loss; nobody wants to advertise to them anyhow. But for us, those are the people who gain the most from being able to reach us.

William
Re: [Wikitech-l] User-Agent:
Hi!

> Was there some urgent production impact that required doing this with
> no notice?

Actually, we have had a User-Agent header requirement for ages; it just failed to do what it had to do for a while. Consider this a bugfix.

> Was any impact analysis done on this?

Yup!

> Given Wikipedia's mission, we can't be as casual about rejecting
> traffic as a commercial site would be. If a commercial site
> accidentally gets rid of some third-world traffic running behind a
> shoddy ISP, it's no loss; nobody wants to advertise to them anyhow.
> But for us, those are the people who gain the most from being able to
> reach us.

Actually, at the moment this mostly affects crap sites that hot-load data from us to display spamvertisements on hacked sites on the internet. I don't know where your 'shoddy ISP' speculation fits in.

Domas
Re: [Wikitech-l] User-Agent:
Hello,

On Tuesday, 16 February 2010 at 04:15:57, William Pietri wrote:
> some third-world traffic

Why should browsers in the third world not send user-agents like ours? (I doubt that they use different ones than we do.) The change by Domas just blocks two kinds of requests: 1) by broken bots and crawlers, and 2) by paranoid users who removed the user-agent in their browsers. The 99% of normal users (with normal browsers) will not notice a difference.

Sincerely,
DaB.

-- 
wp-blog.de
Re: [Wikitech-l] User-Agent:
On 02/15/2010 07:25 PM, Domas Mituzas wrote:
>> Was there some urgent production impact that required doing this
>> with no notice?
> Actually, we have had a User-Agent header requirement for ages; it
> just failed to do what it had to do for a while. Consider this a
> bugfix.

Ok, I'm going to take that as "no". In the future, I think it would be better to let people know in advance about non-urgent changes that may break things for them.

>> Was any impact analysis done on this?
> Yup!

Would you care to share the results with us? In the future, I'd suggest giving basic info like that as part of an announcement.

> Actually, at the moment this mostly affects crap sites that hot-load
> data from us to display spamvertisements on hacked sites on the
> internet.

That's another good thing to share as part of a change announcement: motivation for the change.

> I don't know where your 'shoddy ISP' speculation fits in.

Last I looked, there were a lot of poorly maintained proxies out there, some of which mangle headers. It seemed reasonable to me that some of those are on low-rent ISPs in poor countries. If you have already done the work to prove that no legitimate users anywhere in the world are impacted by this change, then perhaps you could save us further discussion and just explain that?

Thanks,
William
Re: [Wikitech-l] User-Agent:
On 02/15/2010 06:50 PM, Steve Summit wrote:
> You're trying to detect / guard against malicious behavior using
> *User-Agent*?? Good grief. Have fun with the whack-a-mole game, then.

Yes, a simple restriction like this tends to create smarter villains rather than less villainy. Filtering on an obvious, easy-to-change characteristic also destroys a useful source of information on who the bad people are, making future abuse prevention efforts harder.

William
Re: [Wikitech-l] User-Agent:
William,

> Yes, a simple restriction like this tends to create smarter villains
> rather than less villainy. Filtering on an obvious, easy-to-change
> characteristic also destroys a useful source of information on who the
> bad people are, making future abuse prevention efforts harder.

Thanks for the insights. But no. We don't use the UA as the first step of analysis; it was a helpful tertiary tool that put these people into the ignorant or malicious category. If they'd spoofed their UAs, we'd block the IPs and inform their upstreams, treating it as fully malicious behavior. If they had a nice UA, we might have attempted to contact them, or have isolated their workload until the issue was fixed ;-)

Domas
Re: [Wikitech-l] User-Agent:
On 02/15/2010 07:55 PM, Domas Mituzas wrote:
>> Yes, a simple restriction like this tends to create smarter villains
>> rather than less villainy. Filtering on an obvious, easy-to-change
>> characteristic also destroys a useful source of information on who
>> the bad people are, making future abuse prevention efforts harder.
> Thanks for the insights. But no. We don't use the UA as the first step
> of analysis; it was a helpful tertiary tool that put these people into
> the ignorant or malicious category. If they'd spoofed their UAs, we'd
> block the IPs and inform their upstreams, treating it as fully
> malicious behavior. If they had a nice UA, we might have attempted to
> contact them, or have isolated their workload until the issue was
> fixed ;-)

I am saying that going forward, you have eliminated WMF's ability to use a tertiary tool that you agree was helpful. Having spent a lot of time dealing with abuse early in the Web's history, I wouldn't have done it that way. But it's not really my problem and you don't appear to be looking for input, so godspeed.

William