Re: [Wikitech-l] Extensions in SVN looking for a maintainer

2010-02-15 Thread Roan Kattouw
2010/2/15 Ævar Arnfjörð Bjarmason ava...@gmail.com:
 Domas has also complained that it eats up resources. Is this something
 that can conceivably be fixed in it or is it just inherent in anything
 that calls the parser from an extension tag and will thus need parser
 fixups to get anywhere?

IIRC Domas was complaining about {{cite}} *templates* and their
complexity. Their inefficiency was unrelated to the Cite extension
AFAIK.

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Extensions in SVN looking for a maintainer

2010-02-15 Thread Siebrand Mazeland
Thanks, Avar.

Cite: no action taken in Bugzilla.
Newuserlog: has been removed. Unfortunately, components cannot be closed.
CrossNamespaceLinks: added Avar as maintainer
Desysop: has been removed. See above.
Espionage: had one closed issue; reassigned to CheckUser and component deleted.
Eval: added Avar as maintainer.
PageCSS: added Avar as maintainer.

Thanks!

Siebrand


-Original Message-
From: wikitech-l-boun...@lists.wikimedia.org 
[mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Ævar Arnfjörð 
Bjarmason
Sent: Monday, February 15, 2010 12:24 AM
To: Wikimedia developers
Cc: MediaWiki announcements and site admin list
Subject: Re: [Wikitech-l] Extensions in SVN looking for a maintainer

On Fri, Feb 12, 2010 at 19:00, Siebrand Mazeland s.mazel...@xs4all.nl wrote:
 List of extensions used by Wikimedia without a Bugzilla maintainer:

[..]



Re: [Wikitech-l] importing enwiki into local database

2010-02-15 Thread Carl (CBM)
On Sun, Feb 14, 2010 at 7:34 PM, Marco Schuster
ma...@harddisk.is-a-geek.org wrote:
 What about turning wgUseTidy off for some time?

The doctype that we serve is XHTML, and various AJAX tools rely on
being able to parse the DOM tree as an XML document.  But there are
certain valid wikitext constructions that are ''guaranteed'' to
generate invalid XML without tidy, because of MediaWiki bugs. For
example, putting a list inside a table cell (bug 17486). So tidy seems
to be a requirement for the time being.
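A quick illustration of the failure mode Carl describes, using a hypothetical fragment (not actual MediaWiki output): markup with unclosed list tags, as a tidy-less rendering of a list inside a table cell might produce, is rejected by Python's SAX parser.

```python
# Illustrative sketch: a fragment with unclosed <ul>/<li> elements is
# not well-formed XML, so an XML-based AJAX tool could not parse it.
import xml.sax

bad_fragment = b"<td><ul><li>item</td>"  # <ul> and <li> are never closed

try:
    xml.sax.parseString(bad_fragment, xml.sax.ContentHandler())
    parsed = True
except xml.sax.SAXException:
    parsed = False

print(parsed)  # False: the fragment is not well-formed XML
```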

I hope that, before the doctype is changed to html5, a substantial
grace period is given for people to change to an HTML5 parser in their
javascript code.

One high-profile use case here is the Twinkle script.

- Carl



Re: [Wikitech-l] importing enwiki into local database

2010-02-15 Thread Aryeh Gregor
On Mon, Feb 15, 2010 at 2:19 PM, Carl (CBM) cbm.wikipe...@gmail.com wrote:
 I hope that, before the doctype is changed to html5, a substantial
 grace period is given for people to change to an HTML5 parser in their
 javascript code.

We will continue with well-formed XML output for the foreseeable
future for exactly this reason.  We don't need to stop emitting
well-formed XML to use HTML5.  I have tested trunk defaults ($wgHtml5
= true; $wgWellFormedXml = true;) with the Python SAX parser and it
parses pages correctly, so no tools should be broken.
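A minimal sketch of the kind of check Aryeh describes, with an assumed sample page (not actual MediaWiki output): run the markup through Python's SAX parser and see whether it parses without error.

```python
# Sketch: verify that a page parses as well-formed XML using the
# Python SAX parser, as described for the trunk-defaults test.
import xml.sax

xhtml = b"""<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Test</title></head>
  <body><p>Hello</p></body>
</html>"""

try:
    xml.sax.parseString(xhtml, xml.sax.ContentHandler())
    well_formed = True
except xml.sax.SAXParseException:
    well_formed = False

print(well_formed)  # True for well-formed output
```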



[Wikitech-l] User-Agent:

2010-02-15 Thread Domas Mituzas
Hi!

From now on, a specific per-bot/per-software/per-client User-Agent header is
mandatory for contacting Wikimedia sites.

Domas


Re: [Wikitech-l] User-Agent:

2010-02-15 Thread Chad
On Mon, Feb 15, 2010 at 8:54 PM, Domas Mituzas midom.li...@gmail.com wrote:
 Hi!

 from now on specific per-bot/per-software/per-client User-Agent header is 
 mandatory for contacting Wikimedia sites.

 Domas


In that case, should we tweak the MediaWiki user agent to send
something more unique than MediaWiki/version?

-Chad



[Wikitech-l] More dump problems?

2010-02-15 Thread Mike.lifeguard
Hi guys,

Just wanted to be sure this got checked out. From #mediawiki:

<Manny> hi. out of curiosity, are there any known issues with the recent
batch of dumps? I noticed that some supposedly completed dumps seem to
have ended with "Please provide a User-Agent header"

Domas banned all UA-less requests earlier tonight, so it didn't seem
random or nonsensical.

Thanks,
-Mike


Re: [Wikitech-l] User-Agent:

2010-02-15 Thread DaB.
Hello,
Am Dienstag 16 Februar 2010 03:06:49 schrieb Steve Summit:
 Is it permissible to send
 
 User-Agent: x

Why is it so hard to set

User-Agent: mytoolname/version mym...@mail.invalid

? (You can forgo the mail address if you're paranoid.)

It's clean, fast, and good.
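DaB.'s suggestion sketched in Python, where the tool name, version, and contact address are placeholders (the address in the mail above is truncated, so an illustrative one is used): build a request carrying a descriptive User-Agent header. No request is actually sent.

```python
# Build (but do not send) a request with a descriptive User-Agent that
# identifies the tool, its version, and a contact address (placeholders).
import urllib.request

req = urllib.request.Request(
    "https://en.wikipedia.org/w/api.php",
    headers={"User-Agent": "mytoolname/1.0 (contact@example.invalid)"},
)

print(req.get_header("User-agent"))  # mytoolname/1.0 (contact@example.invalid)
```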

Sincerly,
DaB.

-- 
wp-blog.de



Re: [Wikitech-l] User-Agent:

2010-02-15 Thread Domas Mituzas
Hi Steve,

 But why?

Because we need to identify malicious behavior. 

 (This just broke one of my bots.)
 Are the details of this policy discussed anywhere?

I don't know. Probably. We have always told people to specify a User-Agent;
it's just that the check was broken.

 Is it permissible to send
 
   User-Agent: x
 
 thus providing precisely the same amount of information as if not
 supplying the header at all?

No. You're missing a very simple idea: with such a user agent you clearly
identify yourself as malicious, whereas when you don't specify one, you're
either malicious or ignorant.

Do note, we're good at detecting spoofed user agents too, so if your bots
disguise themselves as MSIE or Firefox or any other regular browser, your
behavior is seen as malicious.

We do not like malicious behavior. 

Domas


Re: [Wikitech-l] User-Agent:

2010-02-15 Thread Steve Summit
Domas wrote:
 from now on specific per-bot/per-software/per-client User-Agent
 header is mandatory for contacting Wikimedia sites.

Oh, my.  And not just to be a bot, or to edit the site manually,
but even to view it.  You can't even fetch a single, simple page
now without supplying that header.

If this has been discussed to death elsewhere and represents
some bizarrely-informed consensus, I'll try to spare this list
my belated rantings, but this is a terrible, terrible idea.
Relying on User-Agent represents the very antithesis of
[[Postel's Law]], a rock-solid principle on which the Internet
(used to be) based.



Re: [Wikitech-l] User-Agent:

2010-02-15 Thread Domas Mituzas
Steve,

 If this has been discussed to death elsewhere and represents
 some bizarrely-informed consensus, I'll try to spare this list
 my belated rantings, but this is a terrible, terrible idea.
 Relying on User-Agent represents the very antithesis of
 [[Postel's Law]], a rock-solid principle on which the Internet
 (used to be) based.

RFC2616:
14.43 User-Agent

The User-Agent request-header field contains information about the user agent 
originating the request. This is for statistical purposes, the tracing of 
protocol violations, and automated recognition of user agents for the sake of 
tailoring responses to avoid particular user agent limitations. User agents 
SHOULD include this field with requests. The field can contain multiple product 
tokens (section 3.8) and comments identifying the agent and any subproducts 
which form a significant part of the user agent. By convention, the product 
tokens are listed in order of their significance for identifying the 
application.

   User-Agent = "User-Agent" ":" 1*( product | comment )

Example:

   User-Agent: CERN-LineMode/2.15 libwww/2.17b3

RFC2119:
3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.

I guess you just found one more implication to carefully weigh before not
specifying a U-A.

Domas




Re: [Wikitech-l] User-Agent:

2010-02-15 Thread Steve Summit
Domas wrote:
 Hi Steve,
  But why?

 Because we need to identify malicious behavior. 

You're trying to detect / guard against malicious behavior using
*User-Agent*??  Good grief.  Have fun with the whack-a-mole game, then.



Re: [Wikitech-l] User-Agent:

2010-02-15 Thread Domas Mituzas
Hi!

 You're trying to detect / guard against malicious behavior using
 *User-Agent*??  Good grief.  Have fun with the whack-a-mole game, then.


Thanks! I'm relatively new to this whole operations game, so I'm obsessed with
graphs and whack-a-mole :)

Cheers,
Domas


Re: [Wikitech-l] User-Agent:

2010-02-15 Thread William Pietri
On 02/15/2010 05:54 PM, Domas Mituzas wrote:
 Hi!

 from now on specific per-bot/per-software/per-client User-Agent header is 
 mandatory for contacting Wikimedia sites.


Two questions:

Was there some urgent production impact that required doing this with no 
notice?

Was any impact analysis done on this? Given Wikipedia's mission, we 
can't be as casual about rejecting traffic as a commercial site would 
be. If a commercial site accidentally gets rid of some third-world 
traffic running behind a shoddy ISP, it's no loss; nobody wants to 
advertise to them anyhow. But for us, those are the people who gain the 
most from being able to reach us.

William



Re: [Wikitech-l] User-Agent:

2010-02-15 Thread Domas Mituzas
Hi!

 Was there some urgent production impact that required doing this with no 
 notice?

Actually, we have had a User-Agent header requirement for ages; it just failed
to do what it had to do for a while. Consider this a bugfix.

 Was any impact analysis done on this?

Yup! 

 Given Wikipedia's mission, we can't be as casual about rejecting traffic as a 
 commercial site would be.
 If a commercial site accidentally gets rid of some third-world traffic 
 running behind a shoddy ISP, it's no
 loss; nobody wants to advertise to them anyhow. But for us, those are the 
 people who gain the 
 most from being able to reach us.

Actually, at the moment this mostly affects crap sites that hot-load data from
us to display spamvertisements on hacked sites on the internet.
I don't know where your 'shoddy ISP' speculation fits in.

Domas


Re: [Wikitech-l] User-Agent:

2010-02-15 Thread DaB.
Hello,
Am Dienstag 16 Februar 2010 04:15:57 schrieb William Pietri:
 some third-world traffic

Why should browsers in the third world not send user agents like ours (I
doubt they use different browsers than we do)? The change by Domas just
blocks two kinds of requests: 1) those from broken bots and crawlers, and
2) those from paranoid users who removed the user agent in their browsers.
The 99% of normal users (with a normal browser) will not notice a difference.

Sincerly,
DaB.

-- 
wp-blog.de



Re: [Wikitech-l] User-Agent:

2010-02-15 Thread William Pietri
On 02/15/2010 07:25 PM, Domas Mituzas wrote:
 Was there some urgent production impact that required doing this with no
 notice?
  
 Actually we had User-Agent header requirement for ages, it just failed to do 
 what it had to do for a while. Consider this to be a bugfix.


Ok. I'm going to take that as a "no". In the future, I think it would be
better to let people know in advance about non-urgent changes that may
break things for them.


 Was any impact analysis done on this?
  
 Yup!


Would you care to share the results with us?

In the future, I'd suggest giving basic info like that as part of an 
announcement.

 Actually, at the moment this mostly affects crap sites that hot-load data 
 from us to display spamvertisements on hacked sites on internet.


That's another good thing to share as part of a change announcement: 
motivation for the change.

 I don't know where your 'shoddy ISP' speculation fits in.


Last I looked, there were a lot of poorly maintained proxies out there, 
some of which mangle headers. It seemed reasonable to me that some of 
those are on low-rent ISPs in poor countries. If you have already done 
the work to prove that no legitimate users anywhere in the world are 
impacted by this change, then perhaps you could save us further 
discussion and just explain that?

Thanks,

William





Re: [Wikitech-l] User-Agent:

2010-02-15 Thread William Pietri
On 02/15/2010 06:50 PM, Steve Summit wrote:
 You're trying to detect / guard against malicious behavior using
 *User-Agent*??  Good grief.  Have fun with the whack-a-mole game, then.



Yes, a simple restriction like this tends to create smarter villains 
rather than less villainy. Filtering on an obvious, easy-to-change 
characteristic also destroys a useful source of information on who the 
bad people are, making future abuse prevention efforts harder.

William



Re: [Wikitech-l] User-Agent:

2010-02-15 Thread Domas Mituzas
William,

 Yes, a simple restriction like this tends to create smarter villains 
 rather than less villainy. Filtering on an obvious, easy-to-change 
 characteristic also destroys a useful source of information on who the 
 bad people are, making future abuse prevention efforts harder.


Thanks for insights.  But no.

We don't use the UA as the first step of analysis; it was a helpful tertiary
tool that put these people into the ignorant-or-malicious category.
If they had spoofed their UAs, we'd have blocked the IPs and informed
upstreams, treating it as fully malicious behavior.
If they had a nice UA, we might have attempted to contact them or isolated
their workload until the issue was fixed ;-)

Domas


Re: [Wikitech-l] User-Agent:

2010-02-15 Thread William Pietri
On 02/15/2010 07:55 PM, Domas Mituzas wrote:
 Yes, a simple restriction like this tends to create smarter villains
 rather than less villainy. Filtering on an obvious, easy-to-change
 characteristic also destroys a useful source of information on who the
 bad people are, making future abuse prevention efforts harder.
  

 Thanks for insights.  But no.

 We don't use UA as first step of analysis, it was helpful tertiary tool, that 
 put these people into ignorant or malicious category.
 If they'd have spoofed their UAs, we'd block the IPs and inform upstreams, as 
 fully malicious behavior.
 If they had nice UA, we might have attempted to contact them or have isolated 
 their workload until the issue is fixed ;-)


I am saying that going forward you have eliminated WMF's ability to use 
a tertiary tool that you agree was helpful.

Having spent a lot of time dealing with abuse early in the Web's 
history, I wouldn't have done it that way. But it's not really my 
problem and you don't appear to be looking for input, so godspeed.

William
