Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-27 Thread Gryllida
On Tue, 17 Sep 2013, at 7:51, Tyler Romeo wrote:
 On Mon, Sep 16, 2013 at 6:12 PM, Gabriel Wicke gwi...@wikimedia.org wrote:
 
  * use simple action urls
https://en.wikipedia.org/Foo?action=history instead of
https://en.wikipedia.org/w/index.php?title=Foo&action=history
 
 
 This already works.
 

I would be concerned about whether this feature works properly in wikilinks.
[[Main Page?action=history|Foo]] produces a red (broken) link.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-27 Thread Matthew Flaschen

On 09/27/2013 06:03 AM, Gryllida wrote:

I would be concerned about whether this feature works properly in wikilinks.
[[Main Page?action=history|Foo]] produces a red (broken) link.


So does:

[[/w/index.php?title=Main Page|Foo]]

Neither would be expected to work.  Anything to the left of the pipe in 
your example is considered a page title.  I don't think anything about 
wikilink parsing (or any parsing) is proposed to change.
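
To illustrate the point about the pipe, here is a minimal sketch in Python. It is not MediaWiki's actual parser; `split_wikilink` is a hypothetical helper that only shows why a "?action=history" suffix ends up as part of the page title.

```python
def split_wikilink(wikitext: str) -> tuple[str, str]:
    """Split '[[target|label]]' into (target, label).

    Hypothetical helper for illustration only; the real MediaWiki
    parser handles many more cases (namespaces, fragments, and so on).
    """
    if not (wikitext.startswith("[[") and wikitext.endswith("]]")):
        raise ValueError("not a wikilink")
    inner = wikitext[2:-2]
    # Everything left of the first pipe is the page title; there is no
    # URL-style query-string handling at this stage.
    target, sep, label = inner.partition("|")
    return target, (label if sep else target)


title, label = split_wikilink("[[Main Page?action=history|Foo]]")
print(title)  # the whole left-hand side, query string included
print(label)
```

Under this reading the link target is a page literally named "Main Page?action=history", which does not exist, hence the red link.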


Matt Flaschen




Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-19 Thread Matthew Flaschen

On 09/17/2013 05:59 AM, Daniel Friesen wrote:

Side topic https://en.wiktionary.org/w/r/t is messed up:  To check for
r/t on Wikipedia, see: //en.wikipedia.org/wiki/r/t
https://en.wikipedia.org/wiki/r/t


Good catch, filed: https://bugzilla.wikimedia.org/show_bug.cgi?id=54357

Matt Flaschen


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-19 Thread Tim Starling
On 20/09/13 03:04, Jon Robson wrote:
 Thanks Tim for running those data. That seems to suggest the URL
 structure works for most cases.

I think the request rate for actual articles in the root is very, very
low. And if you look at the paste I gave earlier:

http://paste.tstarling.com/p/uhtFqg.html

there's reason to think that the amount of traffic that comes from
naive readers typing URLs and expecting an article is much smaller
than even 149k per week. A naive user would be more likely to type a
URL starting with a lower-case letter, and if you take those entries,
and filter out the obvious client bugs and typos, that leaves only 39
log entries. If we filter out some more log entries that are unlikely
search terms for Wikipedia articles (enregistrement-audio-musique,
is, unlimited_data_plan, etc.), that leaves maybe 30.
http://paste.tstarling.com/p/KWuHif.html

Of these, only 12 actually correspond to Wikipedia articles or redirects:

abolition
addicting_games
apple_inc
carnaval
dreamshade
facade
girls
insidious
karthik
online_coupons
snam
walkabout

So the number of naive readers actually helped by our 404 Refresh to
/wiki/ is probably closer to 12k per week than 149k per week.
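
The estimate above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the 1/1000 sample rate comes from the earlier message in this thread, the function names are made up, and the short title list is a handful of entries mentioned in the thread rather than the full paste.

```python
SAMPLE_RATE = 1000  # logs were sampled at 1/1000, per the earlier message


def estimate_weekly_requests(sampled_entries: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Scale a count of sampled log entries up to an estimated weekly total."""
    return sampled_entries * sample_rate


def starts_lowercase(title: str) -> bool:
    """True if the first character of the requested title is a lower-case letter."""
    return bool(title) and title[0].islower()


# A few titles mentioned in this thread, not the full list of 149.
titles = ["abolition", "apple_inc", "B008NAYASM", "unlimited_data_plan"]
print([t for t in titles if starts_lowercase(t)])

print(estimate_weekly_requests(149))  # all filtered 404s
print(estimate_weekly_requests(12))   # entries matching an article or redirect
```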

Personally, I think the refresh is annoying, since it makes it much
more difficult to correct typos in manually-typed URLs. If you
actually meant to type some non-article URL like a CSS resource, and
make a typo which causes it to hit the refresh, the URL you typed is
erased from your browser's address bar and history, making correction
of the typo much more difficult. Maybe we should just include a link
to the search page, rather than redirect or refresh.

-- Tim Starling



Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-19 Thread MZMcBride
Tim Starling wrote:
Personally, I think the refresh is annoying, since it makes it much
more difficult to correct typos in manually-typed URLs. If you
actually meant to type some non-article URL like a CSS resource, and
make a typo which causes it to hit the refresh, the URL you typed is
erased from your browser's address bar and history, making correction
of the typo much more difficult. Maybe we should just include a link
to the search page, rather than redirect or refresh.

Mark Ryan redesigned the 404 page in 2009 and specifically removed the
meta refresh tag (cf. https://bugs.wikimedia.org/17316#c0).

The redesigned page eventually got deployed, but the client-side refresh
very sneakily moved from the HTML output to a Refresh header (cf.
https://bugs.wikimedia.org/35052#c0).

Neither bug is resolved, if anyone is interested in helping out. :-)
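
As a rough sketch of the alternative Tim suggested (a link to search instead of a refresh), a 404 response could be built along these lines. This is not the code behind Wikimedia's 404 page; `build_404` is a hypothetical function and the search URL layout is an assumption made for the example.

```python
from html import escape
from urllib.parse import quote


def build_404(path: str) -> tuple[int, dict[str, str], str]:
    """Build a 404 response that links to search instead of refreshing.

    The search URL layout below is an assumption for illustration.
    """
    term = path.lstrip("/")
    search_url = "/w/index.php?search=" + quote(term, safe="")
    body = (
        "<p>Page not found: " + escape(path) + "</p>"
        '<p><a href="' + escape(search_url, quote=True) + '">Search for this title</a></p>'
    )
    # Deliberately no "Refresh" header and no <meta http-equiv="refresh">,
    # so the mistyped URL stays in the address bar and can be corrected.
    headers = {"Content-Type": "text/html; charset=utf-8"}
    return 404, headers, body


status, headers, body = build_404("/abolition")
print(status, headers)
print(body)
```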

MZMcBride




Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-19 Thread Jon Robson
On 19 Sep 2013 18:23, Tim Starling tstarl...@wikimedia.org wrote:

 On 20/09/13 03:04, Jon Robson wrote:
  Thanks Tim for running those data. That seems to suggest the URL
  structure works for most cases.

 I think the request rate for actual articles in the root is very, very
 low.

I agree.. Sorry I guess my message wasn't so clear. I meant existing URL
structure :)


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-19 Thread Jon Robson
Thanks Tim for running those data. That seems to suggest the URL
structure works for most cases.

On Wed, Sep 18, 2013 at 12:07 AM, Tim Starling tstarl...@wikimedia.org wrote:
 On 17/09/13 13:59, Jon Robson wrote:
 I would suggest taking a look at the number of 404s caused by people trying
 to access pages without the wiki prefix. This would be interesting data
 to go alongside this interesting proposal...

 There are lots of different sorts of 404s, so it's necessary to do
 some filtering. For example:

 * double-slashes, due to bug 52253
 * sitemap.xml
 * Apple touch icons
 * bullet.gif in various directories
 * vulnerability scanning, e.g. xmlrpc.php
 * BlueCoat verify/notify, as described in
 http://www.webmasterworld.com/search_engine_spiders/3859463.htm
 * Serial numbers like http://en.wikipedia.org/B008NAYASM .

 I filtered out everything with a dot or slash in the prospective
 article title, as well as the BlueCoat URLs and the UAs responsible
 for serial number URLs. To simplify analysis, I took log lines from
 the English Wikipedia only.

 Most of the remaining log entries were search engine crawlers, so I
 took those out too.

 The result was 149 log entries at a 1/1000 sample rate, for the week
 of September 8-14, implying a request rate of about 639,000 per month.
 This is about 0.006% of the English Wikipedia's page view rate.

 The 149 URLs are at http://paste.tstarling.com/p/uhtFqg.html

 -- Tim Starling





-- 
Jon Robson
http://jonrobson.me.uk
@rakugojon


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-18 Thread Tim Starling
On 17/09/13 13:59, Jon Robson wrote:
 I would suggest taking a look at the number of 404s caused by people trying
 to access pages without the wiki prefix. This would be interesting data
 to go alongside this interesting proposal...

There are lots of different sorts of 404s, so it's necessary to do
some filtering. For example:

* double-slashes, due to bug 52253
* sitemap.xml
* Apple touch icons
* bullet.gif in various directories
* vulnerability scanning, e.g. xmlrpc.php
* BlueCoat verify/notify, as described in
http://www.webmasterworld.com/search_engine_spiders/3859463.htm
* Serial numbers like http://en.wikipedia.org/B008NAYASM .

I filtered out everything with a dot or slash in the prospective
article title, as well as the BlueCoat URLs and the UAs responsible
for serial number URLs. To simplify analysis, I took log lines from
the English Wikipedia only.

Most of the remaining log entries were search engine crawlers, so I
took those out too.

The result was 149 log entries at a 1/1000 sample rate, for the week
of September 8-14, implying a request rate of about 639,000 per month.
This is about 0.006% of the English Wikipedia's page view rate.

The 149 URLs are at http://paste.tstarling.com/p/uhtFqg.html
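
The filtering and extrapolation described above can be sketched as follows. This is only an illustration, not the script Tim ran: the function names are made up, "/skins/bullet.gif" is a hypothetical example path, and the 30-day month is an assumption chosen because it reproduces the monthly figure quoted above.

```python
def is_candidate_title(path: str) -> bool:
    """Keep only root-level paths that could plausibly be an article title."""
    title = path.lstrip("/")
    # Drop anything with a dot or slash, e.g. sitemap.xml, xmlrpc.php,
    # or bullet.gif in a subdirectory.
    return bool(title) and "." not in title and "/" not in title


SAMPLE_RATE = 1000    # 1/1000 sampling, as stated above
DAYS_PER_MONTH = 30   # assumption used to reproduce the monthly figure


def monthly_rate(sampled_per_week: int) -> float:
    """Extrapolate a weekly sampled count to an estimated monthly request rate."""
    weekly = sampled_per_week * SAMPLE_RATE
    return weekly / 7 * DAYS_PER_MONTH


paths = ["/sitemap.xml", "/xmlrpc.php", "/skins/bullet.gif", "/abolition"]
print([p for p in paths if is_candidate_title(p)])
print(round(monthly_rate(149)))
```

Crawler and user-agent filtering are left out of the sketch, since they depend on log fields not shown in this thread.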

-- Tim Starling



Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-18 Thread C. Scott Ananian
Note also that zhwiki (and others?) profitably uses the first part of the
path to do variant selection.

https://zh.wikipedia.org/wiki/User:Cscott uses the wiki default variant (if
logged in, uses the variant from the user's preferences)
https://zh.wikipedia.org/zh-hans/User:Cscott
https://zh.wikipedia.org/zh-hk/User:Cscott
etc use the specified variant.

I have a dream to eventually enable
https://en.wikipedia.org/en-gb/Football
in a similar fashion.
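
A minimal sketch of variant selection by first path segment, assuming only the prefixes named above. `split_variant` and `KNOWN_VARIANTS` are hypothetical names; the real logic lives in MediaWiki's language converter and web server configuration, not in code like this.

```python
from typing import Optional

KNOWN_VARIANTS = {"zh-hans", "zh-hk"}  # only the variants named above


def split_variant(path: str) -> tuple[Optional[str], str]:
    """Return (variant, title) for paths like /zh-hans/User:Cscott.

    A /wiki/ prefix means "no explicit variant": the wiki default or the
    logged-in user's preference applies.
    """
    first, _, rest = path.lstrip("/").partition("/")
    if first in KNOWN_VARIANTS:
        return first, rest
    if first == "wiki":
        return None, rest
    raise ValueError(f"unrecognised path prefix: {first!r}")


print(split_variant("/zh-hk/User:Cscott"))
print(split_variant("/wiki/User:Cscott"))
```

Note that this scheme relies on the first path segment being reserved, which is exactly what a root-level article namespace would give up.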
  --scott

Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-17 Thread Tim Starling
On 17/09/13 14:01, Gabriel Wicke wrote:
 On 09/16/2013 07:48 PM, Tim Starling wrote:
 On 17/09/13 11:08, Gabriel Wicke wrote:
 Tim mentions in
 https://www.mediawiki.org/wiki/Special:Code/MediaWiki/49833#c3561 that
 this only applied to IE3 and earlier, and IE4 respects the Content-type
 header. As the market share of IE <= 3 is probably non-existent we could
 probably blacklist it from logging in and content API access altogether.

 This issue affects IE at least up to IE 6, possibly later, see bug 28235.
 
 Thanks for the pointer! It is sad that IE6 (and likely IE7) is still
 haunting us. IE8+ is covered by the X-Content-Type-Options header.
 
 It sounds like your Content-Disposition solution [1] should still work
 for IE6/7 where that header is not used otherwise. The existing users of
 that header all seem to be file-related. Did I miss any use in action
 handlers?

I'm assuming you can grep for Content-Disposition as well as I can.
IIRC, the difficulty with Content-Disposition, in the context of a
security patch, was the need to abstract handling of the header out of
the various places that send it, so that it would be consistent and
demonstrably secure. That would have made the security patch larger
and more complex than it needed to be, which would have been a problem
for backporters. That shouldn't be a concern for your feature.

-- Tim Starling



Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-17 Thread K. Peachey
On Tue, Sep 17, 2013 at 8:34 AM, Gabriel Wicke gwi...@wikimedia.org wrote:

 There *might* be, in theory. In practice I doubt that there are any
 articles starting with 'w/'. To avoid future conflicts, we should
 probably prefix private paths with an underscore as titles cannot start
 with it (and REST APIs often use it for special resources).


I bet people have said that about single-letter interwikis, but we do have
quite a few 'single letter:' page titles around. Having 'single letter/'
titles is not unbelievable.

Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-17 Thread Nikola Smolenski

On 17/09/13 10:24, K. Peachey wrote:

On Tue, Sep 17, 2013 at 8:34 AM, Gabriel Wicke gwi...@wikimedia.org wrote:


There *might* be, in theory. In practice I doubt that there are any
articles starting with 'w/'. To avoid future conflicts, we should
probably prefix private paths with an underscore as titles cannot start
with it (and REST APIs often use it for special resources).



I bet people have said that about single-letter interwikis, but we do have
quite a few 'single letter:' page titles around. Having 'single letter/'
titles is not unbelievable.


I have found 2476 pages in English Wikipedia that start with
'[something]/', including pages starting with '//'. None of them start
with a small letter though, for obvious reasons.



Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-17 Thread Daniel Kinzler
Am 17.09.2013 00:34, schrieb Gabriel Wicke:
 There *might* be, in theory. In practice I doubt that there are any
 articles starting with 'w/'. 

I count 10 on en.wiktionary.org:

https://en.wiktionary.org/w/index.php?title=Special%3APrefixIndex&prefix=w%2F&namespace=0

 To avoid future conflicts, we should
 probably prefix private paths with an underscore as titles cannot start
 with it (and REST APIs often use it for special resources).

That would be better.
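
A small sketch of how the underscore convention would separate private paths from titles. It assumes, as stated above, that titles cannot start with an underscore; `route` is a hypothetical function and "_api" is a made-up example name, not an existing entry point.

```python
def route(path: str) -> tuple[str, str]:
    """Classify a root-level path as a private entry point or a page title.

    Assumes private paths are prefixed with "_" (the proposal above).
    """
    name = path.lstrip("/")
    if name.startswith("_"):
        return "private", name
    return "page", name


print(route("/_api"))     # "_api" is a made-up private path
print(route("/w/index"))  # treated as an ordinary title
```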

But still, I think this is a bad idea. Essentially, putting articles at the root
of the domain means hogging the domain as a namespace. Depending on what you
want to do with your wiki, this is not a good idea.

For instance, Wikidata uses the /entity/ path for URIs representing things,
while the documents under /wiki/ are descriptions of these things. If page
content was located at the root, we'd have nasty namespace pollution.

Basically: page content is only one of the things a wiki may serve. Internal
resources like CSS are another. But there may be much more, like structured
data. It's good to use prefixes to keep these apart.

-- daniel



Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-17 Thread Daniel Friesen
On 2013-09-17 2:29 AM, Nikola Smolenski wrote:
 On 17/09/13 10:24, K. Peachey wrote:
 On Tue, Sep 17, 2013 at 8:34 AM, Gabriel Wicke gwi...@wikimedia.org
 wrote:

 There *might* be, in theory. In practice I doubt that there are any
 articles starting with 'w/'. To avoid future conflicts, we should
 probably prefix private paths with an underscore as titles cannot start
 with it (and REST APIs often use it for special resources).


 I bet people have said that about single-letter interwikis, but we do
 have quite a few 'single letter:' page titles around. Having
 'single letter/' titles is not unbelievable.

 I have found 2476 pages in English Wikipedia that start with
 '[something]/', including pages starting with '//'. None of them start
 with a small letter though, for obvious reasons.
The problem with that query is you're searching Wikipedia. Try
Wiktionary instead. I found 5 just on the first letter I tested
https://en.wiktionary.org/wiki/Special:PrefixIndex/a/

Also pages prefixed with single letter/ aren't the only thing that
creates conflicts. As far as standard rewrite rules and webservers are
considered a directory at /a/ and /a are the same thing. See how
https://en.wikipedia.org/w is not a 404 pointing to [[w]] like
https://en.wikipedia.org/a is but instead is the same as w/ and hence
w/index.php. So really any single letter article on a root pathed wiki
conflicts with any single letter root directory. ;) And Wikipedia has a
redirect like that for every single letter of the latin alphabet.
(Actually forget the latin alphabet, they've practically got most of
Unicode there)
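
The conflict described above can be sketched as a simple check. This is an illustration only: `conflicts_with_directory` is a hypothetical helper, and the set of root directories contains just "w", the one discussed in this thread.

```python
ROOT_DIRECTORIES = {"w"}  # directories served from the document root


def conflicts_with_directory(title: str, directories=ROOT_DIRECTORIES) -> bool:
    """True if a root-level title would be shadowed by a directory.

    Both "w" and "w/..." collide, because /w and /w/ resolve to the
    same directory under typical rewrite rules.
    """
    first_segment = title.split("/", 1)[0]
    return first_segment in directories


for title in ["w", "w/index.php", "a", "a/b"]:
    print(title, conflicts_with_directory(title))
```

Every new root directory added later would need the same check against existing titles, which is the maintenance cost being pointed out here.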

Side topic https://en.wiktionary.org/w/r/t is messed up:  To check for
r/t on Wikipedia, see: //en.wikipedia.org/wiki/r/t
https://en.wikipedia.org/wiki/r/t

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-17 Thread Daniel Friesen
On 2013-09-16 8:01 PM, Gabriel Wicke wrote:
 On 09/16/2013 07:24 PM, Daniel Friesen wrote:
 On 2013-09-16 7:09 PM, Gabriel Wicke wrote:
 Any of the entry points? Any new entry point? Anything we ever want to
 put into the root?
 We should be able to avoid most conflicts by picking prefixed entry
 points. However, as we can't drop the clashing /w/api.php any time soon
 I have removed the /wiki/ part from the RFC:

 https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs

 So now only the conversion from

 /w/index.php?title=foo?action=history
 to
 /foo?action=history

 is under discussion.

 Gabriel
 Has the practice of disallowing /w/ or /index.php inside robots.txt to
 force search engines to completely ignore search, edit pages,
 exponential pagination, etc.. been considered?
 See
 https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs#Migration
OK. Though even assuming the non-standard * and Allow: features are
supported by all the bots we want to target, I actually don't like the idea
of blacklisting /wiki/*? in this way.

I don't think that every url with a query in it qualifies as something
we want to blacklist from search engines. There are plenty but sometimes
there is content that's served with a query which could otherwise be a
good idea to index.

For example the non-first pages on long categories and Special:Allpages'
pagination. The latter has robots=noindex – though I think we may want
to reconsider that – but the former is not noindexed and with the
introduction of rel=next, etc... would be pretty reasonable to index
but is currently blacklisted by robots.txt.
Additionally, while we normally want to noindex edit pages, this isn't
true of redlinks in every case. Take redlinked category links, for
example. These link to an action=edit&redlink=1 URL, which for a search
engine would then redirect back to the pretty URL for the category. But
because of robots.txt this link is masked, because the intermediate
redirect cannot be read by the search engine.

The idea I had to fix that naturally was to make MediaWiki aware of this
and whether by a new routing system or simply filters for specific
simple queries make it output /wiki/title?query urls for those cases
where it's a query we would want indexed and leave robots blacklisted
stuff under /w/ (though I did also consider a separate short url path
like /w/page/$1 to make internal/robots blacklisted urls pretty).
However adding Disallow: /wiki/*? to robots.txt will preclude the
ability to do that.
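
To make the effect of such a rule concrete, here is a sketch of how a crawler that supports the wildcard extension might match it. This is an assumption about crawler behaviour, not a specification: `robots_pattern_to_regex` is a hypothetical helper, and the category pagination URL is a made-up example.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with '*' wildcards to a regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else, including '?', is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


rule = robots_pattern_to_regex("/wiki/*?")
for url in ["/wiki/Foo", "/wiki/Foo?action=history", "/wiki/Category:Bar?from=M"]:
    print(url, "blocked" if rule.match(url) else "allowed")
```

The rule blocks every /wiki/ URL that carries a query string, including paginated category pages that might be worth indexing.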

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]



Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-17 Thread Nikola Smolenski

On 17/09/13 11:59, Daniel Friesen wrote:

On 2013-09-17 2:29 AM, Nikola Smolenski wrote:

I have found 2476 pages in English Wikipedia that start with
'[something]/', including pages starting with '//'. None of them start
with a small letter though, for obvious reasons.

The problem with that query is you're searching Wikipedia. Try
Wiktionary instead. I found 5 just on the first letter I tested
https://en.wiktionary.org/wiki/Special:PrefixIndex/a/


There are 124 of which 63 start with a small letter.


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-17 Thread Daniel Friesen
On 2013-09-17 2:48 AM, Daniel Kinzler wrote:
 To avoid future conflicts, we should
 probably prefix private paths with an underscore as titles cannot start
 with it (and REST APIs often use it for special resources).
 That would be better.

 But still, I think this is a bad idea. Essentially, putting articles at the
 root of the domain means hogging the domain as a namespace. Depending on
 what you want to do with your wiki, this is not a good idea.

 For instance, Wikidata uses the /entity/ path for URIs representing things,
 while the documents under /wiki/ are descriptions of these things. If page
 content was located at the root, we'd have nasty namespace pollution.

 Basically: page content is only one of the things a wiki may serve.
 Internal resources like CSS are another. But there may be much more, like
 structured data. It's good to use prefixes to keep these apart.

 -- daniel
+1

We've got others for content-related things too besides ones for
internal resources and structured data.

eg: https://test2.wikipedia.org/s/85

((And I'll try to resist starting a rant about the knockoff REST which
is a partial premise here))

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]



Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-17 Thread Gabriel Wicke
On 09/17/2013 02:48 AM, Daniel Kinzler wrote:
 Am 17.09.2013 00:34, schrieb Gabriel Wicke:
 There *might* be, in theory. In practice I doubt that there are any
 articles starting with 'w/'. 
 
 I count 10 on en.wiktionary.org:
 
 https://en.wiktionary.org/w/index.php?title=Special%3APrefixIndex&prefix=w%2F&namespace=0

The good news is that none of them is /w/{index,api,load}.php ;)

 To avoid future conflicts, we should
 probably prefix private paths with an underscore as titles cannot start
 with it (and REST APIs often use it for special resources).
 
 That would be better.
 
 But still, I think this is a bad idea. Essentially, putting articles at the
 root of the domain means hogging the domain as a namespace. Depending on
 what you want to do with your wiki, this is not a good idea.

I agree that it does not make sense to place the wiki at the root level
if you are running (or plan to run) other services on the domain. On
Wikipedia, the wiki is the primary use case. Optimizing for the common
use case can be a good idea.

 Basically: page content is only one of the things a wiki may serve.
 Internal resources like CSS are another. But there may be much more, like
 structured data. It's good to use prefixes to keep these apart.

For different representations of the same resource there is also much to
be said for suffixes, even if some of those representations are not
visual. Additionally, we have namespaces as a prefix mechanism within a
wiki. There will sure be cases where leaving the wiki makes sense, but I
am hesitant to discard the flat wiki namespace all too quickly.

Gabriel


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-17 Thread Brad Jorsch (Anomie)
On Mon, Sep 16, 2013 at 7:41 PM, Gabriel Wicke gwi...@wikimedia.org wrote:
 Using sub-resources rather than the random switch to /w/index.php is
 more important for caching (promotes deterministic URLs) and does not
 seem to involve similar trade-offs.

Note that "promotes deterministic URLs" applies only to cases where
only one parameter other than 'title' is provided to index.php
(usually this parameter is 'action'). If the URL has more than one
parameter other than 'title', you're still out of luck.

"But you can turn on $wgActionPaths to remove 'action' from the query
string too!" you say? But then you're still stuck if the URL has two
parameters other than 'action' and 'title'. Such as "offset" and
"limit", for example.


-- 
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-17 Thread Gabriel Wicke
On 09/17/2013 08:40 AM, Brad Jorsch (Anomie) wrote:
 On Mon, Sep 16, 2013 at 7:41 PM, Gabriel Wicke gwi...@wikimedia.org wrote:
 Using sub-resources rather than the random switch to /w/index.php is
 more important for caching (promotes deterministic URLs) and does not
 seem to involve similar trade-offs.
 
 Note that "promotes deterministic URLs" applies only to cases where
 only one parameter other than 'title' is provided to index.php
 (usually this parameter is 'action'). If the URL has more than one
 parameter other than 'title', you're still out of luck.

An end point that wants to be cacheable should only use one query
parameter, which might well be a path. Hypothetical examples:

http://wiki.org/wiki/Foo?r=latest/html
http://wiki.org/wiki/Foo?r=123456/wikitext

An alternative solution would be to specify a list of required query
parameters and a canonical ordering, and to reject (or redirect)
requests not conforming to this spec. The problem I see with this
approach is that many client libraries don't provide control over the
order of query parameters, which would make such an interface hard to use.
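
For reference, the canonical-ordering alternative could look roughly like the sketch below, here applied on the server or cache side rather than demanded of clients. It uses only Python's standard urllib.parse; `canonicalize` is a hypothetical name, and sorting by parameter name is just one possible canonical order.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Return the URL with its query parameters sorted by name, then value."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(params)))


a = canonicalize("http://wiki.org/wiki/Foo?offset=20&limit=50&action=history")
b = canonicalize("http://wiki.org/wiki/Foo?action=history&limit=50&offset=20")
print(a)
print(a == b)
```

A cache could key on the canonical form or redirect to it; either way, clients that send parameters in another order pay for the normalization step.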

Gabriel


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-17 Thread Brad Jorsch (Anomie)
On Tue, Sep 17, 2013 at 12:27 PM, Gabriel Wicke gwi...@wikimedia.org wrote:

 An end point that wants to be cacheable should only use one query
 parameter, which might well be a path. Hypothetical examples:

 http://wiki.org/wiki/Foo?r=latest/html
 http://wiki.org/wiki/Foo?r=123456/wikitext

So now you're cramming multiple parameters, ordered, into one
parameter? Why not go all the way and do
http://wiki.org/wiki/123456/wikitext/Foo then?

But IMO, that's ridiculous.

 An alternative solution would be to specify a list of required query
 parameters and a canonical ordering, and to reject (or redirect)
 requests not conforming to this spec.

"reject" is even more ridiculous. "redirect" is less ridiculous, but
is strange and will increase latency and number-of-requests for
clients that don't know the magic order.

What is the actual benefit we're trying to get here? All I've gotten
so far along those lines is "improve cacheability", but it doesn't
seem to have been established whether caching even needs improving in
this area.


-- 
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-17 Thread Gabriel Wicke
On 09/17/2013 11:24 AM, Brad Jorsch (Anomie) wrote:
 On Tue, Sep 17, 2013 at 12:27 PM, Gabriel Wicke gwi...@wikimedia.org wrote:
 An end point that wants to be cacheable should only use one query
 parameter, which might well be a path. Hypothetical examples:

 http://wiki.org/wiki/Foo?r=latest/html
 http://wiki.org/wiki/Foo?r=123456/wikitext
 
 So now you're cramming multiple parameters, ordered, into one
 parameter? Why not go all the way and do
 http://wiki.org/wiki/123456/wikitext/Foo then?

I consider the article to be the main resource we are interested in,
with a revision and then a specific part (format) of that revision as a
sub-resource. As our titles can contain slashes we need to delimit the
main resource from the sub-resource part. A single query parameter that
specifies the sub-resource path achieves that.
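
A sketch of how such a URL could be taken apart, assuming the hypothetical `r=<revision>/<format>` parameter from the examples earlier in this thread. `parse_subresource` is a made-up name, and "AC/DC" is used only as an example of a title containing a slash.

```python
from urllib.parse import parse_qs, urlsplit


def parse_subresource(url: str) -> tuple[str, str, str]:
    """Split a URL like /wiki/Foo?r=123456/wikitext into
    (title, revision, format).

    The /wiki/ prefix and the 'r' parameter follow the hypothetical
    examples in this thread.
    """
    parts = urlsplit(url)
    prefix = "/wiki/"
    if not parts.path.startswith(prefix):
        raise ValueError("expected a /wiki/ path")
    title = parts.path[len(prefix):]      # may itself contain slashes
    (sub,) = parse_qs(parts.query)["r"]   # exactly one 'r' value expected
    revision, _, fmt = sub.partition("/")
    return title, revision, fmt


print(parse_subresource("http://wiki.org/wiki/Foo?r=latest/html"))
print(parse_subresource("http://wiki.org/wiki/AC/DC?r=123456/wikitext"))
```

Because the title occupies the whole path and the sub-resource sits in the query string, a slash in the title is never confused with the revision/format separator.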

 What is the actual benefit we're trying to get here? All I've gotten
 so far along those lines is "improve cacheability", but it doesn't
 seem to have been established whether caching even needs improving in
 this area.

A heavily-used content API will perform better and use less resources
when it is cacheable. This will become more important over time, so I
believe it is worth spending a small amount of effort on now.

Gabriel


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-17 Thread MZMcBride
Gabriel Wicke wrote:
A heavily-used content API will perform better and use less resources
when it is cacheable. This will become more important over time, so I
believe it is worth spending a small amount of effort on now.

Sure, I think everyone agrees that a heavily used Web resource will
perform better with caching. I'm just not sure futzing around with path
names is the best way to try to ensure sustainable cacheability.

Is there a breakdown of what in a typical MediaWiki API request takes the
most time or uses the most resources (i.e., profiling a local request)? I
imagine there are multiple caching opportunities at other layers that
don't rely on path name, but it's difficult to say where you might see the
most gains without further data.

MZMcBride




Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Tyler Romeo
On Mon, Sep 16, 2013 at 6:12 PM, Gabriel Wicke gwi...@wikimedia.org wrote:

 * drop the /wiki/ prefix
   https://en.wikipedia.org/Foo instead of
   https://en.wikipedia.org/wiki/Foo


Where would we put the API entry point? It can't be at
https://en.wikipedia.org/w/api.php because there might be an article named
w/api.php.


 * use simple action urls
   https://en.wikipedia.org/Foo?action=history instead of
   https://en.wikipedia.org/w/index.php?title=Foo&action=history


This already works.
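
For readers who want the two URL shapes side by side, here is a rough sketch of the mapping from the long form to the proposed short form. It only illustrates the URL shapes under the root-level proposal; `to_simple_action_url` is a hypothetical function, not how MediaWiki builds links.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit


def to_simple_action_url(url: str) -> str:
    """Rewrite /w/index.php?title=Foo&action=history to /Foo?action=history.

    Sketch of the URL shape only; it is not how MediaWiki builds links.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    title = next(value for key, value in params if key == "title")
    rest = [(key, value) for key, value in params if key != "title"]
    # Spaces in titles are conventionally written as underscores in URLs.
    path = "/" + quote(title.replace(" ", "_"), safe="/:")
    query = urlencode(rest)
    return f"{parts.scheme}://{parts.netloc}{path}" + (f"?{query}" if query else "")


print(to_simple_action_url(
    "https://en.wikipedia.org/w/index.php?title=Foo&action=history"
))
```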

-- 
Tyler Romeo
Stevens Institute of Technology, Class of 2016
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com

Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Ryan Lane
On Mon, Sep 16, 2013 at 3:12 PM, Gabriel Wicke gwi...@wikimedia.org wrote:

 Hi,

 while tinkering with a RESTful content API I was reminded of an old pet
 peeve of mine: The URLs we use in Wikimedia projects are relatively long
 and ugly. I believe that we now have the ability to clean this up if we
 want to.

 It would be nice to

 * drop the /wiki/ prefix
   https://en.wikipedia.org/Foo instead of
   https://en.wikipedia.org/wiki/Foo

 * use simple action urls
   https://en.wikipedia.org/Foo?action=history instead of
   https://en.wikipedia.org/w/index.php?title=Foo&action=history

 The details of this proposal are discussed in the following RFC:

 https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs

 I'm looking forward to your input!



https://www.mediawiki.org/wiki/Manual:Short_URL#URL_like_-_example.com.2FPage_title


*Warning:* this method may create an unstable URL structure and leave some
page names unusable on your wiki. See Manual:Wiki in site root
directory (https://www.mediawiki.org/wiki/Manual:Wiki_in_site_root_directory).
Please see the article Cool URIs don't
change (http://www.w3.org/Provider/Style/URI) and take a few minutes to
devise a stable URL structure for your web site
before hopping willy-nilly into rewrites into the URL root.

- Ryan

Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Chad
On Mon, Sep 16, 2013 at 3:12 PM, Gabriel Wicke gwi...@wikimedia.org wrote:

 Hi,

 while tinkering with a RESTful content API I was reminded of an old pet
 peeve of mine: The URLs we use in Wikimedia projects are relatively long
 and ugly. I believe that we now have the ability to clean this up if we
 want to.

 It would be nice to

 * drop the /wiki/ prefix
   https://en.wikipedia.org/Foo instead of
   https://en.wikipedia.org/wiki/Foo

 * use simple action urls
   https://en.wikipedia.org/Foo?action=history instead of
   https://en.wikipedia.org/w/index.php?title=Foo&action=history

 The details of this proposal are discussed in the following RFC:

 https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs

 I'm looking forward to your input!


Even better would be getting rid of action urls entirely.

-Chad

Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Gabriel Wicke
On 09/16/2013 03:21 PM, Tyler Romeo wrote:
 On Mon, Sep 16, 2013 at 6:12 PM, Gabriel Wicke gwi...@wikimedia.org wrote:
 
 * drop the /wiki/ prefix
   https://en.wikipedia.org/Foo instead of
   https://en.wikipedia.org/wiki/Foo

 
 Where would we put the API entry point? It can't be at
 https://en.wikipedia.org/w/api.php because there might be an article named
 w/api.php.

There *might* be, in theory. In practice I doubt that there are any
articles starting with 'w/'. To avoid future conflicts, we should
probably prefix private paths with an underscore as titles cannot start
with it (and REST APIs often use it for special resources).

 * use simple action urls
   https://en.wikipedia.org/Foo?action=history instead of
   https://en.wikipedia.org/w/index.php?title=Foo&action=history

 
 This already works.


Both parts of the proposal have been working for a long time. The RFC is
mainly about using the capability in Wikimedia projects.

Gabriel


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread MZMcBride
Chad wrote:
On Mon, Sep 16, 2013 at 3:12 PM, Gabriel Wicke gwi...@wikimedia.org
wrote:
 It would be nice to

 * drop the /wiki/ prefix
   https://en.wikipedia.org/Foo instead of
   https://en.wikipedia.org/wiki/Foo

 * use simple action urls
   https://en.wikipedia.org/Foo?action=history instead of
   https://en.wikipedia.org/w/index.php?title=Foo&action=history

 The details of this proposal are discussed in the following RFC:

 https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs

Even better would be getting rid of action urls entirely.

In favor of what? Special page URLs?

A variant on https://en.wikipedia.org/Foo?action=history is
https://en.wikipedia.org/history/Foo (using $wgActionPaths).
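To make the two variants concrete, here is a minimal Python sketch (not MediaWiki code) that rewrites a long index.php URL into both short forms. The function name and the article_prefix parameter are assumptions made for illustration; the path form only mirrors the kind of URL that $wgActionPaths is meant to produce.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def short_action_urls(long_url, article_prefix="/"):
    """Return (query_form, path_form) short URLs for a long index.php URL.

    article_prefix is "/" for the root-level scheme discussed here and
    would be "/wiki/" if the prefix is kept (an illustrative parameter).
    """
    parts = urlsplit(long_url)
    params = parse_qs(parts.query)
    title = params.pop("title")[0]
    action = params.pop("action", ["view"])[0]
    # MediaWiki URLs conventionally use underscores for spaces in titles.
    title_path = quote(title.replace(" ", "_"), safe="/:")
    extra = urlencode(params, doseq=True)
    base = f"{parts.scheme}://{parts.netloc}"

    query_form = f"{base}{article_prefix}{title_path}?action={action}"
    path_form = f"{base}/{action}/{title_path}"
    if extra:
        # Keep any remaining parameters (e.g. limit, offset) on both forms.
        query_form += "&" + extra
        path_form += "?" + extra
    return query_form, path_form


print(short_action_urls(
    "https://en.wikipedia.org/w/index.php?title=Foo&action=history"))
```

Either form keeps the title in the path, which is the part that matters for deterministic, cacheable URLs; the difference is only whether the action lives in the query string or in a leading path segment.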

The RFC currently seems to gloss over what problem it is trying to solve
and what benefits a new URL structure might bring. I'd like to see a
clearer statement of the problem and of the benefits of a switch, taking
into account, for example, the overarching goal of making URLs fully localized.

MZMcBride




Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Tyler Romeo
On Mon, Sep 16, 2013 at 6:34 PM, Gabriel Wicke gwi...@wikimedia.org wrote:

 There *might* be, in theory. In practice I doubt that there are any
 articles starting with 'w/'. To avoid future conflicts, we should
 probably prefix private paths with an underscore as titles cannot start
 with it (and REST APIs often use it for special resources).


URI design in REST is not about functionality; it is about organization
and logical design. The path part of a URI is considered a hierarchical
structure, so it doesn't make sense for api.php to be a sub-resource of
the wiki itself. Even an underscore prefix wouldn't make sense, because it
implies that the _images/ resource is a sub-resource at the same level as
a normal article.

*-- *
*Tyler Romeo*
Stevens Institute of Technology, Class of 2016
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com

Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Jay Ashworth
- Original Message -
 From: MZMcBride z...@mzmcbride.com

 The RFC currently seems to gloss over what problem is attempting to be
 solved here and what benefits a new URL structure might bring. I'd like to
 see a clearer statement of a problem and benefits to a switch, taking
 into account, for example, the overarching goal of making URLs fully
 localized.

Concur, especially in light of the fact that *this does not permit you to
break the old URLs*.  They are everywhere, *and they must continue to work
forever*.

I hope I don't even have to justify why.

Cheers,
-- jra
-- 
Jay R. Ashworth  Baylink   j...@baylink.com
Designer The Things I Think   RFC 2100
Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII
St Petersburg FL USA   #natog  +1 727 647 1274


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Petr Onderka
On Tue, Sep 17, 2013 at 12:34 AM, Gabriel Wicke gwi...@wikimedia.org wrote:
 In practice I doubt that there are any articles starting with 'w/'.

Actually, there are. Looking at enwiktionary only, there are 10 pages
starting with w/.
Some of those are redirects (e.g. w/r/t), but others are normal
articles (e.g. w/, w/e).

Petr Onderka
[[en:User:Svick]]


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Brian Wolff
On 2013-09-16 7:12 PM, Gabriel Wicke gwi...@wikimedia.org wrote:

 Hi,

 while tinkering with a RESTful content API I was reminded of an old pet
 peeve of mine: The URLs we use in Wikimedia projects are relatively long
 and ugly. I believe that we now have the ability to clean this up if we
 want to.

 It would be nice to

 * drop the /wiki/ prefix
   https://en.wikipedia.org/Foo instead of
   https://en.wikipedia.org/wiki/Foo

 * use simple action urls
   https://en.wikipedia.org/Foo?action=history instead of
   https://en.wikipedia.org/w/index.php?title=Foo&action=history

 The details of this proposal are discussed in the following RFC:



 I'm looking forward to your input!

 Gabriel


While I'm not particularly fond of this idea (probably because I'm stuck in
my ways more than anything else), I do think that making
en.wikipedia.org/foo an instant HTTP redirect, instead of the "did you
mean"/"redirecting in 5 seconds" message we currently have, might make sense.

Additionally, there are some security issues in IE6 when doing foo?action=raw,
if I recall.

-bawolff

Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Gabriel Wicke
On 09/16/2013 03:25 PM, Ryan Lane wrote:
 https://www.mediawiki.org/wiki/Manual:Short_URL#URL_like_-_example.com.2FPage_title

 
 *Warning:* this method may create an unstable URL structure and leave some
 page names unusable on your wiki. See Manual:Wiki in site root
 directory (https://www.mediawiki.org/wiki/Manual:Wiki_in_site_root_directory).
 Please see the article Cool URIs don't
 change (http://www.w3.org/Provider/Style/URI) and take a few minutes to
 devise a stable URL structure for your web site
 before hopping willy-nilly into rewrites into the URL root.

That is a very vague warning. So far I have lower-case 'favicon.ico',
'robots.txt' and 'w/' as potential conflicts. Do you see any others?

In general, I see removing /wiki/ as the less important part of the RFC.
Using sub-resources rather than the random switch to /w/index.php is
more important for caching (promotes deterministic URLs) and does not
seem to involve similar trade-offs.

Gabriel


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Ryan Lane
On Mon, Sep 16, 2013 at 4:41 PM, Gabriel Wicke gwi...@wikimedia.org wrote:

 On 09/16/2013 03:25 PM, Ryan Lane wrote:
 
 https://www.mediawiki.org/wiki/Manual:Short_URL#URL_like_-_example.com.2FPage_title
 
 
  *Warning:* this method may create an unstable URL structure and leave some
  page names unusable on your wiki. See Manual:Wiki in site root
  directory (https://www.mediawiki.org/wiki/Manual:Wiki_in_site_root_directory).
  Please see the article Cool URIs don't
  change (http://www.w3.org/Provider/Style/URI) and take a few minutes to
  devise a stable URL structure for your web site
  before hopping willy-nilly into rewrites into the URL root.

 That is a very vague warning. So far I have lower-case 'favicon.ico',
 'robots.txt' and 'w/' as potential conflicts. Do you see any others?


Any of the entry points? Any new entry point? Anything we ever want to put
into the root?

- Ryan

Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Gabriel Wicke
On 09/16/2013 04:09 PM, Petr Onderka wrote:
 On Tue, Sep 17, 2013 at 12:34 AM, Gabriel Wicke gwi...@wikimedia.org wrote:
 In practice I doubt that there are any articles starting with 'w/'.
 
 Actually, there are. Looking at enwiktionary only, there are 10 pages
 starting with w/.
 Some of those are redirects (e.g w/r/t), but others are normal
 articles (e.g. w/, w/e).

Ah, ok. That would make it hard to keep /w/api.php working. /_w/api.php
would not suffer from that problem, but then current API users would break.

So I guess that kills the /wiki/ removal in the shorter term. Maybe we
should however consider using something like /_w/ if we ever introduce a
new API entry point to avoid conflicts with valid article names in the
future.
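The conflict can be sketched in a few lines of Python. This is only a toy model, not MediaWiki's routing: the helper names are made up for illustration, the title list is the handful of page names mentioned in this thread plus "Foo", and the "titles cannot start with an underscore" rule is taken from the earlier message rather than checked against the title code.

```python
def shadowed_titles(reserved_prefix, titles):
    """Return the titles that a reserved path prefix would make unreachable."""
    return [t for t in titles if t.startswith(reserved_prefix)]


def is_valid_title_start(title):
    """Model the rule cited in the thread: titles cannot start with '_'."""
    return not title.startswith("_")


# Page names mentioned in the thread as existing on enwiktionary, plus "Foo".
titles = ["w/", "w/e", "w/r/t", "Foo"]

# With articles served from the root, reserving /w/ hides real pages.
print(shadowed_titles("w/", titles))

# A /_w/ prefix cannot collide, because no valid title starts with '_'.
print(shadowed_titles("_w/", [t for t in titles if is_valid_title_start(t)]))
```

The underscore prefix works only because it sits outside the set of valid titles; any reserved prefix inside that set will eventually collide with some page name.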

Gabriel


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Tyler Romeo
On Mon, Sep 16, 2013 at 7:51 PM, Gabriel Wicke gwi...@wikimedia.org wrote:

 Ah, ok. That would make it hard to keep /w/api.php working. /_w/api.php
 would not suffer from that problem, but then current API users would break.

 So I guess that kills the /wiki/ removal in the shorter term. Maybe we
 should however consider using something like /_w/ if we ever introduce a
 new API entry point to avoid conflicts with valid article names in the
 future.


I disagree. Having separate naming conventions for our entry points just
makes things more inconsistent. Also I don't think it's even necessary in
the first place to get rid of the /wiki/. It doesn't look messy at all.

*-- *
*Tyler Romeo*
Stevens Institute of Technology, Class of 2016
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com

Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Steven Walling
On Mon, Sep 16, 2013 at 3:36 PM, MZMcBride z...@mzmcbride.com wrote:

 The RFC currently seems to gloss over what problem is attempting to be
 solved here and what benefits a new URL structure might bring. I'd like to
 see a clearer statement of a problem and benefits to a switch, taking into
 account, for example, the overarching goal of making URLs fully localized.


How about the following?

Our current URL structure is extremely obtuse for non-technical users, and
generally defies their expectations. To most people,
en.wikipedia.org/Dog or even wikipedia.org/Dog should work just fine, not
produce a 404.

Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Tyler Romeo
On Mon, Sep 16, 2013 at 8:20 PM, Steven Walling steven.wall...@gmail.com wrote:

 Our current URL structure is extremely obtuse for non-technical users, and
 generally defies their expectations. To most people,
 en.wikipedia.org/Dog or even wikipedia.org/Dog should work just fine, not
 produce a 404.


To be fair, both of those links redirect to the proper URL anyway. It
wouldn't be hard to just change that from 404 to a redirect. Nonetheless
the canonical URI should still be /wiki/Article_title.

*-- *
*Tyler Romeo*
Stevens Institute of Technology, Class of 2016
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com

Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Jay Ashworth
- Original Message -
 From: Steven Walling steven.wall...@gmail.com

 How about the following?
 
 Our current URL structure is extremely obtuse for non-technical users,
 and generally defies their expectations. To most people,
 en.wikipedia.org/Dog or even wikipedia.org/Dog should work just fine, not
 produce a 404.

Any collection of "most people" large enough to justify a change like this
is, I assert, too technically unsophisticated to be attempting to construct
URLs by hand (rather than by copy/pasta).

Do you also propose to fix the capitalization, spacing, and URL-escaping
rules, which are much more complicated than that?

My considered reaction, now after several hours, is that this is fixing
a problem which is not really broken for *anyone* except those who are
OCD about hiding the tech-y look in the Location box.  No offense. :-)

Cheers,
-- jra
-- 
Jay R. Ashworth  Baylink   j...@baylink.com
Designer The Things I Think   RFC 2100
Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII
St Petersburg FL USA   #natog  +1 727 647 1274


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Gabriel Wicke
On 09/16/2013 04:34 PM, Brian Wolff wrote:
 Additionally there is some security issues in ie6 when doing foo?action=raw
 if I recall.

Yes, IIRC some version of IE disregarded the Content-type header and
guessed the content type based on the URL and the content. If the URL
contained .php (only outside the query string?), it disabled this behavior.

Tim mentions in
https://www.mediawiki.org/wiki/Special:Code/MediaWiki/49833#c3561 that
this only applied to IE3 and earlier, and IE4 respects the Content-type
header. As the market share of IE <= 3 is probably non-existent we could
probably blacklist it from logging in and content API access altogether.

According to [1] and [2] there is also an 'X-Content-Type-Options:
nosniff' header that disables this behavior for IE and Chrome. I doubt
that it works in IE3 though. Anybody up for some testing with an ancient
IE3 install?
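For reference, sending that header is cheap. Below is a minimal Python WSGI sketch, not MediaWiki code: the app name, the placeholder body and the 'text/x-wiki' content type are assumptions for illustration. It only shows where the header goes; it says nothing about which browser versions honor it.

```python
def raw_wikitext_app(environ, start_response):
    """Minimal WSGI app serving placeholder wikitext with sniffing disabled."""
    body = "'''Foo''' is a placeholder page.".encode("utf-8")
    headers = [
        # Illustrative content type for raw wikitext (an assumption here).
        ("Content-Type", "text/x-wiki; charset=UTF-8"),
        # Ask browsers that support it not to guess the type from content.
        ("X-Content-Type-Options", "nosniff"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

The app can be served locally with wsgiref.simple_server.make_server from the standard library for manual testing against old browsers.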

Gabriel

[1]: http://msdn.microsoft.com/en-us/library/dd565661(v=vs.85).aspx
[2]: https://www.owasp.org/index.php/List_of_useful_HTTP_headers


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Gabriel Wicke
On 09/16/2013 04:42 PM, Ryan Lane wrote:
 On Mon, Sep 16, 2013 at 4:41 PM, Gabriel Wicke gwi...@wikimedia.org wrote:

 That is a very vague warning. So far I have lower-case 'favicon.ico',
 'robots.txt' and 'w/' as potential conflicts. Do you see any others?
 
 
 Any of the entry points? Any new entry point? Anything we ever want to
 put into the root?


We should be able to avoid most conflicts by picking prefixed entry
points. However, as we can't drop the clashing /w/api.php any time soon
I have removed the /wiki/ part from the RFC:

https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs

So now only the conversion from

/w/index.php?title=foo&action=history
to
/foo?action=history

is under discussion.

Gabriel


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Tim Starling
On 17/09/13 09:34, Brian Wolff wrote:
 Well I'm not particularly fond of this idea (probably because im stuck in
 my ways more than anything else), I do think that making the
 en.wikipedia.org/foo be an instant http redirect instead of did you
 mean/redirecting in 5 seconds message we currently have might make sense.

The technical situation has not changed since that meta refresh was
introduced, and the same rationale still applies. See e.g.

February 2005:
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/15711

August 2006:
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/25605

-- Tim Starling



Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Daniel Friesen
On 2013-09-16 7:09 PM, Gabriel Wicke wrote:
 Any of the entry points? Any new entry point? Anything we ever want to
 put into the root?
 We should be able to avoid most conflicts by picking prefixed entry
 points. However, as we can't drop the clashing /w/api.php any time soon
 I have removed the /wiki/ part from the RFC:

 https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs

 So now only the conversion from

 /w/index.php?title=foo?action=history
 to
 /foo?action=history

 is under discussion.

 Gabriel
Has the practice of disallowing /w/ or /index.php inside robots.txt to
force search engines to completely ignore search, edit pages,
exponential pagination, etc. been considered?
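To illustrate the concern, here is a small Python sketch using the standard library's robots.txt parser. The two-line robots.txt is an illustrative assumption, not a copy of any wiki's real file. It shows that a prefix rule on /w/ stops matching once action URLs move under the article path.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only; a real robots.txt would contain more.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(path):
    """Return True if a generic crawler may fetch the given path."""
    return parser.can_fetch("*", "https://en.wikipedia.org" + path)


for path in (
    "/wiki/Foo",                              # plain article view
    "/w/index.php?title=Foo&action=history",  # current long action URL
    "/wiki/Foo?action=history",               # short action URL
):
    print(path, crawlable(path))
```

Under these rules the short action URL is no longer covered by the /w/ prefix, so keeping crawlers away from history and edit pages would need some other mechanism.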

Btw, side note on root urls. We still have an open bug allowing attacks
on wikis using root paths:
https://bugzilla.wikimedia.org/show_bug.cgi?id=38048

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]



Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Tim Starling
On 17/09/13 11:08, Gabriel Wicke wrote:
 On 09/16/2013 04:34 PM, Brian Wolff wrote:
 Additionally there is some security issues in ie6 when doing foo?action=raw
 if I recall.
 
 Yes, IIRC some version of IE disregarded the Content-type header and
 guessed the content type based on the URL and the content. If the URL
 contained .php (only outside the query string?), it disabled this behavior.
 
 Tim mentions in
 https://www.mediawiki.org/wiki/Special:Code/MediaWiki/49833#c3561 that
 this only applied to IE3 and earlier, and IE4 respects the Content-type
 header. As the market share of IE <= 3 is probably non-existent we could
 probably blacklist it from logging in and content API access altogether.

This issue affects IE at least up to IE 6, possibly later, see bug 28235.

-- Tim Starling



Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Gabriel Wicke
On 09/16/2013 07:24 PM, Daniel Friesen wrote:
 On 2013-09-16 7:09 PM, Gabriel Wicke wrote:
 Any of the entry points? Any new entry point? Anything we ever want to
 put into the root?
 We should be able to avoid most conflicts by picking prefixed entry
 points. However, as we can't drop the clashing /w/api.php any time soon
 I have removed the /wiki/ part from the RFC:

 https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs

 So now only the conversion from

 /w/index.php?title=foo?action=history
 to
 /foo?action=history

 is under discussion.

 Gabriel
 Has the practice of disallowing /w/ or /index.php inside robots.txt to
 force search engines to completely ignore search, edit pages,
 exponential pagination, etc.. been considered?

See
https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs#Migration

 Btw, side note on root urls. We still have an open bug allowing attacks
 on wikis using root paths:
 https://bugzilla.wikimedia.org/show_bug.cgi?i

That looks like a fixable bug. In Parsoid for example all internal links
are relative, which avoids the protocol-relative URL issue you reported
there.

Gabriel


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Jeremy Baron
On Tue, Sep 17, 2013 at 2:09 AM, Gabriel Wicke gwi...@wikimedia.org wrote:
 So now only the conversion from

 /w/index.php?title=foo?action=history
 to
 /foo?action=history

Do you mean:

to
/wiki/foo?action=history

?

 is under discussion.

See also https://gerrit.wikimedia.org/r/51595 and RT# 864 (aka
https://bugzilla.wikimedia.org/21919 ) which all seem to prefer
docroot verification rather than DNS.

-Jeremy


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Jon Robson
I would suggest taking a look at the number of 404s caused by people trying
to access pages without the /wiki/ prefix. That would be interesting data
to go alongside this proposal.
On 16 Sep 2013 20:01, Gabriel Wicke gwi...@wikimedia.org wrote:

 On 09/16/2013 07:24 PM, Daniel Friesen wrote:
  On 2013-09-16 7:09 PM, Gabriel Wicke wrote:
  Any of the entry points? Any new entry point? Anything we ever want to
  put into the root?
  We should be able to avoid most conflicts by picking prefixed entry
  points. However, as we can't drop the clashing /w/api.php any time soon
  I have removed the /wiki/ part from the RFC:
 
  https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs
 
  So now only the conversion from
 
  /w/index.php?title=foo?action=history
  to
  /foo?action=history
 
  is under discussion.
 
  Gabriel
  Has the practice of disallowing /w/ or /index.php inside robots.txt to
  force search engines to completely ignore search, edit pages,
  exponential pagination, etc.. been considered?

 See
 https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs#Migration

  Btw, side note on root urls. We still have an open bug allowing attacks
  on wikis using root paths:
  https://bugzilla.wikimedia.org/show_bug.cgi?i

 That looks like a fixable bug. In Parsoid for example all internal links
 are relative, which avoids the protocol-relative URL issue you reported
 there.

 Gabriel


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Gabriel Wicke
On 09/16/2013 07:48 PM, Tim Starling wrote:
 On 17/09/13 11:08, Gabriel Wicke wrote:
 Tim mentions in
 https://www.mediawiki.org/wiki/Special:Code/MediaWiki/49833#c3561 that
 this only applied to IE3 and earlier, and IE4 respects the Content-type
 header. As the market share of IE <= 3 is probably non-existent we could
 probably blacklist it from logging in and content API access altogether.
 
 This issue affects IE at least up to IE 6, possibly later, see bug 28235.

Thanks for the pointer! It is sad that IE6 (and likely IE7) is still
haunting us. IE8+ is covered by the X-Content-Type-Options header.

It sounds like your Content-Disposition solution [1] should still work
for IE6/7 where that header is not used otherwise. The existing users of
that header all seem to be file-related. Did I miss any use in action
handlers?

Gabriel

[1]: https://bugzilla.wikimedia.org/show_bug.cgi?id=28235#c6


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Gabriel Wicke
On 09/16/2013 08:48 PM, Jeremy Baron wrote:
 On Tue, Sep 17, 2013 at 2:09 AM, Gabriel Wicke gwi...@wikimedia.org wrote:
 /w/index.php?title=foo?action=history
 to
 /foo?action=history
 
 Do you mean:
 
 to
 /wiki/foo?action=history

Yes, sorry. The RFC had it right, in case you read that ;)

Gabriel
