Re: [Wikitech-l] Requests with data URIs appended to proper URLs

2014-06-09 Thread Christian Aistleitner
Hi,

On Fri, Jun 06, 2014 at 02:41:07PM +0200, Federico Leva (Nemo) wrote:
 It's not specific to es and pt, it's easy to spot such URLs in any 
 stats.grok.se top pages report: http://stats.grok.se/de/top

That URL is (currently) for March data, and characteristics have changed
considerably since then.

In all of June, we have only 2 affected requests for dewiki in the
sampled-1000 stream.
We always have some background noise like that.
But eswiki has 2648 affected requests, and ptwiki 1378.

Eswiki and ptwiki together account for 93% of the affected data URI
requests in June, but for only ~2% of all requests.
So for current data, it is specific to eswiki and ptwiki.
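As a rough sketch of how such per-wiki numbers could be tallied, assume the input is a plain list of request URLs. The actual Hadoop query over the sampled-1000 stream is not shown in this thread. The sketch flags paths that start with /wiki/data: and counts them per host. The helper names is_affected and tally_affected, and the prefix test itself, are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical marker for the affected requests: a data URI glued onto /wiki/.
AFFECTED_PREFIX = "/wiki/data:"

def is_affected(url: str) -> bool:
    """True if the request path looks like a data URI appended to /wiki/."""
    return urlsplit(url).path.startswith(AFFECTED_PREFIX)

def tally_affected(urls):
    """Return (affected requests per host, total number of requests)."""
    per_host = Counter()
    total = 0
    for url in urls:
        total += 1
        if is_affected(url):
            per_host[urlsplit(url).netloc] += 1
    return per_host, total

# Example input: the two kinds of request seen in this thread.
sample = [
    "http://pt.wikipedia.org/wiki/Copa_do_Mundo_FIFA",
    "http://pt.wikipedia.org/wiki/data:image/png;base64,"
    "iVBORw0KGgoNSUhEUgEAAABkAQBvV2fNDUlEQVQIHWNoYBgWEACJ5TIB0K9KcABJRU5ErkJggg==",
]
per_host, total = tally_affected(sample)
print(per_host, total)  # Counter({'pt.wikipedia.org': 1}) 2
```

The two shares quoted above are then simple ratios. The first is one wiki's count over the sum of per_host. The second is the affected count over total.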

Have fun,
Christian



-- 
 quelltextlich e.U.  \\  Christian Aistleitner 
   Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3 Email:  christ...@quelltextlich.at
4293 Gutau, Austria  Phone:  +43 7946 / 20 5 81
 Fax:+43 7946 / 20 5 81
 Homepage: http://quelltextlich.at/
---


signature.asc
Description: Digital signature
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Requests with data URIs appended to proper URLs

2014-06-09 Thread Christian Aistleitner
Hi,

On Thu, Jun 05, 2014 at 11:39:28PM -0700, S Page wrote:
 [ Jigsaw CSS validator over-complaining ]

Full ACK.

In particular, the relevant data URIs like

 
data:image/png;base64,iVBORw0KGgoNSUhEUgEuCAIAAABmjeQ9RElEQVR42mVO2wrAUAhy/f8fz+niVMTYQ3hLKkgGgN/IPvgIhUYYV/qogdP75J01V+JwrKZr/5YPcnzN3e6t7l+2K+EFX91B1daOi7sASUVORK5CYII=

match the specification's regular expressions for unquoted url
characters [1].

That said ... while standards conformance is nice here, if others feel
that we should quote them nonetheless, I do not see an issue with
quoting them.

 Besides the other theories advanced, might it be sporadic ResourceLoader
 mis-minification?

I had tried loading the resources several times from different
servers, but that didn't bring up anything useful.

 What's the HTTP Referer for these requests, can we tell if it's coming from
 an external <link rel> to CSS or a mw.loader.implement() piece of
 JavaScript inserting the CSS?

The referer is empty for ~97% of affected requests.

The chain of requests looks like this:

1. Client fetches some page (like 
http://pt.wikipedia.org/wiki/Copa_do_Mundo_FIFA )

2. Client fetches resources for this page (like
http://bits.wikimedia.org/pt.wikipedia.org/load.php?[...]
  with referer
http://pt.wikipedia.org/wiki/Copa_do_Mundo_FIFA
  ).

3. Client fetches urls with data uri appended (like 
 
http://pt.wikipedia.org/wiki/data:image/png;base64,iVBORw0KGgoNSUhEUgEAAABkAQBvV2fNDUlEQVQIHWNoYBgWEACJ5TIB0K9KcABJRU5ErkJggg==
   with empty referer).
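The URL in step 3 is exactly what results when a client fails to recognise the data: scheme and resolves the whole string as a relative path against the page URL. This is the relative-URL theory raised elsewhere in this thread. Python's urllib.parse.urljoin can illustrate both behaviours. The './' prefix is only a hypothetical way to model the misbehaving client, not a claim about how any particular browser fails.

Also note that under CSS 2.1, relative URLs in an external stylesheet resolve against the stylesheet's own URL (bits.wikimedia.org here). Requests that land on pt.wikipedia.org/wiki/ would therefore fit resolution against the page URL instead.

```python
from urllib.parse import urljoin

PAGE = "http://pt.wikipedia.org/wiki/Copa_do_Mundo_FIFA"
DATA_URI = ("data:image/png;base64,"
            "iVBORw0KGgoNSUhEUgEAAABkAQBvV2fNDUlEQVQIHWNoYBgWEACJ5TIB0K9KcABJRU5ErkJggg==")

def resolve_conforming(base: str, ref: str) -> str:
    """A conforming resolver sees the data: scheme and returns ref unchanged."""
    return urljoin(base, ref)

def resolve_as_relative_path(base: str, ref: str) -> str:
    """Hypothetical buggy client: ignores the scheme and treats ref as a
    relative path (modelled here by prefixing './')."""
    return urljoin(base, "./" + ref)

print(resolve_conforming(PAGE, DATA_URI) == DATA_URI)  # True
print(resolve_as_relative_path(PAGE, DATA_URI))
# http://pt.wikipedia.org/wiki/data:image/png;base64,iVBORw0KGgoNSUhEUgEAAABkAQBvV2fNDUlEQVQIHWNoYBgWEACJ5TIB0K9KcABJRU5ErkJggg==
```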


Thanks,
Christian



[1] Well ... both of them. In [2] (normative) the
regular expression

  [!#$%&*-\[\]-~]

is used while [3] (also normative :-/ ) uses

  [!#$%&*-~]

. So they differ in how they handle \. But whichever of the above
variants is chosen, the above URL is matched by both of them.


[2] http://www.w3.org/TR/CSS21/syndata.html#tokenization:

[3] http://www.w3.org/TR/CSS21/grammar.html#scanner
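To make footnote [1] concrete, here is a small Python check of a data URI against both CSS 2.1 character classes. It is a sketch only. It ignores the {nonascii} and {escape} alternatives of the real tokenizer and the surrounding url( ... ) syntax, which is acceptable for plain-ASCII base64 payloads. The helper name is_unquoted_url_body is hypothetical.

```python
import re

# The two unquoted-url character classes from CSS 2.1, as Python regex classes.
SYNDATA_URL_CHARS = r"[!#$%&*-\[\]-~]"  # syndata.html#tokenization: skips backslash (0x5C)
GRAMMAR_URL_CHARS = r"[!#$%&*-~]"       # grammar.html#scanner: includes backslash

def is_unquoted_url_body(text: str, char_class: str) -> bool:
    """True if every character of text is in char_class.

    Sketch only: ignores the {nonascii} and {escape} alternatives of the
    real tokenizer, which plain-ASCII base64 data URIs never need.
    """
    return re.fullmatch(f"(?:{char_class})*", text) is not None

uri = ("data:image/png;base64,"
       "iVBORw0KGgoNSUhEUgEAAABkAQBvV2fNDUlEQVQIHWNoYBgWEACJ5TIB0K9KcABJRU5ErkJggg==")
print(is_unquoted_url_body(uri, SYNDATA_URL_CHARS))     # True
print(is_unquoted_url_body(uri, GRAMMAR_URL_CHARS))     # True
print(is_unquoted_url_body("a\\b", SYNDATA_URL_CHARS))  # False: bare backslash
print(is_unquoted_url_body("a\\b", GRAMMAR_URL_CHARS))  # True
```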




Re: [Wikitech-l] Requests with data URIs appended to proper URLs

2014-06-09 Thread Brad Jorsch (Anomie)
On Mon, Jun 9, 2014 at 9:34 AM, Christian Aistleitner 
christ...@quelltextlich.at wrote:

 2. Client fetches resources for this page (like
 http://bits.wikimedia.org/pt.wikipedia.org/load.php?[...]
   with referer
 http://pt.wikipedia.org/wiki/Copa_do_Mundo_FIFA
   ).

 3. Client fetches urls with data uri appended (like

 http://pt.wikipedia.org/wiki/data:image/png;base64,iVBORw0KGgoNSUhEUgEAAABkAQBvV2fNDUlEQVQIHWNoYBgWEACJ5TIB0K9KcABJRU5ErkJggg==
with empty referer).


Just out of curiosity, are there any non-data URLs referred to from the CSS
loaded in step 2, and if so do they also lack referer when fetched in their
equivalent to step 3?
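Brad's question amounts to listing the url() targets of the loaded stylesheet and separating data: URIs from ordinary URLs. Below is a rough Python sketch using a simplified url() pattern. It assumes there are no escaped quotes or ')' inside the URL. The sample input is the .filehistory rule quoted elsewhere in this thread, and split_css_urls is a hypothetical helper.

```python
import re

# Simplified url() matcher for url(x), url('x') and url("x").
# Assumption: no escaped quotes or ')' inside the URL itself.
URL_TOKEN = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""")

def split_css_urls(css: str):
    """Return (data_urls, other_urls) referenced via url() in css."""
    data_urls, other_urls = [], []
    for _quote, target in URL_TOKEN.findall(css):
        if target.startswith("data:"):
            data_urls.append(target)
        else:
            other_urls.append(target)
    return data_urls, other_urls

css = (
    ".filehistory a img,#file img:hover{background:white url(data:image/png;base64,"
    "iVBORw0KGgoNSUhEUgAAABAQCAA6mKC9GElEQVQYV2N4DwX/oYBhgARgDJjEAAkAAEC99wFuu0VFAElFTkSuQmCC)"
    " repeat;background:white url(//bits.wikimedia.org/static-1.24wmf7/skins/common/"
    "images/Checker-16x16.png?2014-05-29T15:05:00Z) repeat!ie}"
)
data_urls, other_urls = split_css_urls(css)
print(len(data_urls))  # 1
print(other_urls)
```

The non-data URLs found this way are the requests one would expect to see from a client that handled the stylesheet normally.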


-- 
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation

Re: [Wikitech-l] Requests with data URIs appended to proper URLs

2014-06-09 Thread Christian Aistleitner
Hi,

On Mon, Jun 09, 2014 at 11:33:12AM -0400, Brad Jorsch (Anomie) wrote:
 On Mon, Jun 9, 2014 at 9:34 AM, Christian Aistleitner 
 christ...@quelltextlich.at wrote:
 
  2. Client fetches resources for this page (like
  http://bits.wikimedia.org/pt.wikipedia.org/load.php?[...]
with referer
  http://pt.wikipedia.org/wiki/Copa_do_Mundo_FIFA
).
 
  3. Client fetches urls with data uri appended (like
 
  http://pt.wikipedia.org/wiki/data:image/png;base64,iVBORw0KGgoNSUhEUgEAAABkAQBvV2fNDUlEQVQIHWNoYBgWEACJ5TIB0K9KcABJRU5ErkJggg==
 with empty referer).
 
 
 Just out of curiosity, are there any non-data URLs referred to from the CSS
 loaded in step 2, [...]

For Firefox 29 and Chrome 35: No, I could not find any :-(
Those client sessions do not show requests for the CSS fallback images
or the like.

Thanks,
Christian


P.S.: For the other, outlier browsers, we do see such requests, and
also referers. But their requests are all over the place, and it seems
to me that the outlier browsers are just generally choking badly on the
served CSS. For example, a session with a User-Agent header from
iBrowser/2.7 requested

   en.m.wikipedia.org/wiki/data:image/png
 (with Referer: http://en.m.wikipedia.org/wiki/Main_Page )

but also

   en.m.wikipedia.org/wiki/bottom,color-stop(0,
   en.m.wikipedia.org/wiki/Top,left
   en.m.wikipedia.org/wiki/linear-gradient(
 (each with Referer: http://en.m.wikipedia.org/wiki/Main_Page )

But those are just a handful of requests. I'd write those off as
outliers.




Re: [Wikitech-l] Requests with data URIs appended to proper URLs

2014-06-09 Thread Christian Aistleitner
Hi,

On Mon, Jun 09, 2014 at 11:26:35PM +0200, Christian Aistleitner wrote:
 On Mon, Jun 09, 2014 at 11:33:12AM -0400, Brad Jorsch (Anomie) wrote:
  On Mon, Jun 9, 2014 at 9:34 AM, Christian Aistleitner 
  christ...@quelltextlich.at wrote:
  
   2. Client fetches resources for this page (like
   http://bits.wikimedia.org/pt.wikipedia.org/load.php?[...]
 with referer
   http://pt.wikipedia.org/wiki/Copa_do_Mundo_FIFA
 ).
  
   3. Client fetches urls with data uri appended (like
  
   http://pt.wikipedia.org/wiki/data:image/png;base64,iVBORw0KGgoNSUhEUgEAAABkAQBvV2fNDUlEQVQIHWNoYBgWEACJ5TIB0K9KcABJRU5ErkJggg==
  with empty referer).
  
  
  Just out of curiosity, are there any non-data URLs referred to from the CSS
  loaded in step 2, [...]
 
 For Firefox 29 and Chrome 35: No, I could not find any :-(
 Those client sessions do not show requests for the CSS fallback images
 or the like.

Mhmmm ... but then again, although most resources linked in the CSS
get served through bits, eswiki's and ptwiki's CSS also has a few
images that get served from the upload cluster. We would not see such
requests for upload cluster images linked in the CSS, as the upload
cluster's cache logs do not yet flow into Hadoop. :-/

Have fun,
Christian



Re: [Wikitech-l] Requests with data URIs appended to proper URLs

2014-06-06 Thread S Page
On Thu, Jun 5, 2014 at 3:44 PM, Matthew Flaschen mflasc...@wikimedia.org
wrote:


 I did a quick check, and Jigsaw (the W3C's validator) does complain about
 our data URLs on that page:

 http://jigsaw.w3.org/css-validator/validator?uri=https%3A%2F%2Fbits.wikimedia.org%2Fes.wikipedia.org%2Fload.php%3Fdebug%3Dfalse%26lang%3Des%26modules%3Dext.gadget.a-commons-directo%252Cimagenesinfobox%252CrefToolbar%7Cext.uls.nojs%7Cext.visualEditor.viewPageTarget.noscript%7Cext.wikihiero%7Cmediawiki.legacy.commonPrint%252Cshared%7Cmediawiki.skinning.interface%7Cmediawiki.ui.button%7Cskins.vector.styles%26only%3Dstyles%26skin%3Dvector%26*&profile=css3&usermedium=all&warning=1&vextwarning=&lang=en

 -
  .filehistory a img, #file img:hover

 Value Error : background url(data:image/png;base64,iVBO[blahblah]) is an
 incorrect URL url(data:image/png;base64,iVBO[blahblah]) repeat


The Jigsaw CSS validator complains about any data URL inside url() unless
it's in quotes. The snippet

  .filehistory a img,#file img:hover{
    background:white url('data:image/png;base64,iVBORw0KGgoNSUhEUgAAABAQCAA6mKC9GElEQVQYV2N4DwX/oYBhgARgDJjEAAkAAEC99wFuu0VFAElFTkSuQmCC') repeat;
    background:white url(//bits.wikimedia.org/static-1.24wmf7/skins/common/images/Checker-16x16.png?2014-05-29T15:05:00Z) repeat!ie
  }

passes with the added quotes.
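Since the quoted form passes Jigsaw, and quoting was considered harmless elsewhere in this thread, a post-processing step could add the quotes. Below is a minimal Python sketch. The name quote_data_urls is hypothetical, and this is not how ResourceLoader's CSS handling is actually implemented. It assumes the data URIs contain no whitespace, quotes or ')'.

```python
import re

# Matches url(...) whose argument is an *unquoted* data: URI.
# Assumption: base64 payloads contain no whitespace, quotes or ')'.
UNQUOTED_DATA_URL = re.compile(r"url\((data:[^)'\"\s]+)\)")

def quote_data_urls(css: str) -> str:
    """Wrap unquoted data: URIs inside url() in single quotes."""
    return UNQUOTED_DATA_URL.sub(r"url('\1')", css)

uri = ("data:image/png;base64,"
       "iVBORw0KGgoNSUhEUgAAABAQCAA6mKC9GElEQVQYV2N4DwX/oYBhgARgDJjEAAkAAEC99wFuu0VFAElFTkSuQmCC")
rule = "background:white url(" + uri + ") repeat;"
fixed = quote_data_urls(rule)
print(fixed == "background:white url('" + uri + "') repeat;")  # True
print(quote_data_urls(fixed) == fixed)  # True: already-quoted URIs are left alone
```

Because already-quoted URIs are skipped, running the step twice is harmless.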

Stack Overflow [1] thinks this is a bug in Jigsaw, but regardless, why would
the CSS generate bogus requests in a cross-section of browsers?
"Some less forgiving browsers" doesn't normally include 60% Firefox 29 and
31% Chrome 35.

If it's only es and pt, I wonder if something else in the
bits.wikimedia response makes the browser interpret the charset of the
data URI as something other than charset=US-ASCII. I don't know of a charset
that would not interpret these ASCII characters as ASCII.
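This point can be checked mechanically. A base64 data URI is pure 7-bit ASCII, so any ASCII-compatible charset maps its bytes back to the same characters. The charset list below is an illustrative assumption, not an exhaustive survey. Encodings that are not ASCII-compatible, such as UTF-16, are deliberately left out. The helper name decodes_identically is hypothetical.

```python
# Illustrative, ASCII-compatible charsets (an assumption, not an exhaustive list).
ASCII_COMPATIBLE = ["us-ascii", "latin-1", "cp1252", "iso-8859-15", "utf-8"]

def decodes_identically(text: str, charsets=ASCII_COMPATIBLE) -> bool:
    """True if text's ASCII bytes decode back to text under every charset."""
    raw = text.encode("ascii")  # raises UnicodeEncodeError if text is not pure ASCII
    return all(raw.decode(cs) == text for cs in charsets)

uri = ("data:image/png;base64,"
       "iVBORw0KGgoNSUhEUgEAAABkAQBvV2fNDUlEQVQIHWNoYBgWEACJ5TIB0K9KcABJRU5ErkJggg==")
print(decodes_identically(uri))  # True
```

UTF-16, by contrast, uses two bytes per character for the same text. That makes it the kind of charset that would not read these bytes as ASCII.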

Besides the other theories advanced, might it be sporadic ResourceLoader
mis-minification?

What's the HTTP Referer for these requests, can we tell if it's coming from
an external <link rel> to CSS or a mw.loader.implement() piece of
JavaScript inserting the CSS?

[1] 
http://stackoverflow.com/questions/15481088/are-unquoted-data-uris-valid-in-css


-- 
=S Page  Features engineer

Re: [Wikitech-l] Requests with data URIs appended to proper URLs

2014-06-06 Thread Federico Leva (Nemo)
It's not specific to es and pt, it's easy to spot such URLs in any 
stats.grok.se top pages report: http://stats.grok.se/de/top


Nemo


Re: [Wikitech-l] Requests with data URIs appended to proper URLs

2014-06-05 Thread Bartosz Dziewoński

On Thu, 05 Jun 2014 15:36:07 +0200, Christian Aistleitner 
christ...@quelltextlich.at wrote:


 The image data in the data uri scheme decodes to images from
 VectorBeta [3] like:
   VectorBeta/resources/typography/images/search-fade.png
   VectorBeta/resources/typography/images/tab-break.png
   VectorBeta/resources/typography/images/tab-current-fade.png
   VectorBeta/resources/typography/images/portal-break.png


These images are also part of the core Vector skin, where
they sit at [mediawiki/core]/skins/vector/images.

--
Matma Rex


Re: [Wikitech-l] Requests with data URIs appended to proper URLs

2014-06-05 Thread Nuria Ruiz
I can see those images in the CSS file that results from this call, as
background images on the default skin of es.wikipedia. They look correct in
the CSS:

http://bits.wikimedia.org/es.wikipedia.org/load.php?debug=false&lang=es&modules=ext.gadget.a-commons-directo%2Cimagenesinfobox%2CrefToolbar%7Cext.uls.nojs%7Cext.visualEditor.viewPageTarget.noscript%7Cext.wikihiero%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.interface%7Cmediawiki.ui.button%7Cskins.vector.styles&only=styles&skin=vector&*


A likely explanation is that there is some syntax issue (one that only
appears on Windows?) in the data image path we specify in the CSS. As a
result, some less forgiving browsers do not interpret the image in the
CSS as a data URL but as a regular one. The URL is then composed as if
it were a relative URL (using the current domain), and a fetch is
attempted.

That is, some browsers on Windows are interpreting data URLs as if they
were like this:
background-image:url('/some/relative/path')

In any case, that fetch should be a 404, so what we should probably
think about fixing going forward is not counting 404 URLs as pageviews.
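The suggestion to stop counting 404s as pageviews could look roughly like the Python sketch below. The Request record and its fields are hypothetical stand-ins for whatever the real pageview pipeline uses. The 404 status on the data URI request is assumed, following the expectation above, not taken from logs.

```python
from dataclasses import dataclass

@dataclass
class Request:
    """Hypothetical, simplified request-log record."""
    url: str
    status: int  # HTTP status code returned to the client

def count_pageviews(requests) -> int:
    """Count requests as pageviews, skipping any that ended in a 404."""
    return sum(1 for r in requests if r.status != 404)

log = [
    Request("http://pt.wikipedia.org/wiki/Copa_do_Mundo_FIFA", 200),
    # Assumed 404 for the bogus data-URI request.
    Request("http://pt.wikipedia.org/wiki/data:image/png;base64,"
            "iVBORw0KGgoNSUhEUgEAAABkAQBvV2fNDUlEQVQIHWNoYBgWEACJ5TIB0K9KcABJRU5ErkJggg==", 404),
]
print(count_pageviews(log))  # 1
```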







Re: [Wikitech-l] Requests with data URIs appended to proper URLs

2014-06-05 Thread Matthew Flaschen

On 06/05/2014 05:23 PM, Nuria Ruiz wrote:

 I can see those images in the CSS  file that results after this call as
 background images on the default skin of es.wikipedia. They look correct in
 the CSS:

 http://bits.wikimedia.org/es.wikipedia.org/load.php?debug=false&lang=es&modules=ext.gadget.a-commons-directo%2Cimagenesinfobox%2CrefToolbar%7Cext.uls.nojs%7Cext.visualEditor.viewPageTarget.noscript%7Cext.wikihiero%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.interface%7Cmediawiki.ui.button%7Cskins.vector.styles&only=styles&skin=vector&*


 The likely explanation could be that there is some syntax issue (that only
 appears on windows?) on the data image path specified by us in the css and
 thus some less forgiving browsers are not interpreting the image in the css
 as a data url but rather as a regular one and then the url is composed as
 if it was a relative url (using the current domain) and fetched (or tried
 to fetch)


I did a quick check, and Jigsaw (the W3C's validator) does complain 
about our data URLs on that page:


http://jigsaw.w3.org/css-validator/validator?uri=https%3A%2F%2Fbits.wikimedia.org%2Fes.wikipedia.org%2Fload.php%3Fdebug%3Dfalse%26lang%3Des%26modules%3Dext.gadget.a-commons-directo%252Cimagenesinfobox%252CrefToolbar%7Cext.uls.nojs%7Cext.visualEditor.viewPageTarget.noscript%7Cext.wikihiero%7Cmediawiki.legacy.commonPrint%252Cshared%7Cmediawiki.skinning.interface%7Cmediawiki.ui.button%7Cskins.vector.styles%26only%3Dstyles%26skin%3Dvector%26*&profile=css3&usermedium=all&warning=1&vextwarning=&lang=en

-
 .filehistory a img, #file img:hover

Value Error : background 
url(data:image/png;base64,iVBORw0KGgoNSUhEUgAAABAQCAA6mKC9GElEQVQYV2N4DwX/oYBhgARgDJjEAAkAAEC99wFuu0VFAElFTkSuQmCC) 
is an incorrect URL 
url(data:image/png;base64,iVBORw0KGgoNSUhEUgAAABAQCAA6mKC9GElEQVQYV2N4DwX/oYBhgARgDJjEAAkAAEC99wFuu0VFAElFTkSuQmCC) 
repeat

-

I haven't done any background research (is this error new, do browsers
care about this, is it just the data: protocol or something else, etc.).

Matt Flaschen
