Re: [Wikitech-l] Requests with data URIs appended to proper URLs
Hi,

On Fri, Jun 06, 2014 at 02:41:07PM +0200, Federico Leva (Nemo) wrote:
> It's not specific to es and pt, it's easy to spot such URLs in any
> stats.grok.se top pages report: http://stats.grok.se/de/top

That URL is (currently) for March data, and characteristics have changed considerably since then.

In all of June, we have only 2 affected requests for dewiki in the sampled-1000 stream. We always have such background noise. But eswiki has 2648 affected requests, and ptwiki has 1378.

Eswiki and ptwiki together amount to 93% of the affected data URI requests in June, but only to ~2% of all requests.

So for current data, it is specific to eswiki and ptwiki.

Have fun,
Christian

-- 
quelltextlich e.U. \\ Christian Aistleitner
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3    Email: christ...@quelltextlich.at
4293 Gutau, Austria         Phone: +43 7946 / 20 5 81
                            Fax: +43 7946 / 20 5 81
                            Homepage: http://quelltextlich.at/

--- signature.asc Description: Digital signature
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
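The per-wiki comparison above could be reproduced roughly as follows. This is only a sketch: the `(host, path)` record shape, the `DATA_URI_PATH` pattern, and the sample rows are assumptions made for illustration, not the real log format or the query that was actually run.

```python
import re
from collections import Counter

# Hypothetical pattern: a data URI that ended up as the "title" of a /wiki/ request.
DATA_URI_PATH = re.compile(r"^/wiki/data:[\w.+-]+/[\w.+-]+[;,]")


def count_affected(records):
    """Count, per host, the requests whose path looks like /wiki/data:<mime>;...

    `records` is an iterable of (host, path) tuples (an assumed shape).
    """
    counts = Counter()
    for host, path in records:
        if DATA_URI_PATH.match(path):
            counts[host] += 1
    return counts


def share(counts, hosts):
    """Fraction of all affected requests that belong to the given hosts."""
    total = sum(counts.values())
    return sum(counts[h] for h in hosts) / total if total else 0.0


# Made-up sample rows, only to show the calls.
sample = [
    ("es.wikipedia.org", "/wiki/data:image/png;base64,iVBORw0KGgo="),
    ("pt.wikipedia.org", "/wiki/data:image/png;base64,iVBORw0KGgo="),
    ("pt.wikipedia.org", "/wiki/Copa_do_Mundo_FIFA"),
    ("de.wikipedia.org", "/wiki/data:image/png;base64,iVBORw0KGgo="),
]
counts = count_affected(sample)
print(counts)
print(share(counts, ["es.wikipedia.org", "pt.wikipedia.org"]))
```

Comparing this share against the same hosts' share of all requests is what separates a per-wiki problem from background noise.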
Re: [Wikitech-l] Requests with data URIs appended to proper URLs
Hi,

On Thu, Jun 05, 2014 at 11:39:28PM -0700, S Page wrote:
> [ Jigsaw CSS validator over-complaining ]

Full ACK. In particular, the relevant data URIs like

  data:image/png;base64,iVBORw0KGgoNSUhEUgEuCAIAAABmjeQ9RElEQVR42mVO2wrAUAhy/f8fz+niVMTYQ3hLKkgGgN/IPvgIhUYYV/qogdP75J01V+JwrKZr/5YPcnzN3e6t7l+2K+EFX91B1daOi7sASUVORK5CYII=

match the specification's regular expressions for unquoted URL characters [1].

That said ... while standards conformance is nice here, if others feel that we should quote them nonetheless, I do not see an issue with quoting them.

> Besides the other theories advanced, might it be sporadic ResourceLoader
> mis-minification?

I had tried loading the resources several times from different servers, but that didn't bring up anything useful.

> What's the HTTP Referer for these requests, can we tell if it's coming from
> an external link rel to CSS or a mw.loader.implement() piece of JavaScript
> inserting the CSS?

The referer is empty for ~97% of affected requests. The chain of requests looks like this:

1. Client fetches some page (like
   http://pt.wikipedia.org/wiki/Copa_do_Mundo_FIFA ).
2. Client fetches resources for this page (like
   http://bits.wikimedia.org/pt.wikipedia.org/load.php?[...]
   with referer http://pt.wikipedia.org/wiki/Copa_do_Mundo_FIFA ).
3. Client fetches URLs with a data URI appended (like
   http://pt.wikipedia.org/wiki/data:image/png;base64,iVBORw0KGgoNSUhEUgEAAABkAQBvV2fNDUlEQVQIHWNoYBgWEACJ5TIB0K9KcABJRU5ErkJggg==
   with empty referer).

Thanks,
Christian

[1] Well ... both of them. In [2] (normative) the regular expression
    [!#$%&*-\[\]-~] is used, while [3] (also normative :-/ ) uses
    [!#$%&*-~] . So they differ in \ handling. But regardless of which
    variant is chosen, the above URL is matched by both.
[2] http://www.w3.org/TR/CSS21/syndata.html#tokenization
[3] http://www.w3.org/TR/CSS21/grammar.html#scanner
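The character-class check from footnote [1] can be tried out directly. The sketch below uses the two CSS 2.1 character classes for unquoted url() contents, [!#$%&*-\[\]-~] (tokenization section) and [!#$%&*-~] (grammar scanner), and deliberately leaves out the {nonascii} and {escape} alternatives of the full productions. The helper name `is_unquoted_url_body` is made up for illustration.

```python
import re

# Character classes as given in CSS 2.1; {nonascii} and {escape} are omitted here.
TOKENIZATION_CLASS = r"[!#$%&*-\[\]-~]"  # syndata.html#tokenization
SCANNER_CLASS = r"[!#$%&*-~]"            # grammar.html#scanner


def is_unquoted_url_body(text, char_class):
    """True if every character of `text` falls inside the given class."""
    return re.fullmatch(f"(?:{char_class})*", text) is not None


# Data URI exactly as it appears in the logged request from step 3.
data_uri = (
    "data:image/png;base64,iVBORw0KGgoNSUhEUgEAAABkAQBvV2fNDUlEQVQIHWNoYBgW"
    "EACJ5TIB0K9KcABJRU5ErkJggg=="
)

for name, cls in (("tokenization", TOKENIZATION_CLASS), ("scanner", SCANNER_CLASS)):
    print(name, is_unquoted_url_body(data_uri, cls))

# The only difference between the two classes is the backslash.
print(is_unquoted_url_body("a\\b", TOKENIZATION_CLASS))
print(is_unquoted_url_body("a\\b", SCANNER_CLASS))
```

Both classes exclude whitespace, quotes and parentheses, which is why a base64 payload (letters, digits, `+`, `/`, `=`) passes either one.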
Re: [Wikitech-l] Requests with data URIs appended to proper URLs
On Mon, Jun 9, 2014 at 9:34 AM, Christian Aistleitner christ...@quelltextlich.at wrote:
> 2. Client fetches resources for this page (like
>    http://bits.wikimedia.org/pt.wikipedia.org/load.php?[...]
>    with referer http://pt.wikipedia.org/wiki/Copa_do_Mundo_FIFA ).
> 3. Client fetches urls with data uri appended (like
>    http://pt.wikipedia.org/wiki/data:image/png;base64,iVBORw0KGgoNSUhEUgEAAABkAQBvV2fNDUlEQVQIHWNoYBgWEACJ5TIB0K9KcABJRU5ErkJggg==
>    with empty referer).

Just out of curiosity, are there any non-data URLs referred to from the CSS loaded in step 2, and if so, do they also lack a referer when fetched in their equivalent to step 3?

-- 
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation
Re: [Wikitech-l] Requests with data URIs appended to proper URLs
Hi,

On Mon, Jun 09, 2014 at 11:33:12AM -0400, Brad Jorsch (Anomie) wrote:
> On Mon, Jun 9, 2014 at 9:34 AM, Christian Aistleitner christ...@quelltextlich.at wrote:
> > 2. Client fetches resources for this page (like
> >    http://bits.wikimedia.org/pt.wikipedia.org/load.php?[...]
> >    with referer http://pt.wikipedia.org/wiki/Copa_do_Mundo_FIFA ).
> > 3. Client fetches urls with data uri appended (like
> >    http://pt.wikipedia.org/wiki/data:image/png;base64,iVBORw0KGgoNSUhEUgEAAABkAQBvV2fNDUlEQVQIHWNoYBgWEACJ5TIB0K9KcABJRU5ErkJggg==
> >    with empty referer).
>
> Just out of curiosity, are there any non-data URLs referred to from the
> CSS loaded in step 2, [...]

For Firefox 29 and Chrome 35: No, I could not find any :-(
Those client sessions do not show requests for the CSS fallback images or the like.

Thanks,
Christian

P.S.: For the other outlier browsers, we do see such requests, and also referers. But their requests are all over the place, and it seems to me that the outlier browsers are just generally choking badly on the served CSS. For example, a session with a User-Agent header from iBrowser/2.7 requests

  en.m.wikipedia.org/wiki/data:image/png

(with Referer: http://en.m.wikipedia.org/wiki/Main_Page ), but also

  en.m.wikipedia.org/wiki/bottom,color-stop(0,
  en.m.wikipedia.org/wiki/Top,left
  en.m.wikipedia.org/wiki/linear-gradient(

(each with Referer: http://en.m.wikipedia.org/wiki/Main_Page ).

But those are just a handful of requests. I'd write those off as outliers.
Re: [Wikitech-l] Requests with data URIs appended to proper URLs
Hi,

On Mon, Jun 09, 2014 at 11:26:35PM +0200, Christian Aistleitner wrote:
> On Mon, Jun 09, 2014 at 11:33:12AM -0400, Brad Jorsch (Anomie) wrote:
> > On Mon, Jun 9, 2014 at 9:34 AM, Christian Aistleitner christ...@quelltextlich.at wrote:
> > > 2. Client fetches resources for this page (like
> > >    http://bits.wikimedia.org/pt.wikipedia.org/load.php?[...]
> > >    with referer http://pt.wikipedia.org/wiki/Copa_do_Mundo_FIFA ).
> > > 3. Client fetches urls with data uri appended (like
> > >    http://pt.wikipedia.org/wiki/data:image/png;base64,iVBORw0KGgoNSUhEUgEAAABkAQBvV2fNDUlEQVQIHWNoYBgWEACJ5TIB0K9KcABJRU5ErkJggg==
> > >    with empty referer).
> >
> > Just out of curiosity, are there any non-data URLs referred to from the
> > CSS loaded in step 2, [...]
>
> For Firefox 29 and Chrome 35: No, I could not find any :-(
> Those client sessions do not show requests for the CSS fallback images
> or the like.

Mhmmm ... but then again, although most resources linked in the CSS get served through bits, eswiki's and ptwiki's CSS also has a few images that get served from the upload cluster. We would not see requests for upload cluster images linked in the CSS, as the upload cluster's cache logs do not yet flow into Hadoop. :-/

Have fun,
Christian
Re: [Wikitech-l] Requests with data URIs appended to proper URLs
On Thu, Jun 5, 2014 at 3:44 PM, Matthew Flaschen mflasc...@wikimedia.org wrote:
> I did a quick check, and Jigsaw (the W3C's validator) does complain about
> our data URLs on that page:
>
> http://jigsaw.w3.org/css-validator/validator?uri=https%3A%2F%2Fbits.wikimedia.org%2Fes.wikipedia.org%2Fload.php%3Fdebug%3Dfalse%26lang%3Des%26modules%3Dext.gadget.a-commons-directo%252Cimagenesinfobox%252CrefToolbar%7Cext.uls.nojs%7Cext.visualEditor.viewPageTarget.noscript%7Cext.wikihiero%7Cmediawiki.legacy.commonPrint%252Cshared%7Cmediawiki.skinning.interface%7Cmediawiki.ui.button%7Cskins.vector.styles%26only%3Dstyles%26skin%3Dvector%26*&profile=css3&usermedium=all&warning=1&vextwarning=&lang=en
>
> - .filehistory a img, #file img:hover
>   Value Error : background url(data:image/png;base64,iVBO[blahblah]) is
>   an incorrect URL url(data:image/png;base64,iVBO[blahblah]) repeat

The Jigsaw CSS validator complains about any data URL inside url() unless it's in quotes. The snippet

  .filehistory a img,#file img:hover{
    background:white url('data:image/png;base64,iVBORw0KGgoNSUhEUgAAABAQCAA6mKC9GElEQVQYV2N4DwX/oYBhgARgDJjEAAkAAEC99wFuu0VFAElFTkSuQmCC') repeat;
    background:white url(//bits.wikimedia.org/static-1.24wmf7/skins/common/images/Checker-16x16.png?2014-05-29T15:05:00Z) repeat!ie
  }

passes with the added quotes.

Stackoverflow [1] thinks this is a bug in Jigsaw, but regardless, why would the CSS generate bogus requests in a cross-section of browsers? "Some less forgiving browsers" doesn't normally include 60% Firefox 29 and 31% Chrome 35.

If it's only es and pt, I wonder if it's something else in the bits.wikimedia response that makes the browser try to interpret the charset in the data URI as other than charset=US-ASCII. I don't know of a charset that would not interpret these ASCII characters as ASCII.

Besides the other theories advanced, might it be sporadic ResourceLoader mis-minification?

What's the HTTP Referer for these requests, can we tell if it's coming from an external link rel to CSS or a mw.loader.implement() piece of JavaScript inserting the CSS?

[1] http://stackoverflow.com/questions/15481088/are-unquoted-data-uris-valid-in-css

-- 
=S Page
Features engineer
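If quoting the data URLs were wanted, a post-processing pass over the generated CSS might look like the sketch below. It is a deliberately naive regex rewrite, not how ResourceLoader or its minifier works; `quote_data_urls` and the pattern are hypothetical, and a real implementation would need a proper CSS tokenizer.

```python
import re

# Naive pattern: "url(" directly followed by an unquoted data: URI, up to the next ")".
UNQUOTED_DATA_URL = re.compile(r"url\(\s*(data:[^)'\"\s]+)\s*\)")


def quote_data_urls(css):
    """Wrap unquoted data: URIs inside url() in single quotes."""
    return UNQUOTED_DATA_URL.sub(r"url('\1')", css)


before = "a{background:white url(data:image/png;base64,AAA=) repeat}"
print(quote_data_urls(before))
```

Already-quoted data URLs and ordinary url(//host/path) references do not match the pattern and are left untouched.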
Re: [Wikitech-l] Requests with data URIs appended to proper URLs
It's not specific to es and pt, it's easy to spot such URLs in any stats.grok.se top pages report: http://stats.grok.se/de/top

Nemo
Re: [Wikitech-l] Requests with data URIs appended to proper URLs
On Thu, 05 Jun 2014 15:36:07 +0200, Christian Aistleitner christ...@quelltextlich.at wrote:
> The image data in the data uri scheme decodes to images from
> VectorBeta [3] like:
>
>   VectorBeta/resources/typography/images/search-fade.png
>   VectorBeta/resources/typography/images/tab-break.png
>   VectorBeta/resources/typography/images/tab-current-fade.png
>   VectorBeta/resources/typography/images/portal-break.png

These images are also part of the core Vector skin, where they sit at [mediawiki/core]/skins/vector/images.

-- 
Matma Rex
Re: [Wikitech-l] Requests with data URIs appended to proper URLs
I can see those images in the CSS file that results from this call, as background images on the default skin of es.wikipedia. They look correct in the CSS:

http://bits.wikimedia.org/es.wikipedia.org/load.php?debug=false&lang=es&modules=ext.gadget.a-commons-directo%2Cimagenesinfobox%2CrefToolbar%7Cext.uls.nojs%7Cext.visualEditor.viewPageTarget.noscript%7Cext.wikihiero%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.interface%7Cmediawiki.ui.button%7Cskins.vector.styles&only=styles&skin=vector&*

The likely explanation is that there is some syntax issue (that only appears on Windows?) in the data image path we specify in the CSS. Some less forgiving browsers then do not interpret the image in the CSS as a data URL but as a regular one, so the URL is composed as if it were a relative URL (using the current domain) and fetched (or at least the browser tries to fetch it).

That is, some browsers on Windows are interpreting data URLs as if they were like this:

  background-image:url('/some/relative/path')

In any case that fetch should be a 404, so the thing we should probably think of fixing going forward is not counting 404 URLs in pageviews.

On Thu, Jun 5, 2014 at 6:12 PM, Bartosz Dziewoński matma@gmail.com wrote:
> On Thu, 05 Jun 2014 15:36:07 +0200, Christian Aistleitner christ...@quelltextlich.at wrote:
> > The image data in the data uri scheme decodes to images from
> > VectorBeta [3] like:
> >
> >   VectorBeta/resources/typography/images/search-fade.png
> >   VectorBeta/resources/typography/images/tab-break.png
> >   VectorBeta/resources/typography/images/tab-current-fade.png
> >   VectorBeta/resources/typography/images/portal-break.png
>
> These images are also part of the core Vector skin, where they sit at
> [mediawiki/core]/skins/vector/images.
>
> --
> Matma Rex
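The "treated as a relative URL" theory is consistent with the logged request paths, as the following sketch shows. It uses Python's `urllib.parse.urljoin` as a stand-in for a browser's URL resolver, which is an assumption; the short base64 payload is a placeholder. The misreading is simulated with a leading "./", which per RFC 3986 section 4.2 forces a first segment containing a colon to be read as a relative path rather than as a scheme.

```python
from urllib.parse import urljoin

page = "http://pt.wikipedia.org/wiki/Copa_do_Mundo_FIFA"
data_uri = "data:image/png;base64,iVBORw0KGgo="  # placeholder payload

# Correct handling: "data:" is a scheme, so the reference is already absolute
# and must not be resolved against the page URL.
correct = urljoin(page, data_uri)

# Simulated misreading: the same text treated as a relative path.
misread = urljoin(page, "./" + data_uri)

print(correct)
print(misread)
```

The misread result has the same shape as the URLs seen in step 3 of the request chain: the page's directory (/wiki/) followed by the data URI.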
Re: [Wikitech-l] Requests with data URIs appended to proper URLs
On 06/05/2014 05:23 PM, Nuria Ruiz wrote:
> I can see those images in the CSS file that results after this call as
> background images on the default skin of es.wikipedia. They look correct
> in the CSS:
>
> http://bits.wikimedia.org/es.wikipedia.org/load.php?debug=false&lang=es&modules=ext.gadget.a-commons-directo%2Cimagenesinfobox%2CrefToolbar%7Cext.uls.nojs%7Cext.visualEditor.viewPageTarget.noscript%7Cext.wikihiero%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.skinning.interface%7Cmediawiki.ui.button%7Cskins.vector.styles&only=styles&skin=vector&*
>
> The likely explanation could be that there is some syntax issue (that
> only appears on windows?) on the data image path specified by us in the
> css and thus some less forgiving browsers are not interpreting the image
> in the css as a data url but rather as a regular one and then the url is
> composed as if it was a relative url (using the current domain) and
> fetched (or tried to fetch)

I did a quick check, and Jigsaw (the W3C's validator) does complain about our data URLs on that page:

http://jigsaw.w3.org/css-validator/validator?uri=https%3A%2F%2Fbits.wikimedia.org%2Fes.wikipedia.org%2Fload.php%3Fdebug%3Dfalse%26lang%3Des%26modules%3Dext.gadget.a-commons-directo%252Cimagenesinfobox%252CrefToolbar%7Cext.uls.nojs%7Cext.visualEditor.viewPageTarget.noscript%7Cext.wikihiero%7Cmediawiki.legacy.commonPrint%252Cshared%7Cmediawiki.skinning.interface%7Cmediawiki.ui.button%7Cskins.vector.styles%26only%3Dstyles%26skin%3Dvector%26*&profile=css3&usermedium=all&warning=1&vextwarning=&lang=en

- .filehistory a img, #file img:hover
  Value Error : background url(data:image/png;base64,iVBORw0KGgoNSUhEUgAAABAQCAA6mKC9GElEQVQYV2N4DwX/oYBhgARgDJjEAAkAAEC99wFuu0VFAElFTkSuQmCC) is an incorrect URL url(data:image/png;base64,iVBORw0KGgoNSUhEUgAAABAQCAA6mKC9GElEQVQYV2N4DwX/oYBhgARgDJjEAAkAAEC99wFuu0VFAElFTkSuQmCC) repeat

I haven't done any background research (is this error new, do browsers care about this, is it just the data: protocol or something else, etc.).

Matt Flaschen