1%...of the browsers that made it through the minimum request count
filter ;). But crawler-traffic overall is actually ~50% of US desktop
traffic, for scale. We get a lot of hits (not so much from Google, who
crawl in a smart way, as from Bing, who crawl in a very dumb way).
On 9 March 2015 at 22:54, Ti
Wow, does Googlebot really represent over 1% of our desktop/reader traffic?
Rather interesting compared to that of e.g. WinXP/IE6, which is over 60x
smaller at 0.016%.
But never mind IE6's percentage, that of Google would seem quite high.
— Timo
On 6 Mar 2015, at 01:02, Oliver Keyes wrote:
>
What do you mean by help? Provide assistance in building the
replacement systems we're building?
On 9 March 2015 at 18:49, Roni Wiener wrote:
>
> Thanks for the info, both your points can explain the anomalies I saw.
>
> The mirroring issue can explain the reason why I see many *.mp3 and .*_ep
Team:
Eventlogging backfilling for outage 02/04 to 02/10 is done. Some events
were filled from raw logs, some from processed logs. Because most of the
"droppage" happened intermittently, the backfilling simply re-ran the events
from 02/04 to 02/10 one by one.
Here are the descriptions of the two in
Hi Andrew,
On Mon, Mar 09, 2015 at 11:54:56AM -0400, Andrew Otto wrote:
> > https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load
> Christian, may I move this page into the Cluster/Hadoop/Administration page?
I think a separate page is worth it as the target audience is
different from
Thanks for the info, both your points can explain the anomalies I saw.
The mirroring issue can explain the reason why I see many *.mp3 and .*_ep
titles in the pagecounts files that do not correlate to any Wikipedia page,
probably spammers monetizing music.
How can I help resolving these issue
thanks, Oliver (and James for spotting this).
> On Mar 9, 2015, at 2:30 PM, Oliver Keyes wrote:
>
> Now logged in Phabricator at https://phabricator.wikimedia.org/T92020
>
> On 9 March 2015 at 16:24, Oliver Keyes wrote:
>> Bah; folder names, rather than subdomains.
>>
>> On 9 March 2015 at 16
Now logged in Phabricator at https://phabricator.wikimedia.org/T92020
On 9 March 2015 at 16:24, Oliver Keyes wrote:
> Bah; folder names, rather than subdomains.
>
> On 9 March 2015 at 16:24, Oliver Keyes wrote:
>> Hey all,
>>
>> One of the big improvements of the new definition over the old one
Okay, we'll plan on wprov.
On Wed, Mar 4, 2015 at 12:44 PM, Dan Garry wrote:
> Works for me.
>
> Dan
>
> On 4 March 2015 at 12:33, Adam Baso wrote:
>
>> How about 'wprov'?
>>
>> On Wed, Mar 4, 2015 at 12:29 PM, Dan Garry wrote:
>>
>>> I'd really rather this be either something that's totally n
Bah; folder names, rather than subdomains.
On 9 March 2015 at 16:24, Oliver Keyes wrote:
> Hey all,
>
> One of the big improvements of the new definition over the old one is
> that the new one is not limited to /wiki/. It includes all of the
> Chinese and Serbian dialects that have their own fold
Hey all,
One of the big improvements of the new definition over the old one is
that the new one is not limited to /wiki/. It includes all of the
Chinese and Serbian dialects that have their own folder names and were
not appearing, as a result, in the old pageview counts.
James F (thanks James!) r
Well, the raw Double-entry_bookkeeping_system only has 14k views in
that hour, so I have to assume that (55k-14k) views are coming from
some oddly localised URI. Not sanitising input is...one of the many
things we should fix.
But, I would warn you that this is likely automata. Some things I have
s
> Aside from this, I get daily emails about webrequest partition statuses,
> and I would at least notice the morning after that something is wrong.
Right, but in the case of Friday that would mean perhaps having to backfill
a bunch of data up to Saturday morning, whereas if we have alarms we can
detec
It's more likely that it's just an attack by automata, rather than a
sharp peak of genuine interest. Since 20150306 is within the last 30
days I can look and check, and will do so now.
On 8 March 2015 at 15:18, Roni Wiener wrote:
> Hi
>
> I was goofing around with the Wikipedia page counts dumps
> Should we have icinga alarms around these types of issues? Seems like that
> would be the way to go.
Aside from this, I get daily emails about webrequest partition statuses, and I
would at least notice the morning after that something is wrong.
> On Mar 7, 2015, at 21:20, Nuria Ruiz wrote:
> https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load
Christian, may I move this page into the Cluster/Hadoop/Administration page?
> Should we have icinga alarms around these types of issues? Seems like that
> would be the way to go.
We used to have icinga alarms based on webrequest
Hi
I was goofing around with the Wikipedia page counts dumps and noticed some
strange anomalies.
For example:
The page "Double-entry_bookkeeping_system" had 55921 page views in
pagecounts-20150306-07.gz,
whereas it had only 54 views in pagecounts-20150306-10.gz (3 hours later).
Is there a b
Hi Pine,
On Sat, Mar 07, 2015 at 08:15:18PM -0800, Pine W wrote:
> Chris, may I quote your email on BASH?
They take emails too?
Regardless ... feel free to quote or forward any of my emails wherever
you see fit.
Have fun,
Christian
--
quelltextlich e.U. \\ Christian Aistleit
Thanks a lot Christian :)
I did not mean by any means to overload the cluster last Friday ... I did
it nonetheless.
Your page on how to 'keep an eye on it' will really be useful!
Cheers
Joseph
On Sun, Mar 8, 2015 at 8:26 PM, Leila Zia wrote:
> This is really useful, Christian. Thanks for explai
On 08-03-2015, Sun, at 09:54 -0700, Vipul Naik wrote:
> Seems like stats.grok.se hasn't updated for the last two days again.
> Will it be back to updating soon?
>
Henrik, if bandwidth is below what you were seeing (i.e. overloaded
again), for now you could point to ms1001.wikimed