Hi
Thank you Nemo for advertising that interesting page about how to improve
Wikimedia dumping processes. This topic is of course a primary concern
for the Kiwix developer team.
Here is my contribution:
I also added some Hadoop-based use cases to that document.
https://www.mediawiki.org/w/index.php?title=Wikimedia_MediaWiki_Core_Team%2FBacklog%2FImprove_dumps&diff=1422073&oldid=1421455
On Feb 21, 2015, at 05:03, Emmanuel Engelhart kel...@kiwix.org wrote:
Hi
Thank you Nemo for advertising
I think this is a great idea!
At 2015-02-23 08:33 AM, you wrote:
Dear List,
We, the WMF Analytics team, want to bring more
of our internal discussions public. We benefit
tremendously from everyone who participates on
this list and want to have as much transparency
as possible into what
After the discovery of the duplication problem I reran the comparative
analysis of pageviews implementations. The result: there really isn't
a difference![0] Actually I had to generate a plot with jittered lines
just to be able to /find/ one of them.[1]
Thanks to Christian and Otto for the
Hi Nemo - I think the concern was that it might be the case that the
'title' parameter may be at the end of the URL, and the 'title' parameter
could in principle support a value with forward slashes potentially
indistinguishable from the string in option #2. Of course, regular
expressions can make
On Tue, Feb 24, 2015 at 3:48 PM, Nuria Ruiz nu...@wikimedia.org wrote:
2. What about caching?
Is this page: http://wikipedia.org/BarackObama?some_param=some-value
being served from the cache as it should be?
The file download
Michael, a quick heads-up:
So I finally found the time to look into this.
Sorry that it took so long.
https://phabricator.wikimedia.org/T90230
Bug has been analyzed and fixed.
The underlying problem is a record in an hourly pageview dump with empty
title. My script now patches such
If there’s no other objection, we can safely fold this under the
discussion of long-term options and go ahead with the proposed
implementation, per Dan.
I think there are some technical issues to be ironed out, right?
1. How are we doing so a request like:
It sounds like we have consensus for a short-term solution based on a vanilla
parameter, as long as it doesn’t clash with other internal parameters. I agree
with Gergo that a shortener is appealing as a long-term solution; this is what
the vast majority of platforms are using for analytics