Here's the patch update. https://gerrit.wikimedia.org/r/141740
On Mon, Jun 23, 2014 at 3:30 PM, Adam Baso <ab...@wikimedia.org> wrote:

> One wrinkle we've encountered and sort of expected, is that the SIM card
> MCC-MNC doesn't always match the actual network MCC-MNC. So on Android,
> we'll add both to the payload so that we can differentiate them. On iOS it
> looks like the API only currently allows one of these values through an
> opaque method call. The previous EventLogging server side code wasn't
> logging the User-Agent (defined coarsely in our code on both platforms).
> I'm thinking to make it evident when we're dealing with an iOS version of
> the app, it would make most sense to re-enable the User-Agent so we can
> pick up this coarse-grained value. I wanted to put this User-Agent item out
> here for a brief period before adding the code, though.
>
> -Adam
>
> On Fri, May 30, 2014 at 2:04 PM, Adam Baso <ab...@wikimedia.org> wrote:
>
>> Okay, the code is in place in the alphas of both the Android and iOS
>> apps, and the server-side 2% sampling (extra header in HTTPS request sent
>> once per cellular app session) is working.
>>
>> https://git.wikimedia.org/commitdiff/apps%2Fandroid%2Fwikipedia.git/8b4a0c3b170d6bf1a8f8141d93dfc60416ae4e2b
>>
>> https://git.wikimedia.org/commitdiff/apps%2Fios%2Fwikipedia.git/59cde497921bc6d2c28e3967c24f0316dfedf3ce
>>
>> https://git.wikimedia.org/commitdiff/mediawiki%2Fextensions%2FZeroRatedMobileAccess.git/df3da0b3fa564ae27d33cd1b82f81df12a5ed287
>>
>> Changes to event logging in the iOS alpha app (internal only at the
>> moment, although repo can be cloned and run in the Xcode simulator) are
>> coming pretty soon, and once those are in, we'll make one last tweak there
>> to have the app not add the extra MCC/MNC header on that single request per
>> cellular connection when logging is turned off in the iOS alpha app. That
>> part is done in the Android app already.
>>
>> -Adam
>>
>> On Fri, May 2, 2014 at 1:16 PM, Adam Baso <ab...@wikimedia.org> wrote:
>>
>>> Federico asked if sampling might make sense here. I think it will work,
>>> so I've updated the patchset.
>>>
>>> From a patchset comment I provided:
>>>
>>> "It's possible we may have situations where operators have not lots of
>>> users on them accessing Wiki(m|p)edia properties, so we do run some risk of
>>> actually missing IPs, even if exit IPs are concentrators of typically large
>>> sets of users. That said, let's try a 2% sample ratio; and if we find out
>>> it's insufficient, then we'll sample more, if it's oversampling, then we
>>> can adjust the other way, too. New patchset arriving shortly."
>>>
>>> (I've since submitted the updated code for review.)
>>>
>>> -Adam
>>>
>>> On Thu, May 1, 2014 at 7:52 PM, Adam Baso <ab...@wikimedia.org> wrote:
>>>
>>>> After examining this, it looks like EventLogging is more suited to the
>>>> logging task than debug logging and the trappings of needing to alter debug
>>>> logging in the core MediaWiki software.
>>>>
>>>> EventLogging logs at the resolution of a second (instead of a day), but
>>>> has inbuilt support for record removal after 90 days.
>>>>
>>>> Please do let us know in case of further questions. Here's the logging
>>>> schema for those with an interest:
>>>>
>>>> https://meta.wikimedia.org/wiki/Schema:MobileOperatorCode
>>>>
>>>> Here's the relevant server code:
>>>>
>>>> https://gerrit.wikimedia.org/r/#/c/130991/
>>>>
>>>> -Adam
>>>>
>>>> On Wed, Apr 16, 2014 at 2:20 PM, Adam Baso <ab...@wikimedia.org> wrote:
>>>>
>>>>> Great idea!
>>>>>
>>>>> Anyone on the list know if there's a way to make the debug log
>>>>> facilities do the YYYYMMDD timestamp instead of the longer one?
>>>>>
>>>>> If not, I suppose we could work to update the core MediaWiki code. [1]
>>>>>
>>>>> -Adam
>>>>>
>>>>> 1. For those with PHP skills or equivalent, I'm referring to
>>>>> https://git.wikimedia.org/blob/mediawiki%2Fcore.git/a26687e81532def3faba64612ce79b701a13949e/includes%2FGlobalFunctions.php#L1042.
>>>>> Scroll to the bottom of the function definition to see the datetimestamp
>>>>> approach.
>>>>>
>>>>> On Wed, Apr 16, 2014 at 12:47 PM, Andrew Gray <andrew.g...@dunelm.org.uk> wrote:
>>>>>
>>>>>> Hi Adam,
>>>>>>
>>>>>> One thought: you don't really need the date/time data at any detailed
>>>>>> resolution, do you? If what you're wanting it for is to track major
>>>>>> changes ("last month it all switched to this IP") and to purge old
>>>>>> data ("delete anything older than 10 March"), you could simply log day
>>>>>> rather than datetime.
>>>>>>
>>>>>> enwiki / 127.0.0.1 / 123.45 / 2014-04-16:1245.45
>>>>>>
>>>>>> enwiki / 127.0.0.1 / 123.45 / 2014-04-16
>>>>>>
>>>>>> - the latter gives you the data you need while making it a lot harder
>>>>>> to do any kind of close user-identification.
>>>>>>
>>>>>> Andrew.
>>>>>> On 16 Apr 2014 19:17, "Adam Baso" <ab...@wikimedia.org> wrote:
>>>>>>
>>>>>> > Inline.
>>>>>> >
>>>>>> > Thanks for starting this thread.
>>>>>> > >
>>>>>> > > Sorry if I've overlooked this, but who/what will have access to this data?
>>>>>> > > Only members of the mobile team? Local project CheckUsers? Wikimedia
>>>>>> > > Foundation-approved researchers? Wikimedia shell users? AbuseFilter
>>>>>> > > filters?
>>>>>> > >
>>>>>> >
>>>>>> > It's a good question. The thought is to put it in the customary wfDebugLog
>>>>>> > location (with, for example, filename "mccmnc.log") on fluorine.
>>>>>> >
>>>>>> > It just occurred to me that the wiki name (e.g., "enwiki"), but not the
>>>>>> > full URL, gets logged additionally as part of the wfDebugLog call; to make
>>>>>> > the implicit explicit, wfDebugLog adds a datetime stamp as well, and that's
>>>>>> > useful for purging old records.
>>>>>> > I'll forward this email to mobile-l and
>>>>>> > wikitech-l to underscore this.
>>>>>> >
>>>>>> > > And this may be a silly question, but is there a reasonable means of
>>>>>> > > approximating how identifying these two data points alone are? That is,
>>>>>> > > Using a mobile country code and exit IP address, is it possible to
>>>>>> > > identify a particular editor or reader? Or perhaps rephrased, is this data
>>>>>> > > considered anonymized?
>>>>>> > >
>>>>>> >
>>>>>> > Not a silly question. My approximation is these tuples (datetime, now that
>>>>>> > it hit me - XYwiki, exit IP, and MCC-MNC) alone, although not perfectly
>>>>>> > anonymized, are low identifying (that is, indirect inferences on the data
>>>>>> > in isolation are unlikely, but technically possible, through examination of
>>>>>> > short tail outliers in a cluster analysis where such readers/editors exist
>>>>>> > in the short tail outliers sets), in contrast to regular web access logs
>>>>>> > (where direct inferences are easy).
>>>>>> >
>>>>>> > Thanks. I'll forward this along now.
>>>>>> >
>>>>>> > -Adam
>>>>>> > _______________________________________________
>>>>>> > Wikimedia-l mailing list
>>>>>> > Wikimedia-l@lists.wikimedia.org
>>>>>> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
>>>>>> > <mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>

_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
<mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>