Here's the patch update.

https://gerrit.wikimedia.org/r/141740


On Mon, Jun 23, 2014 at 3:30 PM, Adam Baso <ab...@wikimedia.org> wrote:

> One wrinkle we've encountered, and sort of expected, is that the SIM card
> MCC-MNC doesn't always match the actual network MCC-MNC. So on Android,
> we'll add both to the payload so that we can differentiate them. On iOS it
> looks like the API only currently allows one of these values through an
> opaque method call. The previous EventLogging server side code wasn't
> logging the User-Agent (defined coarsely in our code on both platforms).
> I'm thinking that, to make it evident when we're dealing with an iOS version
> of the app, it would make the most sense to re-enable User-Agent logging so
> we can pick up this coarse-grained value. I wanted to put this User-Agent
> item out here for a brief period before adding the code, though.
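
To make the "add both to the payload" idea concrete, here is a minimal sketch of a payload that carries the SIM and network values side by side. The field names (simMccMnc, networkMccMnc, mismatch) and the helper are hypothetical, invented for illustration; they are not taken from the schema or from either app.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperatorCode:
    """A mobile country code (MCC) and mobile network code (MNC) pair."""
    mcc: str
    mnc: str

    def __str__(self) -> str:
        return f"{self.mcc}-{self.mnc}"


def build_payload(sim: Optional[OperatorCode],
                  network: Optional[OperatorCode]) -> dict:
    """Build an illustrative payload holding both values.

    All keys are assumptions made for this sketch, not real schema fields.
    Either value may be None when the platform exposes only one of them.
    """
    return {
        "simMccMnc": str(sim) if sim else None,
        "networkMccMnc": str(network) if network else None,
        # True only when both values are known and they disagree.
        "mismatch": bool(sim and network and sim != network),
    }
```

Keeping the two values in separate fields means the mismatch case (for example, a roaming device) stays visible in the data instead of being collapsed into one value.
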
>
> -Adam
>
>
>
>
> On Fri, May 30, 2014 at 2:04 PM, Adam Baso <ab...@wikimedia.org> wrote:
>
>> Okay, the code is in place in the alphas of both the Android and iOS
>> apps, and the server-side 2% sampling (extra header in HTTPS request sent
>> once per cellular app session) is working.
>>
>>
>> https://git.wikimedia.org/commitdiff/apps%2Fandroid%2Fwikipedia.git/8b4a0c3b170d6bf1a8f8141d93dfc60416ae4e2b
>>
>>
>> https://git.wikimedia.org/commitdiff/apps%2Fios%2Fwikipedia.git/59cde497921bc6d2c28e3967c24f0316dfedf3ce
>>
>>
>> https://git.wikimedia.org/commitdiff/mediawiki%2Fextensions%2FZeroRatedMobileAccess.git/df3da0b3fa564ae27d33cd1b82f81df12a5ed287
>>
>> Changes to event logging in the iOS alpha app (internal only at the
>> moment, although repo can be cloned and run in the Xcode simulator) are
>> coming pretty soon, and once those are in, we'll make one last tweak there
>> to have the app not add the extra MCC/MNC header on that single request per
>> cellular connection when logging is turned off in the iOS alpha app. That
>> part is done in the Android app already.
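
For readers who want the client-side rule spelled out, the sketch below shows one way to attach the extra header to a single request per cellular connection and to skip it when logging is turned off. The class, its method names, and the header name "X-MCCMNC" are assumptions made for illustration; this is not the code from either app.

```python
from typing import Dict, Optional


class OperatorHeaderSession:
    """Tracks whether the operator header was already sent on this connection."""

    HEADER_NAME = "X-MCCMNC"  # assumed header name, for illustration only

    def __init__(self, logging_enabled: bool) -> None:
        self.logging_enabled = logging_enabled
        self._sent = False

    def on_new_cellular_connection(self) -> None:
        """Reset the flag so the next request may carry the header again."""
        self._sent = False

    def headers_for_request(self, mcc_mnc: Optional[str],
                            on_cellular: bool) -> Dict[str, str]:
        """Return the extra headers for one outgoing request (possibly none)."""
        if (not self.logging_enabled or not on_cellular
                or mcc_mnc is None or self._sent):
            return {}
        self._sent = True
        return {self.HEADER_NAME: mcc_mnc}
```

A caller would merge the returned dictionary into the request headers, so every other request on the same connection goes out unchanged.
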
>>
>> -Adam
>>
>>
>>
>>
>> On Fri, May 2, 2014 at 1:16 PM, Adam Baso <ab...@wikimedia.org> wrote:
>>
>>> Federico asked if sampling might make sense here. I think it will work,
>>> so I've updated the patchset.
>>>
>>> From a patchset comment I provided:
>>>
>>> "It's possible we may have situations where operators don't have many
>>> users on them accessing Wiki(m|p)edia properties, so we do run some risk of
>>> actually missing IPs, even if exit IPs are concentrators of typically large
>>> sets of users. That said, let's try a 2% sample ratio; if we find out it's
>>> insufficient, then we'll sample more, and if it's oversampling, then we can
>>> adjust the other way, too. New patchset arriving shortly."
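
To put a number on the "risk of actually missing IPs" trade-off, here is a small sketch of ratio-based sampling plus the chance that an operator is never sampled. It assumes each eligible request is sampled independently, which may not match the server code exactly; the function names are hypothetical.

```python
import random

SAMPLE_RATIO = 0.02  # the 2% ratio discussed in the thread


def should_log(rng: random.Random, ratio: float = SAMPLE_RATIO) -> bool:
    """Return True for roughly `ratio` of calls."""
    return rng.random() < ratio


def miss_probability(sessions: int, ratio: float = SAMPLE_RATIO) -> float:
    """Chance that none of `sessions` independent requests gets sampled."""
    return (1.0 - ratio) ** sessions
```

Under that independence assumption, an operator with 100 eligible sessions is missed entirely about 13% of the time at a 2% ratio, which is the kind of case where sampling more would help.
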
>>>
>>> (I've since submitted the updated code for review.)
>>>
>>> -Adam
>>>
>>>
>>>
>>> On Thu, May 1, 2014 at 7:52 PM, Adam Baso <ab...@wikimedia.org> wrote:
>>>
>>>> After examining this, it looks like EventLogging is better suited to the
>>>> logging task than debug logging, which would have required altering debug
>>>> logging in the core MediaWiki software.
>>>>
>>>> EventLogging logs at the resolution of a second (instead of a day), but
>>>> has inbuilt support for record removal after 90 days.
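
As an illustration of what 90-day record removal amounts to, the sketch below drops records older than a cutoff. It is not how EventLogging implements its purge; the record shape (timestamp, payload) and the function name are assumptions for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple

RETENTION = timedelta(days=90)  # retention window mentioned in the thread

Record = Tuple[datetime, str]  # (timestamp, payload); shape is hypothetical


def purge_expired(records: Iterable[Record], now: datetime) -> List[Record]:
    """Keep only the records whose timestamp is within the retention window."""
    cutoff = now - RETENTION
    return [record for record in records if record[0] >= cutoff]
```

Because the cutoff is computed from a timestamp, the purge works the same whether records are stamped to the second or only to the day.
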
>>>>
>>>> Please do let us know in case of further questions. Here's the logging
>>>> schema for those with an interest:
>>>>
>>>> https://meta.wikimedia.org/wiki/Schema:MobileOperatorCode
>>>>
>>>> Here's the relevant server code:
>>>>
>>>> https://gerrit.wikimedia.org/r/#/c/130991/
>>>>
>>>> -Adam
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Apr 16, 2014 at 2:20 PM, Adam Baso <ab...@wikimedia.org> wrote:
>>>>
>>>>> Great idea!
>>>>>
>>>>> Anyone on the list know if there's a way to make the debug log
>>>>> facilities do the YYYYMMDD timestamp instead of the longer one?
>>>>>
>>>>> If not, I suppose we could work to update the core MediaWiki code. [1]
>>>>>
>>>>> -Adam
>>>>>
>>>>> 1. For those with PHP skills or equivalent, I'm referring to
>>>>> https://git.wikimedia.org/blob/mediawiki%2Fcore.git/a26687e81532def3faba64612ce79b701a13949e/includes%2FGlobalFunctions.php#L1042.
>>>>> Scroll to the bottom of the function definition to see the datetimestamp
>>>>> approach.
>>>>>
>>>>>
>>>>> On Wed, Apr 16, 2014 at 12:47 PM, Andrew Gray <
>>>>> andrew.g...@dunelm.org.uk> wrote:
>>>>>
>>>>>> Hi Adam,
>>>>>>
>>>>>> One thought: you don't really need the date/time data at any detailed
>>>>>> resolution, do you? If what you're wanting it for is to track major
>>>>>> changes ("last month it all switched to this IP") and to purge old
>>>>>> data ("delete anything older than 10 March"), you could simply log day
>>>>>> rather than datetime.
>>>>>>
>>>>>> enwiki / 127.0.0.1 / 123.45 / 2014-04-16:1245.45
>>>>>>
>>>>>> enwiki / 127.0.0.1 / 123.45 / 2014-04-16
>>>>>>
>>>>>> - the latter gives you the data you need while making it a lot harder
>>>>>> to do any kind of close user-identification.
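
A quick sketch of the day-only variant, using the field layout from the two example lines above. The function name and the day_only switch are hypothetical, and the full-resolution format here is just an ISO-style stand-in, not the format wfDebugLog emits.

```python
from datetime import datetime


def format_log_line(wiki: str, exit_ip: str, mcc_mnc: str,
                    when: datetime, day_only: bool = True) -> str:
    """Join the fields with ' / ', truncating the timestamp to the day if asked."""
    if day_only:
        stamp = when.strftime("%Y-%m-%d")
    else:
        stamp = when.strftime("%Y-%m-%dT%H:%M:%S")
    return " / ".join([wiki, exit_ip, mcc_mnc, stamp])
```

Truncating at write time, rather than at read time, means the finer-grained value is never stored in the first place.
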
>>>>>>
>>>>>> Andrew.
>>>>>> On 16 Apr 2014 19:17, "Adam Baso" <ab...@wikimedia.org> wrote:
>>>>>>
>>>>>> > Inline.
>>>>>> >
>>>>>> > Thanks for starting this thread.
>>>>>> > >
>>>>>> > > Sorry if I've overlooked this, but who/what will have access to this
>>>>>> > > data? Only members of the mobile team? Local project CheckUsers?
>>>>>> > > Wikimedia Foundation-approved researchers? Wikimedia shell users?
>>>>>> > > AbuseFilter filters?
>>>>>> > >
>>>>>> >
>>>>>> > It's a good question. The thought is to put it in the customary
>>>>>> > wfDebugLog location (with, for example, filename "mccmnc.log") on
>>>>>> > fluorine.
>>>>>> >
>>>>>> > It just occurred to me that the wiki name (e.g., "enwiki"), but not the
>>>>>> > full URL, gets logged additionally as part of the wfDebugLog call; to
>>>>>> > make the implicit explicit, wfDebugLog adds a datetime stamp as well,
>>>>>> > and that's useful for purging old records. I'll forward this email to
>>>>>> > mobile-l and wikitech-l to underscore this.
>>>>>> >
>>>>>> >
>>>>>> > > And this may be a silly question, but is there a reasonable means of
>>>>>> > > approximating how identifying these two data points alone are? That
>>>>>> > > is, using a mobile country code and exit IP address, is it possible to
>>>>>> > > identify a particular editor or reader? Or perhaps rephrased, is this
>>>>>> > > data considered anonymized?
>>>>>> > >
>>>>>> >
>>>>>> > Not a silly question. My approximation is these tuples (datetime, now
>>>>>> > that it hit me - XYwiki, exit IP, and MCC-MNC) alone, although not
>>>>>> > perfectly anonymized, are low identifying (that is, indirect inferences
>>>>>> > on the data in isolation are unlikely, but technically possible, through
>>>>>> > examination of short tail outliers in a cluster analysis where such
>>>>>> > readers/editors exist in the short tail outliers sets), in contrast to
>>>>>> > regular web access logs (where direct inferences are easy).
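
One rough way to look for the outlier sets described above is to count how often each tuple occurs and flag the rare ones. This is a simple frequency count in the spirit of k-anonymity, swapped in for the cluster analysis mentioned; the function name, the tuple layout, and the threshold k are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

LogTuple = Tuple[str, str, str]  # (wiki, exit IP, MCC-MNC); layout is assumed


def rare_tuples(records: Iterable[LogTuple], k: int) -> Set[LogTuple]:
    """Return the tuples that occur fewer than k times in the records."""
    counts = Counter(records)
    return {item for item, seen in counts.items() if seen < k}
```

Tuples that show up only a handful of times are the ones where an indirect inference about an individual reader or editor is most plausible, so they are the ones worth reviewing or suppressing.
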
>>>>>> >
>>>>>> > Thanks. I'll forward this along now.
>>>>>> >
>>>>>> > -Adam
>>>>>> > _______________________________________________
>>>>>> > Wikimedia-l mailing list
>>>>>> > Wikimedia-l@lists.wikimedia.org
>>>>>> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
>>>>>> > <mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
_______________________________________________
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 
<mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>
