Great idea!

Does anyone on the list know whether the debug log facilities can emit a
day-only (YYYYMMDD) timestamp instead of the longer one?

If not, I suppose we could work to update the core MediaWiki code. [1]

-Adam

[1] For those with PHP skills or equivalent, I'm referring to
https://git.wikimedia.org/blob/mediawiki%2Fcore.git/a26687e81532def3faba64612ce79b701a13949e/includes%2FGlobalFunctions.php#L1042.
Scroll to the bottom of the function definition to see the datetime stamp
approach.
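
To make the day-resolution idea concrete, here is a minimal sketch in Python
rather than MediaWiki's PHP. It is not how wfDebugLog works: the function
names are hypothetical, and the " / " field layout is an assumption borrowed
from the example lines in Andrew's message quoted below.

```python
from __future__ import annotations

from datetime import date, datetime

# Assumed field separator, taken from the example lines in the thread.
SEPARATOR = " / "


def format_entry(wiki: str, exit_ip: str, mcc_mnc: str, when: datetime) -> str:
    """Build one log line, keeping only the calendar day of the timestamp."""
    return SEPARATOR.join([wiki, exit_ip, mcc_mnc, when.strftime("%Y-%m-%d")])


def purge_older_than(lines: list[str], cutoff: date) -> list[str]:
    """Keep only the lines whose trailing day stamp is on or after cutoff."""
    kept = []
    for line in lines:
        # The day stamp is the last field on the line.
        day_text = line.rsplit(SEPARATOR, 1)[-1]
        day = datetime.strptime(day_text, "%Y-%m-%d").date()
        if day >= cutoff:
            kept.append(line)
    return kept
```

With this layout, "delete anything older than 10 March" would be
purge_older_than(lines, date(2014, 3, 10)), and the time of day is never
written to the log in the first place.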


On Wed, Apr 16, 2014 at 12:47 PM, Andrew Gray <andrew.g...@dunelm.org.uk> wrote:

> Hi Adam,
>
> One thought: you don't really need the date/time data at any detailed
> resolution, do you? If what you're wanting it for is to track major
> changes ("last month it all switched to this IP") and to purge old
> data ("delete anything older than 10 March"), you could simply log day
> rather than datetime.
>
> enwiki / 127.0.0.1 / 123.45 / 2014-04-16:1245.45
>
> enwiki / 127.0.0.1 / 123.45 / 2014-04-16
>
> - the latter gives you the data you need while making it a lot harder
> to do any kind of close user-identification.
>
> Andrew.
> On 16 Apr 2014 19:17, "Adam Baso" <ab...@wikimedia.org> wrote:
>
> > Inline.
> >
> > Thanks for starting this thread.
> > >
> > > > Sorry if I've overlooked this, but who/what will have access to this
> > > > data? Only members of the mobile team? Local project CheckUsers?
> > > > Wikimedia Foundation-approved researchers? Wikimedia shell users?
> > > > AbuseFilter filters?
> > >
> >
> > It's a good question. The thought is to put it in the customary
> > wfDebugLog location (with, for example, filename "mccmnc.log") on
> > fluorine.
> >
> > It just occurred to me that the wiki name (e.g., "enwiki"), but not the
> > full URL, gets logged additionally as part of the wfDebugLog call; to
> > make the implicit explicit, wfDebugLog adds a datetime stamp as well,
> > and that's useful for purging old records. I'll forward this email to
> > mobile-l and wikitech-l to underscore this.
> >
> >
> > > > And this may be a silly question, but is there a reasonable means of
> > > > approximating how identifying these two data points alone are? That
> > > > is, using a mobile country code and exit IP address, is it possible
> > > > to identify a particular editor or reader? Or perhaps rephrased, is
> > > > this data considered anonymized?
> > >
> >
> > Not a silly question. My approximation is that these tuples (datetime,
> > now that it occurs to me, plus XYwiki, exit IP, and MCC-MNC) alone,
> > although not perfectly anonymized, are low-identifying. That is,
> > indirect inferences on the data in isolation are unlikely but
> > technically possible, through examination of short-tail outliers in a
> > cluster analysis where such readers/editors fall in the outlier sets.
> > This is in contrast to regular web access logs, where direct inferences
> > are easy.
> >
> > Thanks. I'll forward this along now.
> >
> > -Adam
> > _______________________________________________
> > Wikimedia-l mailing list
> > Wikimedia-l@lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > <mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>