[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-07-02 Thread leila
leila added a comment. In T190875#4386342, @JBennett wrote: "Has legal reviewed this? I don't see any comments from them in this ticket." Yes, they have; see T190874. I'd like to sort out a process for reviewing items like this. It's sort of in-between security/privacy/data governance. I'll put

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-07-02 Thread JBennett
JBennett added a comment. Has legal reviewed this? I don't see any comments from them in this ticket. I'd like to sort out a process for reviewing items like this. It's sort of in-between security/privacy/data governance. I'll put together a strawman review process to help us avoid delays

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-07-02 Thread mkroetzsch
mkroetzsch added a comment. This is good news -- thanks for the careful review! The lack of specific threat models for this data was also a challenge for us, for similar reasons, but it is also a good sign that many years after the first SPARQL data releases, there is still no realistic danger to

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-06-26 Thread leila
leila added a comment. @Bawolff @EBjune for context re why Security is asked to provide feedback: For data releases, we usually ask for privacy and security feedback if the data may contain private information (either within itself or in combination with other possible datasets that we, WMF, or

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-06-26 Thread EBjune
EBjune added a comment. Thanks, Brian, I appreciate you taking a look. Maybe next time it would be good to know up front that security does not usually review this kind of thing; it may save a lot of people a bunch of time and expectations. I also wouldn't be nudging it if Stas

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-06-26 Thread Bawolff
Bawolff added a comment. Sorry for the delay, this kind of got preempted by T194204 but is now next on my todo list. As an aside - this sort of thing traditionally doesn't require security team sign off (afaik) nor have we reviewed things like this in the past - historically it's been legal and

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-06-13 Thread EBjune
EBjune added a comment. @Bawolff sorry for the nudge, just wondering if you've had a chance to take a final look at this? Appreciate any update, thanks.

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-06-05 Thread Smalyshev
Smalyshev added a comment. @Bawolff any news?

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-05-29 Thread Bawolff
Bawolff added a comment. Sorry, I'm on vacation until Monday. Perhaps someone else on the security team can take a look or, failing that, I'll be back on Monday. (Thank you for your patience, I know this has been delayed multiple times.)

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-05-29 Thread Smalyshev
Smalyshev added a comment. @Bawolff ping?

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-04-26 Thread Smalyshev
Smalyshev added a comment. @Bawolff do we have any other concerns, or is this fine?

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-04-26 Thread Adrian_Bielefeldt
Adrian_Bielefeldt added a comment. The extractAnonymized.py script is indeed not used anywhere, so I've removed it. Quick rundown of the anonymization process and its code locations: Stage 1: Parsing of the query here. This uses a slightly modified version of the OpenRDF-Parser, among others

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-04-24 Thread Smalyshev
Smalyshev added a comment. extractAnonymized.py indeed seems broken, but I don't see anything using it. It seems that anonymization is done by Java class Anonymizer in QueryAnalysis, which is used by Anonymize.py script. Not sure whether it's completely true but I don't see any usage of

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-04-04 Thread Bawolff
Bawolff added a comment. Question: Looking at https://github.com/Wikidata/QueryAnalysis/blob/master/tools/extractAnonymized.py, at first glance, it looks like the string handling code wouldn't handle edge cases properly, e.g. "foo\"bar" "foo'bar"? (I only skimmed the code, may have
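
To make the edge cases in the question above concrete, here is a minimal Python sketch of masking quoted string literals so that an escaped quote or a quote of the other kind does not end the literal early. It is an illustration only and is not the code in extractAnonymized.py or QueryAnalysis: the function name `mask_string_literals`, the pattern, and the `"string1"` placeholder are all hypothetical. It also ignores triple-quoted long literals and quotes inside `#` comments, which a real SPARQL parser would handle.

```python
import re

# Hypothetical patterns, not taken from QueryAnalysis. Each one matches an
# opening quote, then any run of "ordinary character" or "backslash plus one
# character", then the matching closing quote.
DOUBLE_QUOTED = r'"(?:[^"\\]|\\.)*"'
SINGLE_QUOTED = r"'(?:[^'\\]|\\.)*'"
STRING_LITERAL = re.compile(DOUBLE_QUOTED + "|" + SINGLE_QUOTED)


def mask_string_literals(query: str, placeholder: str = '"string1"') -> str:
    """Replace every short quoted literal in `query` with `placeholder`."""
    # A callable replacement keeps backslashes in the placeholder literal.
    return STRING_LITERAL.sub(lambda _match: placeholder, query)


if __name__ == "__main__":
    # Escaped double quote inside a double-quoted literal.
    print(mask_string_literals(r'FILTER(?x = "foo\"bar")'))
    # Single quote inside a double-quoted literal.
    print(mask_string_literals('''FILTER(?x = "foo'bar")'''))
```

A naive approach that scans for the next quote character would cut both of these literals short and could leave part of the original string in the output, which is the risk the question raises.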

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-04-03 Thread mkroetzsch
mkroetzsch added a comment. Hi, The code is here: https://github.com/Wikidata/QueryAnalysis It was not written for general re-use, so it might be a bit messy in places. The code includes the public Wikidata example queries as test data that can be used without accessing any confidential

[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-04-03 Thread Bawolff
Bawolff added a comment. Hi, So first of all, we'd like to see the code that does the query normalization. Second, could this have a summary of the types of queries we expect to be most common in the data set? I appreciate there will be a very long tail here, but having a summary of the most