[Wikidata-bugs] [Maniphest] [Commented On] T190875: Security review for Wikidata queries data release proposal

2018-07-02 Thread leila
leila added a comment.

In T190875#4386342, @JBennett wrote:
> Has legal reviewed this? I don't see any comments from them in this ticket.

Yes, they have: see T190874.

> I'd like to sort out a process for reviewing items like this. It's sort of in-between security/privacy/data governance. I'll put together a strawman review process to help us avoid delays, and I'll follow up with Stas.

Please check the approach developed at https://meta.wikimedia.org/wiki/Research:Improving_link_coverage/Release_page_traces in case you want to re-use parts of it. Happy to provide input to what you will put together. :)

TASK DETAIL
https://phabricator.wikimedia.org/T190875

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Bawolff, leila
Cc: JBennett, Adrian_Bielefeldt, Bawolff, APalmer_WMF, Aklapper, DarTar, mkroetzsch, EBjune, Smalyshev, leila, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, Avner, dpatrick, ZhouZ, Luke081515, Wikidata-bugs, aude, Capt_Swing, JanZerebecki, Slaporte, csteipp, Mbch331, Jay8g, Krenair, Legoktm

Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


2018-07-02 Thread JBennett
JBennett added a comment.
Has legal reviewed this? I don't see any comments from them in this ticket. I'd like to sort out a process for reviewing items like this. It's sort of in-between security/privacy/data governance. I'll put together a strawman review process to help us avoid delays, and I'll follow up with Stas.


2018-07-02 Thread mkroetzsch
mkroetzsch added a comment.
This is good news -- thanks for the careful review! The lack of specific threat models for this data was also a challenge for us, for similar reasons, but it is also a good sign that, many years after the first SPARQL data releases, there is still no known realistic danger to user anonymity. The footer is still a good idea for general community awareness. People who do have concerns about their anonymity could be encouraged to come forward with scenarios that we should take into account.


2018-06-26 Thread leila
leila added a comment.
@Bawolff @EBjune for context on why Security is asked to provide feedback: For data releases, we usually ask for privacy and security feedback if the data may contain private information (either within itself or in combination with other possible datasets that we, WMF, or others may release in the future). Sometimes we don't have the capacity or expertise to do this in-house, in which case we reach out to external privacy experts (check this example); sometimes we ask internally. Some level of such feedback is needed to understand the risks from the expert perspective before these releases.


2018-06-26 Thread EBjune
EBjune added a comment.
Thanks, Brian, I appreciate you taking a look. Maybe next time it would be
good to know all that stuff about security not usually reviewing this kind
of thing up front; it may save a lot of people a bunch of time and
expectations. I also wouldn't be nudging it if Stas wasn't escalating it to
me ;)

Cheers,

Erika

Erika Bjune
Engineering Manager - Search Platform
Wikimedia Foundation


2018-06-26 Thread Bawolff
Bawolff added a comment.
Sorry for the delay, this kind of got preempted by T194204 but is now next on my todo list.

As an aside - this sort of thing traditionally doesn't require security team sign-off (afaik), nor have we reviewed things like this in the past - historically it's been legal and maybe analytics only. As far as I know, the security team has no criteria for evaluating this sort of thing (beyond ensuring that it's not blatantly outputting PII), so I'm mostly planning to check that it implements the properties specified in the description on the linked wiki page. I hope that meets what everyone is looking for.


2018-06-13 Thread EBjune
EBjune added a comment.
@Bawolff sorry for the nudge, just wondering if you've had a chance to take a final look at this? Appreciate any update, thanks.


2018-06-05 Thread Smalyshev
Smalyshev added a comment.
@Bawolff any news?


2018-05-29 Thread Bawolff
Bawolff added a comment.
Sorry, I'm on vacation until Monday. Perhaps someone else on the security team can take a look; failing that, I'll be back on Monday.

(Thank you for your patience, I know this has been delayed multiple times)


2018-05-29 Thread Smalyshev
Smalyshev added a comment.
@Bawolff ping?


2018-04-26 Thread Smalyshev
Smalyshev added a comment.
@Bawolff do we have any other concerns, or is this fine?


2018-04-26 Thread Adrian_Bielefeldt
Adrian_Bielefeldt added a comment.
The extractAnonymized.py script is indeed not used anywhere, so I've removed it.

Quick rundown of the anonymization process and its code locations:
Stage 1: Parsing of the query here. This uses a slightly modified version of the OpenRDF-Parser which, among other things, sets the default prefixes.
Stage 2, Point 1: Not exactly in the code, but parsing ignores comments.
Stage 2, Point 2: Replacing the strings is done here.
Stage 2, Point 3: The variable renaming code is here.
Stage 2, Point 4: The geographic coordinates are handled here.
Stage 3: The entire rendering is done here.

The Python script is just for convenience, supplying the default locations on the server and building the Maven call.


2018-04-24 Thread Smalyshev
Smalyshev added a comment.
extractAnonymized.py indeed seems broken, but I don't see anything using it. It seems that anonymization is done by the Java class Anonymizer in QueryAnalysis, which is used by the Anonymize.py script. I'm not sure whether that's completely accurate, but I don't see any usage of extractAnonymized.py. @mkroetzsch - could you clarify whether that script is actually used?


2018-04-04 Thread Bawolff
Bawolff added a comment.
Question: Looking at https://github.com/Wikidata/QueryAnalysis/blob/master/tools/extractAnonymized.py, at first glance, it looks like the string handling code wouldn't handle edge cases properly, e.g.

"foo\"bar"
"foo'bar"

?

(I only skimmed the code, may have misunderstood)
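
To make the concern concrete, here is a minimal Python sketch of the two edge cases. Neither pattern is taken from extractAnonymized.py; NAIVE and STRING_LITERAL are hypothetical names, and the "string" placeholder is an assumption. The naive pattern ends a literal at the first quote of either kind, while the second treats a backslash-escaped character as part of the literal and requires the closing quote to match the opening one.

```python
import re

# Hypothetical naive pattern: any quote opens, any quote closes.
NAIVE = re.compile(r"""["'][^"']*["']""")

# Escape-aware pattern: a double- or single-quoted literal in which a
# backslash followed by any character stays inside the literal.
DOUBLE = r'"(?:[^"\\]|\\.)*"'
SINGLE = r"'(?:[^'\\]|\\.)*'"
STRING_LITERAL = re.compile(DOUBLE + "|" + SINGLE)


def replace_strings(query, placeholder='"string"'):
    """Replace every quoted literal in the query with a fixed placeholder."""
    return STRING_LITERAL.sub(placeholder, query)


print(replace_strings(r'FILTER(?x = "foo\"bar")'))  # FILTER(?x = "string")
print(replace_strings('FILTER(?x = "foo\'bar")'))   # FILTER(?x = "string")
print(NAIVE.sub('"string"', '"foo\'bar"'))          # "string"bar"  (leaks 'bar')
```

The sketch does not cover SPARQL's triple-quoted long literals.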

> We have a list of query types ordered by frequency. However, there are millions of query types, and the most frequent are those created by bots. I can dig up a pointer to the local file where we have it, if this is what you want. If you are interested in a broader analysis of the data, you could take a look at a recent workshop paper of ours: https://iccl.inf.tu-dresden.de/web/Inproceedings3196/en
> It has detailed statistics of SPARQL feature distributions and discusses some findings.

OK, I think it's good enough that you did that. The main point of the question was to ensure that you had a good idea of the type of data in the data set (i.e., that we aren't just releasing data we've never actually looked at).


2018-04-03 Thread mkroetzsch
mkroetzsch added a comment.
Hi,

The code is here: https://github.com/Wikidata/QueryAnalysis
It was not written for general re-use, so it might be a bit messy in places. The code includes the public Wikidata example queries as test data that can be used without accessing any confidential information.

We have a list of query types ordered by frequency. However, there are millions of query types, and the most frequent are those created by bots. I can dig up a pointer to the local file where we have it, if this is what you want. If you are interested in a broader analysis of the data, you could take a look at a recent workshop paper of ours: https://iccl.inf.tu-dresden.de/web/Inproceedings3196/en
It has detailed statistics of SPARQL feature distributions and discusses some findings.


2018-04-02 Thread Bawolff
Bawolff added a comment.
Hi,

So first of all, we'd like to see the code that does the query normalization.

Second, could we have a summary of the types of queries we expect to be most common in the data set? I appreciate there will be a very long tail here, but having a summary of the most common types (broadly speaking) ensures that we have a good understanding of the type of data we expect the data set to contain.