leila added a comment.
In T190875#4386342, @JBennett wrote:
Has legal reviewed this? I don't see any comments from them in this ticket.
Yes, they have. T190874
I'd like to sort out a process for reviewing items like this. It's sort of in-between security/privacy/data governance. I'll put
JBennett added a comment.
Has legal reviewed this? I don't see any comments from them in this ticket. I'd like to sort out a process for reviewing items like this. It's sort of in between security/privacy/data governance. I'll put together a strawman review process to help us avoid delays
mkroetzsch added a comment.
This is good news -- thanks for the careful review! The lack of specific threat models for this data was also a challenge for us, for similar reasons, but it is also a good sign that many years after the first SPARQL data releases, there is still no realistic danger to
leila added a comment.
@Bawolff @EBjune for context re why Security is asked to provide feedback: For data releases, we usually ask for privacy and security feedback if the data may contain private information (either within itself or in combination with other possible datasets that we, WMF, or
EBjune added a comment.
Thanks, Brian, I appreciate you taking a look. Maybe next time it would be good to know up front that security doesn't usually review this kind of thing; it may save a lot of people a bunch of time and expectations. I also wouldn't be nudging it if Stas
Bawolff added a comment.
Sorry for the delay; this kind of got preempted by T194204 but is now next on my to-do list.
As an aside, this sort of thing traditionally doesn't require security team sign-off (AFAIK), nor have we reviewed things like this in the past; historically it's been legal and
EBjune added a comment.
@Bawolff sorry for the nudge, just wondering if you've had a chance to take a final look at this? Appreciate any update, thanks.
TASK DETAIL https://phabricator.wikimedia.org/T190875
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To:
Smalyshev added a comment.
@Bawolff any news?
Bawolff added a comment.
Sorry, I'm on vacation until Monday. Perhaps someone else on the security team can take a look; failing that, I'll be back on Monday.
(Thank you for your patience, I know this has been delayed multiple times.)
Smalyshev added a comment.
@Bawolff ping?
Smalyshev added a comment.
@Bawolff, do we have any other concerns, or is this fine?
Adrian_Bielefeldt added a comment.
The extractAnonymized.py script is indeed not used anywhere, so I've removed it.
Quick rundown of the anonymization process and its code locations:
Stage 1: Parsing of the query here. This uses a slightly modified version of the OpenRDF parser, among others
Smalyshev added a comment.
extractAnonymized.py indeed seems broken, but I don't see anything using it. It seems that anonymization is done by the Java class Anonymizer in QueryAnalysis, which is used by the Anonymize.py script. Not sure whether it's completely true but I don't see any usage of
Bawolff added a comment.
Question: Looking at https://github.com/Wikidata/QueryAnalysis/blob/master/tools/extractAnonymized.py, at first glance it looks like the string handling code wouldn't properly handle edge cases such as:
"foo\"bar"
"foo'bar"
?
(I only skimmed the code, may have
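The two edge cases raised here are an escaped quote inside a literal and the other quote character inside a literal. The code below is a minimal sketch, assuming SPARQL's short-string syntax, of a literal matcher that treats a backslash as escaping the next character, so each example is recognized as one literal. The names `STRING_LITERAL` and `mask_string_literals` are hypothetical. This is not the code in extractAnonymized.py or in the Java Anonymizer, and it ignores long (triple-quoted) literals and quotes inside comments.

```python
import re

# Hypothetical sketch, NOT the code in extractAnonymized.py or the Java Anonymizer.
# Matches SPARQL short string literals ("..." or '...'). A backslash escapes the
# next character, so \" inside "..." does not end the literal, and the other
# quote character (e.g. ' inside "...") is treated as ordinary content.
# Long literals ("""...""" / '''...''') and quotes inside # comments are not handled.
STRING_LITERAL = re.compile(r'"(?:[^"\\\n\r]|\\.)*"' + r"|'(?:[^'\\\n\r]|\\.)*'")


def mask_string_literals(query: str) -> str:
    """Replace each string literal in `query` with a numbered placeholder literal."""
    counter = 0

    def replace(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        return f'"string{counter}"'

    return STRING_LITERAL.sub(replace, query)


if __name__ == "__main__":
    # The two edge cases from the comment: an escaped quote and a mixed quote.
    for q in ['SELECT * WHERE { ?s ?p "foo\\"bar" }',
              "SELECT * WHERE { ?s ?p \"foo'bar\" }"]:
        print(mask_string_literals(q))
```

Parsing the query first, as the Stage 1 description in this thread indicates the project does, is generally more robust than pattern matching. A regex like this is only suitable for a quick audit.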
mkroetzsch added a comment.
Hi,
The code is here: https://github.com/Wikidata/QueryAnalysis
It was not written for general re-use, so it might be a bit messy in places. The code includes the public Wikidata example queries as test data that can be used without accessing any confidential
Bawolff added a comment.
Hi,
So first of all, we'd like to see the code that does the query normalization.
Second, could this have a summary of the types of queries we expect to be most common in the data set. I appreciate there will be a very long tail here, but having a summary of the most
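One coarse way to start such a summary is to bucket queries by their SPARQL query form (SELECT, ASK, CONSTRUCT, DESCRIBE). The helpers `query_form` and `summarize` below are hypothetical and not part of QueryAnalysis. They rely on keyword matching after removing IRIs and comments, so they only approximate the result. A finer summary, for example by predicates used or query shape, would need the parsed queries.

```python
import re
from collections import Counter

# Hypothetical sketch, not part of the QueryAnalysis code base.
# IRIs in angle brackets (no whitespace inside) are removed first so that words
# such as "ask" inside an IRI are not mistaken for keywords. Comments are removed
# afterwards, because IRIs may legitimately contain '#'.
IRI = re.compile(r"<[^<>\s]*>")
COMMENT = re.compile(r"#[^\n]*")
# The query form is the first of these keywords after the prologue.
# (?!:) skips prefix names such as "ask:" declared in PREFIX lines.
FORM = re.compile(r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b(?!:)", re.IGNORECASE)


def query_form(query: str) -> str:
    """Return the (approximate) SPARQL query form of `query`, or "UNKNOWN"."""
    stripped = COMMENT.sub(" ", IRI.sub(" ", query))
    match = FORM.search(stripped)
    return match.group(1).upper() if match else "UNKNOWN"


def summarize(queries, top=10):
    """Count query forms and return the `top` most common as (form, count) pairs."""
    return Counter(query_form(q) for q in queries).most_common(top)
```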