Bawolff added a comment.

Question: Looking at https://github.com/Wikidata/QueryAnalysis/blob/master/tools/extractAnonymized.py, at first glance it looks like the string-handling code wouldn't handle edge cases properly, e.g.

"foo\"bar"
"foo'bar"

?

(I only skimmed the code, may have misunderstood)
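
To make the concern concrete, here is a minimal sketch of string matching that copes with both cases. It is not taken from extractAnonymized.py: the pattern, the function name anonymize_strings and the placeholder value are all made up for illustration, and it only covers short single- and double-quoted literals (not SPARQL's triple-quoted long strings).

```python
import re

# Hypothetical pattern (an assumption, not the project's code): match a
# double- or single-quoted literal, treating a backslash plus the next
# character as part of the string body so that \" does not end it.
STRING_LITERAL = re.compile(
    r'"(?:[^"\\]|\\.)*"'
    r"|'(?:[^'\\]|\\.)*'"
)


def anonymize_strings(query, placeholder='"string"'):
    """Replace every quoted literal in `query` with `placeholder`."""
    # A callable avoids backslash processing of the replacement text.
    return STRING_LITERAL.sub(lambda match: placeholder, query)


# The two edge cases from the question above.
print(anonymize_strings(r'FILTER(?x = "foo\"bar")'))
print(anonymize_strings('''FILTER(?x = "foo'bar")'''))
```

A scanner that simply looks for the next quote character would cut the first literal off after foo\ and might treat the apostrophe in the second as the start of a new string, leaking part of the original text into the output.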

We have a list of query types ordered by frequency. However, there are millions of query types, and the most frequent are those created by bots. I can dig up a pointer to the local file where we have it, if this is what you want. If you are interested in a broader analysis of the data, you could take a look at a recent workshop paper of ours: https://iccl.inf.tu-dresden.de/web/Inproceedings3196/en
It has detailed statistics of SPARQL feature distributions and discusses some findings.

Ok, I think it's good enough that you did that. The main point of the question was to ensure that you had a good idea of the type of data in the data set (i.e. that we aren't just releasing data we've never actually looked at).


TASK DETAIL
https://phabricator.wikimedia.org/T190875

To: Bawolff
Cc: Bawolff, APalmer_WMF, Aklapper, DarTar, mkroetzsch, EBjune, Smalyshev, leila, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, Avner, dpatrick, ZhouZ, Luke081515, Mpaulson, Wikidata-bugs, aude, Capt_Swing, jayvdb, JanZerebecki, Slaporte, csteipp, Mbch331, Jay8g, Krenair, Legoktm