Hi all,

I would like to suggest a new *highly valuable* data dump for Wikipedia:
the release of aggregated search query logs. I am aware that a previous
release of search data was retracted due to privacy concerns. However, I
believe there is a privacy-preserving approach that could still provide
great value to researchers.

My proposal is to release only aggregated query data: specifically, queries
that have been observed more than X times within a given day or week, where
X is a fixed minimum-frequency threshold. The dataset could follow a simple
format such as:

[day or week] [query text] [frequency]

This method would greatly reduce the risk of exposing personal or unique
search queries, since a query typed by only a few users would never reach
the threshold. The dataset would be especially useful if released regularly
(e.g., monthly) and broken down by language-specific Wikipedias.
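
To make the idea concrete, here is a minimal sketch of the aggregation step
in Python. It is only an illustration under my own assumptions: the function
name aggregate_queries, the (period, query) input pairs, and the
lowercase/whitespace normalization are hypothetical and do not describe any
existing Wikimedia pipeline.

```python
from collections import Counter


def aggregate_queries(log, min_count):
    """Return (period, query, frequency) rows seen more than min_count times.

    log is an iterable of (period, query) pairs, where period is a day or
    week label. Normalizing case and whitespace is an assumption made for
    this sketch, not part of the proposal itself.
    """
    counts = Counter(
        (period, " ".join(query.lower().split())) for period, query in log
    )
    # Keep only queries observed more than min_count times in a period.
    rows = [(p, q, n) for (p, q), n in counts.items() if n > min_count]
    # Order by period, then by descending frequency, then by query text.
    return sorted(rows, key=lambda r: (r[0], -r[2], r[1]))
```

With a threshold of 2, a query seen three times on one day would be kept,
while a query seen once would be dropped from the release entirely.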

Is this the best forum for posting this suggestion?

If you have suggestions for where to direct this proposal, or ideas for an
alternative approach, I would be grateful.

Best regards,
--
Sérgio Nunes
_______________________________________________
Wiki-research-l mailing list -- [email protected]