Hi all, I would like to suggest a new *highly valuable* data dump for Wikipedia: the release of aggregated search query logs. I am aware that a previous release of search data was retracted due to privacy concerns. However, I believe there is a privacy-preserving approach that could still provide great value to researchers.
My proposal is to release only aggregated query data: specifically, queries that have been observed more than X times within a given day or week. The dataset could follow a simple format such as:

[day or week] [query text] [frequency]

Because any query seen X times or fewer would be left out, this method would avoid exposing personal or unique search queries. The dataset would be especially useful if released regularly (e.g., monthly) and broken down by language-specific Wikipedias.

Is this the best forum for posting this suggestion? If you have suggestions for where to direct this proposal, or ideas for an alternative approach, I would be grateful.

Best regards,
-- Sérgio Nunes

_______________________________________________
Wiki-research-l mailing list -- [email protected]
To unsubscribe send an email to [email protected]
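
P.S. To make the proposed format concrete, here is a minimal Python sketch of the aggregation step. It is only an illustration under stated assumptions: the input is taken to be (day, query) pairs, the names aggregate_queries, normalize and min_count (the "X" above) are hypothetical, and the lowercasing/whitespace normalization is my own assumption rather than part of the proposal.

```python
from collections import Counter


def normalize(query):
    # Assumption: treat queries as equal up to case and extra whitespace.
    return " ".join(query.lower().split())


def aggregate_queries(log, min_count):
    """Count (day, query) pairs and keep those seen more than min_count times.

    log is an iterable of (day, query) pairs, with day as an ISO date string.
    Returns a list of (day, query, frequency) rows.
    """
    counts = Counter((day, normalize(query)) for day, query in log)
    rows = [(day, q, n) for (day, q), n in counts.items() if n > min_count]
    # Order by day, then by descending frequency, then alphabetically.
    return sorted(rows, key=lambda r: (r[0], -r[2], r[1]))


def format_row(row):
    # One output line per row: [day] [query text] [frequency], tab-separated.
    day, query, frequency = row
    return f"{day}\t{query}\t{frequency}"
```

With min_count set to 2, a query must appear at least three times on the same day to be included; anything rarer is dropped before the data is written out.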
