Hi Sérgio, thanks for your message. Apologies for the delayed response. Speaking on behalf of Data Platform Engineering (home to the Search Platform team and to most of the crucial knowledge for this sort of dataset creation), we're not presently considering producing this sort of dataset, as our focus is on different problems. It would be difficult to prioritize creating and maintaining it.
However, could you tell us a bit more, here on the list, about some of the intended use cases and end users (direct and indirect) for such a dataset? Would you like to be connected with product management to discuss your use cases further? I wouldn't want to suggest that this means the work will be prioritized, but our product management folks are looking for themes across the various use cases, as these help set the context for user needs on the roadmap.

Thanks!
-Adam

On Thu, Jul 24, 2025 at 5:57 AM Sérgio Nunes <[email protected]> wrote:

> Hi,
>
> What would be the best Wikimedia interface to try to get this moving?
>
> Thanks for any suggestions
> --
> Sérgio Nunes
>
>
> On Mon, 7 Jul 2025 at 13:23, Sérgio Nunes <[email protected]> wrote:
>
> > Hi all,
> >
> > I would like to suggest a new *highly valuable* data dump for Wikipedia:
> > the release of aggregated search query logs. I am aware that a previous
> > release of search data was retracted due to privacy concerns. However, I
> > believe there is a privacy-preserving approach that could still provide
> > great value to researchers.
> >
> > My proposal is to release only aggregated query data, specifically
> > queries that have been observed more than X times within a given day or
> > week. The dataset could follow a simple format such as:
> >
> > [day or week] [query text] [frequency]
> >
> > This method would eliminate the risk of exposing personal or unique
> > search queries. The dataset would be especially useful if released
> > regularly (e.g., monthly) and broken down by language-specific Wikipedias.
> >
> >
> > Is this the best forum for posting this suggestion?
> >
> > If you have suggestions for where to direct this proposal, or ideas for
> > an alternative approach, I would be grateful.
> >
> > Best regards,
> > --
> > Sérgio Nunes
> >
> _______________________________________________
> Wiki-research-l mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
