Hi all,

On 15.03.21 02:57, Hydriz Scholz wrote:
> I also intend to
> integrate a "watchlist" feature that can automatically notify users
> when new datasets are available.

I am not sure whether this is a killer feature for human users, i.e. mailbox notifications. We have been using the Wikimedia dumps for DBpedia for 13 years now and implemented a download function [1]. However, it does not run optimally. I think it still relies on the links in the HTML page to find the download URLs.

The way we implemented it: you run the download with a date (e.g. 2021-01-01), it then tries to download the dumps from the beginning of that month, and it fails if it does not find some of them, so you need to re-run it later.
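A minimal Python sketch of that "all or fail" behavior, for illustration only (it is not the Scala code in [1]). The helpers `first_of_month`, `dump_urls` and `try_download` are hypothetical names; the URL layout assumes the usual `dumps.wikimedia.org/<wiki>/<YYYYMMDD>/<wiki>-<YYYYMMDD>-pages-articles.xml.bz2` pattern, and the availability check is passed in as a function so it could be an HTTP HEAD request or a test stub.

```python
from datetime import date
from typing import Callable, Iterable, List


def first_of_month(day: date) -> date:
    """Snap the requested date to the first day of its month (the run we look for)."""
    return day.replace(day=1)


def dump_urls(wikis: Iterable[str], run: date, filename_tpl: str) -> List[str]:
    """Build one candidate URL per wiki for the given run (assumed URL layout)."""
    stamp = run.strftime("%Y%m%d")
    return [
        f"https://dumps.wikimedia.org/{wiki}/{stamp}/"
        + filename_tpl.format(wiki=wiki, stamp=stamp)
        for wiki in wikis
    ]


def try_download(
    wikis: Iterable[str],
    requested: date,
    is_available: Callable[[str], bool],  # hypothetical check, e.g. an HTTP HEAD request
    filename_tpl: str = "{wiki}-{stamp}-pages-articles.xml.bz2",
) -> List[str]:
    """Return the URLs to fetch, or raise if any file is not published yet.

    Raising mirrors the behavior described above: the caller has to re-run later.
    """
    urls = dump_urls(wikis, first_of_month(requested), filename_tpl)
    missing = [u for u in urls if not is_available(u)]
    if missing:
        raise RuntimeError(f"{len(missing)} dump(s) not available yet; re-run later: {missing}")
    return urls
```

The weak spot is the all-or-nothing check: without a way to ask the server which files are finished, the only option is to probe each URL and retry.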

It would be nice to have an API to check for availability and to define sets of dumps. We are in the process of open-sourcing databus.dbpedia.org, a registry that offers this functionality for any files, i.e. shasums, download URLs, an API for querying, machine-readable and actionable licenses, etc. We will eventually put the Wikimedia dumps on the Databus.
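As a rough sketch of what "check availability for a set" could look like today, assuming each dump run publishes a machine-readable `dumpstatus.json` next to its files, with a `"jobs"` map whose entries carry a `"status"` field (our understanding of the current layout; worth verifying before relying on it). The function names are hypothetical, and job names such as `articlesdump` are only examples.

```python
import json
from typing import Dict, Iterable, Set
from urllib.request import urlopen


def status_url(wiki: str, stamp: str) -> str:
    """Assumed location of the per-run status file, e.g. enwiki/20210301/dumpstatus.json."""
    return f"https://dumps.wikimedia.org/{wiki}/{stamp}/dumpstatus.json"


def pending_jobs(status: Dict, wanted: Iterable[str]) -> Set[str]:
    """Return the jobs from `wanted` not marked "done" in a parsed status document.

    Jobs missing from the document count as pending as well.
    """
    jobs = status.get("jobs", {})
    return {name for name in wanted if jobs.get(name, {}).get("status") != "done"}


def fetch_status(wiki: str, stamp: str) -> Dict:
    """Download and parse the status file (needs network access)."""
    with urlopen(status_url(wiki, stamp)) as resp:
        return json.load(resp)
```

A client would call `pending_jobs(fetch_status("enwiki", "20210301"), {"articlesdump", "metacurrentdump"})` and only start downloading once the result is an empty set, instead of probing individual file URLs.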

For me/us, being able to work with them programmatically would be worth more than yet another notification, but others may have different opinions.

-- Sebastian

[1] https://github.com/dbpedia/extraction-framework/blob/a334ac2af877531a082dc9ae218926d29f43b789/dump/src/main/scala/org/dbpedia/extraction/dump/download/Download.scala


_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
