srishakatux created this task.
srishakatux added projects: Community-Wishlist-Survey-2016, Wikidata.
Herald added a subscriber: Aklapper.

TASK DESCRIPTION

Proposed in #Community-Wishlist-Survey-2016. Received 38 support votes, and ranked #36 out of 265 proposals. View full proposal with discussion and votes here.

Problem

A one-time import is useful, but the data must also be kept in sync afterwards. Many people import from external sources (for example, a museum web page listing the birth and death dates of Sri Lankan singers, or a foreign affairs website listing Luxembourg's embassies) using a self-made combination of scripts, spreadsheets, and copy/pasting, then feed the results into QuickStatements or a similar API. Then they forget about it: the scripts stay on their own computers and eventually get deleted. The next person who wants to update the data from the same website has to start from scratch and work out which items must be created, which must be updated, and how.

Who would benefit

People who import specialized datasets into Wikidata

Proposed solution

Let's have a platform that facilitates reuse and keeps the data in sync. It would rationalize the process and make it less error-prone, more efficient, and more collaborative, by providing a Git-backed webapp where people can easily:

  • Propose a new import script (including metadata about copyright) via a pull request. An import script scrapes information from a website and generates a QuickStatements file.
  • Run an existing import script, potentially with a preview screen to check that the data has been extracted correctly before injecting it into Wikidata.
  • See when the data was last synchronized and when each data element was last updated, on both the external side and the Wikidata side.
  • Record exceptions (for instance, cases where the external database is wrong).
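
As an illustration of what the output stage of an import script might look like, here is a minimal Python sketch that turns already-scraped records into QuickStatements text. The `Person` record and the `to_quickstatements` function are hypothetical names, not part of any existing tool, and the line syntax (tab-separated fields, `CREATE`/`LAST`, `S854` for a reference URL, `/9` for year precision) reflects the QuickStatements version 1 format as I understand it, so it should be checked against the QuickStatements help page before use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    """One scraped record (hypothetical shape, for illustration only)."""
    label: str
    birth_year: int
    source_url: str
    qid: Optional[str] = None  # None means no Wikidata item is known yet


def to_quickstatements(people: List[Person]) -> str:
    """Render records as QuickStatements v1 text (assumed syntax)."""
    lines = []
    for p in people:
        # Year-precision date; P569 is "date of birth" on Wikidata.
        date = f"+{p.birth_year:04d}-00-00T00:00:00Z/9"
        if p.qid is None:
            # No existing item: create one and give it an English label.
            lines.append("CREATE")
            lines.append(f'LAST\tLen\t"{p.label}"')
            subject = "LAST"
        else:
            subject = p.qid
        # S854 attaches the source page as a "reference URL".
        lines.append(f'{subject}\tP569\t{date}\tS854\t"{p.source_url}"')
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        Person("Example Singer", 1950, "https://example.org/a"),
        Person("Other Singer", 1960, "https://example.org/b", qid="Q4115189"),
    ]
    # Printing the text first plays the role of the "preview screen".
    print(to_quickstatements(sample))
```

Returning plain text rather than submitting it directly keeps the preview step trivial: the webapp could show the generated lines to the user before anything is sent to Wikidata.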

All of these modules (except the import scripts) would be the same for all databases, which would help a lot in pooling effort, avoiding pitfalls, making sync efficient, and preventing contributors from endlessly overwriting each other.
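
One way such a shared, database-independent module could decide what to do on each run is sketched below in Python. This is only an assumption about how the sync step might work: `plan_sync`, its arguments, and the dictionary-based data model are all hypothetical, chosen to show how an exceptions list stops contributors from overwriting each other.

```python
from typing import Dict, List, Set


def plan_sync(
    external: Dict[str, str],
    wikidata: Dict[str, str],
    exceptions: Set[str],
) -> Dict[str, List[str]]:
    """Classify external records by the action a sync run would take.

    external   -- key -> value as scraped from the external source
    wikidata   -- key -> value currently stored on Wikidata
    exceptions -- keys where the external source is known to be wrong
    """
    plan: Dict[str, List[str]] = {"create": [], "update": [], "skip": []}
    for key, value in external.items():
        if key in exceptions:
            # A human marked the external value as wrong: never overwrite.
            plan["skip"].append(key)
        elif key not in wikidata:
            plan["create"].append(key)
        elif wikidata[key] != value:
            plan["update"].append(key)
        # Identical values need no action and are left out of the plan.
    return plan


if __name__ == "__main__":
    external = {"a": "1950", "b": "1960", "c": "1970", "d": "1980"}
    wikidata = {"a": "1950", "b": "1961", "d": "1985"}
    print(plan_sync(external, wikidata, exceptions={"d"}))
```

Because the function only compares keyed values, the same code would serve every import script; only the scraping step that produces `external` differs per database.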

Technical details

Time, expertise and skills required

  • e.g. 2-3 weeks, advanced contributor, JavaScript, CSS, etc.

Suitable for

  • e.g. Hackathon, GSOC, Outreachy, etc.

Proposer

Syced

Related links


TASK DETAIL
https://phabricator.wikimedia.org/T159190

To: srishakatux
Cc: Aklapper, srishakatux, D3r1ck01, Samwilson, Izno, Wikidata-bugs, aude, Mbch331, Jay8g