Smalyshev created this task.
Smalyshev added projects: Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint.
Herald added a subscriber: Aklapper.
Herald added a project: Discovery.

TASK DESCRIPTION

I've found that two references in the RDF database with the same data have different hashes. Specifically, both <http://www.wikidata.org/reference/004ec6fbee857649acdbdbad4f97b2c8571df97b> and <http://www.wikidata.org/reference/9a24f7c0208b05d6be97077d855671d1dfdbc0dd> have the same data:

<http://www.wikidata.org/prop/reference/P143>	<http://www.wikidata.org/entity/Q48183>

This should not happen. It may indicate a change in the hashing calculation (which also should not happen).

Digging deeper into this, I've found that PropertyValueSnak's hash is calculated as:

		return sha1( serialize( $this ) );

I don't think this is a good idea: it does not guarantee property order, and the output contains the full names of the classes:

"C:41:\"Wikibase\\DataModel\\Snak\\PropertyValueSnak\":127:{a:2:{i:0;s:4:\"P143\";i:1;C:39:\"Wikibase\\DataModel\\Entity\\EntityIdValue\":50:{C:32:\"Wikibase\\DataModel\\Entity\\ItemId\":6:{Q48183}}}}"

This means that any time a class changes its name (even just the capitalization) or moves to a different namespace, all hashes change. This may be happening to value hashes too.
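
The class-name dependency described above can be reproduced in isolation. The following is a minimal sketch, not Wikibase code: OldSnak and RenamedSnak are hypothetical stand-ins for one class before and after a rename, holding identical data.

```php
<?php
// Hypothetical stand-ins for the same snak class before and after a rename.
// They are not Wikibase classes; only the class name differs.
class OldSnak {
	public $propertyId = 'P143';
	public $value = 'Q48183';
}

class RenamedSnak {
	public $propertyId = 'P143';
	public $value = 'Q48183';
}

// serialize() embeds the class name in its output, so the hashed
// string differs even though the data is the same.
$serializedBefore = serialize( new OldSnak() );
$serializedAfter = serialize( new RenamedSnak() );

$before = sha1( $serializedBefore );
$after = sha1( $serializedAfter );

echo $serializedBefore, "\n";
echo $serializedAfter, "\n";

// Identical data, but the two hashes are not equal.
var_dump( $before === $after );
```

The two serialized strings differ only in the embedded class name, which is enough to produce two unrelated SHA-1 values.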

We need to find a method of generating the IDs that is truly stable and does not change.
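
One possible direction, shown only as a sketch: hash an explicit, fixed-order list of the fields that define the snak, using type tags that are chosen by hand instead of derived from PHP class names. The function stableSnakHash() and its parameters are hypothetical, the tag strings are illustrative, and this would not reproduce any existing hashes.

```php
<?php
/**
 * Hypothetical helper, not part of Wikibase.
 *
 * Builds the hash input from an explicit list of fields in a fixed order,
 * so the result does not depend on class names, namespaces, or the order
 * in which object properties happen to be declared.
 */
function stableSnakHash(
	string $snakType,
	string $propertyId,
	string $valueType,
	string $value
): string {
	// A JSON list (not an object) keeps the element order fixed.
	$canonical = json_encode(
		[ $snakType, $propertyId, $valueType, $value ],
		JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES
	);

	return sha1( $canonical );
}

// The tag strings below are illustrative assumptions.
$hash = stableSnakHash( 'value', 'P143', 'wikibase-entityid', 'Q48183' );
echo $hash, "\n";
```

With this approach, renaming or moving the classes that hold the data leaves the hash unchanged, because only the listed field values and the hand-chosen tags feed into sha1().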


TASK DETAIL
https://phabricator.wikimedia.org/T167759

To: Smalyshev
Cc: daniel, Aklapper, Smalyshev, GoranSMilovanovic, QZanden, EBjune, merbst, Avner, debt, Gehel, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331