Smalyshev created this task. Smalyshev added projects: Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint. Herald added a subscriber: Aklapper. Herald added a project: Discovery.
I've discovered in the RDF database that two references with the same data apparently have different hashes. Specifically, both: <http://www.wikidata.org/reference/004ec6fbee857649acdbdbad4f97b2c8571df97b> and <http://www.wikidata.org/reference/9a24f7c0208b05d6be97077d855671d1dfdbc0dd> have the same data:
<http://www.wikidata.org/prop/reference/P143> <http://www.wikidata.org/entity/Q48183>
This should not happen. It may indicate a change in the hash calculation (which also should not happen).
Digging deeper into this, I've found that PropertyValueSnak's hash is calculated as:
  return sha1( serialize( $this ) );
I don't think this is a good idea: it does not guarantee property order, and the serialized form contains the full names of the classes:
  "C:41:\"Wikibase\\DataModel\\Snak\\PropertyValueSnak\":127:{a:2:{i:0;s:4:\"P143\";i:1;C:39:\"Wikibase\\DataModel\\Entity\\EntityIdValue\":50:{C:32:\"Wikibase\\DataModel\\Entity\\ItemId\":6:{Q48183}}}}"
This means that any time a class is renamed (even just a change in capitalization) or moved to a different namespace, all hashes change. This may be happening to value hashes too.
We need to find a method of generating the IDs that is truly stable and does not change.
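To illustrate the difference, here is a minimal sketch in Python (not Wikibase code, and not a proposed implementation). Both function names, the "|" separator, and the field names "snaktype", "property" and "datavalue" are assumptions made for the example. The first function mimics the problem by including a class name in the hashed bytes; the second hashes only the data, in a canonical form with sorted keys.

```python
import hashlib
import json


def class_name_hash(class_name, property_id, item_id):
    """Fragile: the class name is part of the hashed bytes, as with serialize()."""
    payload = f"{class_name}|{property_id}|{item_id}"
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def stable_snak_hash(snak_type, property_id, value):
    """Hypothetical stable variant: hash only the data, with keys in sorted order."""
    canonical = json.dumps(
        {"snaktype": snak_type, "property": property_id, "datavalue": value},
        sort_keys=True,          # key order no longer affects the result
        separators=(",", ":"),   # fixed separators, no incidental whitespace
    )
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Moving the class to another namespace changes the fragile hash.
old = class_name_hash("Wikibase\\DataModel\\Snak\\PropertyValueSnak", "P143", "Q48183")
moved = class_name_hash("Wikibase\\DataModel\\Snaks\\PropertyValueSnak", "P143", "Q48183")
print(old == moved)  # False

# The data-only hash ignores class names and key order.
a = stable_snak_hash("value", "P143", {"entity-type": "item", "id": "Q48183"})
b = stable_snak_hash("value", "P143", {"id": "Q48183", "entity-type": "item"})
print(a == b)  # True
```

Any scheme along these lines would still need a fixed, documented canonical form per value type, since changing that form would change the hashes just as a class rename does today.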
Cc: daniel, Aklapper, Smalyshev, GoranSMilovanovic, QZanden, EBjune, merbst, Avner, debt, Gehel, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331