Andrawaag added a comment.

  You are completely right: the same hashes are not needed to apply 
EntitySchemas on memory ingestion to Wikidata. I need the hashes as a sanity 
check that my script creates exactly the same RDF as Wikidata produces 
natively. So the hashes are only needed in the development phase of the script.
  
  Here is a notebook that contains the first prototype 
<https://public.paws.wmcloud.org/User:Andrawaag/Genewiki/Wikidata_json2ttl.ipynb>.
  
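    from rdflib import Graph
    from rdflib.compare import graph_diff, to_isomorphic
    
    # WDqidRDFEngine comes from the prototype script (see the notebook above).
    allRD = WDqidRDFEngine(qid="Q38", fetch_all=True)
    
    # Load the RDF that Wikidata produces natively for the same item.
    compareRDF = Graph()
    compareRDF.parse("http://www.wikidata.org/entity/Q38.ttl")
    inboth, left, right = graph_diff(to_isomorphic(compareRDF),
                                     to_isomorphic(allRD.rdf_item))
    print(len(left))        # triples found only in the native RDF
    print(len(compareRDF))  # total number of triples in the native RDF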
  
  If my script works, there should be no difference in the length of both 
graphs. Currently, that is not the case. I checked various examples and, except 
for the hashes in the normalized statements, they seem equal. But if it is 
indeed difficult to reproduce those hashes, I should think of another test to 
verify this.
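
  A minimal sketch of one such alternative test, assuming triples are handled 
as plain (subject, predicate, object) string tuples and that the hashed nodes 
live in the Wikidata value and reference namespaces. The function names and 
the toy data are hypothetical and not part of the prototype. It masks the 
hashed IRIs before comparing, so only differences outside the hashes show up.

```python
import re

# Assumption: hashed nodes are hex-named IRIs in the value/reference namespaces.
HASHED_NODE = re.compile(r"^http://www\.wikidata\.org/(value|reference)/[0-9a-f]+$")


def mask_hashes(triples):
    """Return the triples as a set, with hashed node IRIs replaced by a placeholder."""
    def mask(term):
        match = HASHED_NODE.match(term)
        return f"<hashed {match.group(1)} node>" if match else term

    return {tuple(mask(term) for term in triple) for triple in triples}


def diff_ignoring_hashes(expected, actual):
    """Return (only_in_expected, only_in_actual) after masking hashed nodes."""
    a, b = mask_hashes(expected), mask_hashes(actual)
    return a - b, b - a


# Hypothetical toy data: same structure, different hashes.
WDV = "http://www.wikidata.org/value/"
native = [("s", "p", WDV + "aaa111"), (WDV + "aaa111", "amount", "1")]
script = [("s", "p", WDV + "bbb222"), (WDV + "bbb222", "amount", "1")]

missing, extra = diff_ignoring_hashes(native, script)
print(len(missing), len(extra))
```

  This is a coarse check: because every hashed node collapses into the same 
placeholder, it can hide differences in how value nodes are linked to 
statements. It only confirms that nothing else differs.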
  
  In the actual validation script, not all RDF will be needed. Ignoring the 
labels, for example, slims down the RDF graph substantially. So I am currently 
building functionality into the WikidataIntegrator that allows selecting only 
certain parts (e.g., no truthy statements, or only truthy statements, no 
normalized values, etc.). A notebook with that code is here: 
<https://public.paws.wmcloud.org/User:Andrawaag/Genewiki/wdi_rdf.ipynb>
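
  To illustrate the idea of selecting only certain parts, here is a rough 
sketch that filters plain (subject, predicate, object) string tuples by 
predicate. The function `select_triples` and its `terms` and `truthy` 
parameters are hypothetical and are not the WikidataIntegrator API. The 
predicate IRIs are the ones Wikidata uses for labels, descriptions, aliases 
and truthy statements.

```python
# Predicates that carry labels, descriptions and aliases in Wikidata RDF.
TERM_PREDICATES = {
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
    "http://www.w3.org/2004/02/skos/core#altLabel",
    "http://schema.org/name",
    "http://schema.org/description",
}
# Truthy statements use the wdt: namespace.
TRUTHY_PREFIX = "http://www.wikidata.org/prop/direct/"


def select_triples(triples, terms=True, truthy=True):
    """Keep only the requested parts of a list of (s, p, o) string tuples."""
    kept = []
    for s, p, o in triples:
        if not terms and p in TERM_PREDICATES:
            continue  # drop labels, descriptions and aliases
        if not truthy and p.startswith(TRUTHY_PREFIX):
            continue  # drop truthy (wdt:) statements
        kept.append((s, p, o))
    return kept


# Toy data for illustration only.
WD = "http://www.wikidata.org/entity/"
sample = [
    (WD + "Q38", "http://www.w3.org/2000/01/rdf-schema#label", "Italy"),
    (WD + "Q38", TRUTHY_PREFIX + "P31", WD + "Q6256"),
]

print(len(select_triples(sample, terms=False)))
```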
  
  My PHP skills are a bit rusty, but I will investigate and/or consider other 
test strategies.

TASK DETAIL
  https://phabricator.wikimedia.org/T283997

To: Addshore, Andrawaag
Cc: Lucas_Werkmeister_WMDE, Aklapper, Andrawaag, Invadibot, maantietaja, 
Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Addshore, Mbch331