Sebotic added a comment.

I calculated the numbers above; they are valid only for the chemical structure property InChI (P234), based on ~68 million InChI values in PubChem, the largest public chemistry database (they are also valid for other chemical structure properties such as canonical and isomeric SMILES). For any other data of the Wikidata datatype string/text, I cannot provide numbers, because I do not have the distribution of string lengths for other data that should be represented as strings in Wikidata. And as you can see from the distribution above, increasing the limit would only affect the representation of the top ~1% of total chemical structure data.

That said, for me, increasing the limit from 400 to 768 is not worth the effort. Larger biomolecules, which are currently increasing in relevance, will not fit anyway, whether the limit is 400 or 768, and if we cannot cover these comprehensively, trying to tackle the problem at all does not seem worthwhile.
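
For anyone who wants to run the same kind of check on another string property, here is a minimal sketch in Python. It assumes the values are available as a plain list of strings and that the limit is counted in characters. The function name `share_exceeding` and the sample data are hypothetical, and this is not the script used for the PubChem numbers above.

```python
def share_exceeding(values, limits=(400, 768)):
    """Return, for each limit, the fraction of strings longer than that limit."""
    total = len(values)
    if total == 0:
        # No data: report zero for every limit instead of dividing by zero.
        return {limit: 0.0 for limit in limits}
    return {
        limit: sum(1 for v in values if len(v) > limit) / total
        for limit in limits
    }


# Hypothetical sample data: two short InChI strings and two long placeholders.
sample = ["InChI=1S/CH4/h1H4", "InChI=1S/H2O/h1H2", "X" * 500, "Y" * 1000]

for limit, share in share_exceeding(sample).items():
    print(f"limit {limit}: {share:.0%} of values would not fit")
```

Comparing the two shares shows how much additional data a higher limit would actually admit, which is the comparison behind the ~1% figure.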




To: Sebotic
Cc: daniel, thiemowmde, EgonWillighagen, Sebotic, Scott_WUaS, Sadads, Pasleim, Aklapper, Lydia_Pintscher, D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331