thiemowmde added a comment.

Estimated table sizes:

  • wbl_lexemes
    • The latest Item ID is currently Q49977198. That's 9 characters, so 9 bytes.
    • 9 * 3 = 27 bytes per row.
    • 27 * 1 million Lexemes = 26 megabytes.
  • wbl_lemmata
    • Lexeme IDs will be similar to Item IDs, so 9 bytes again.
    • Let's say language codes are 5 bytes on average (e.g. codes like "en-gb").
    • Let's say lemmas are 15 characters on average (see http://www.ravi.io/language-word-lengths).
    • Lemmas will use multi-byte UTF-8 characters in many cases. I suggest assuming 4 bytes per character (the UTF-8 maximum), just to be safe.
    • Let's say a Lexeme has 2 lemmas on average.
    • ( 9 + 5 + ( 15 * 4 ) ) * 2 * 1 million Lexemes = 141 megabytes.
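
For anyone who wants to replay the numbers with different assumptions, here is a minimal sketch of the arithmetic above. The function and parameter names are made up for illustration. The factor of 3 for wbl_lexemes is taken as-is from the estimate (presumably three ID-sized values per row), and "megabytes" is read as 1024 * 1024 bytes because that is what reproduces the figures 26 and 141. Row overhead and indexes are not included, just as in the estimate.

```python
# Rough table-size estimate; all names here are hypothetical helpers.
MIB = 1024 * 1024                 # assumption: "megabytes" means MiB
LEXEMES = 1_000_000               # 1 million Lexemes, as in the estimate
ID_BYTES = len("Q49977198")       # latest Item ID, 9 ASCII characters


def wbl_lexemes_bytes(lexemes=LEXEMES, id_bytes=ID_BYTES, ids_per_row=3):
    """9 * 3 bytes per row, one row per Lexeme."""
    return id_bytes * ids_per_row * lexemes


def wbl_lemmata_bytes(lexemes=LEXEMES, id_bytes=ID_BYTES, lang_bytes=5,
                      lemma_chars=15, bytes_per_char=4, lemmas_per_lexeme=2):
    """(ID + language code + lemma text) per row, several rows per Lexeme."""
    row_bytes = id_bytes + lang_bytes + lemma_chars * bytes_per_char
    return row_bytes * lemmas_per_lexeme * lexemes


def to_mib(num_bytes):
    return num_bytes / MIB


if __name__ == "__main__":
    print(f"wbl_lexemes: {to_mib(wbl_lexemes_bytes()):.1f} MiB")   # 25.7
    print(f"wbl_lemmata: {to_mib(wbl_lemmata_bytes()):.1f} MiB")   # 141.1
```

Changing bytes_per_char to 2 or lemmas_per_lexeme to 1 shows how sensitive the wbl_lemmata figure is to the more pessimistic assumptions.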

TASK DETAIL
https://phabricator.wikimedia.org/T187775

To: thiemowmde
Cc: daniel, Lucas_Werkmeister_WMDE, Ladsgroup, WMDE-leszek, thiemowmde, Aklapper, Lydia_Pintscher, Lahi, Gq86, Cinemantique, GoranSMilovanovic, QZanden, LawExplorer, Wikidata-bugs, aude, Darkdadaah, Mbch331