thiemowmde added a comment.
Estimated table sizes:
- wbl_lexemes
  - The latest Item ID is currently Q49977198. That's 9 bytes.
  - 9 * 3 = 27 bytes per row.
  - 27 * 1 million Lexemes = 27,000,000 bytes, or about 26 megabytes.
- wbl_lemmata
  - Lexeme IDs will be similar to Item IDs, so 9 bytes again.
  - Let's say language codes are 5 bytes on average (e.g. codes like "en-gb").
  - Let's say lemmas are 15 characters on average (see http://www.ravi.io/language-word-lengths).
  - Lemmas will use multi-byte UTF-8 characters in many cases. I suggest assuming 4 bytes per character, just to be safe.
  - Let's say a Lexeme has 2 lemmas on average.
  - ( 9 + 5 + ( 15 * 4 ) ) * 2 * 1 million Lexemes = 148,000,000 bytes, or about 141 megabytes.
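The arithmetic above can be reproduced with a short script, which makes it easy to try other assumptions. This is only a sketch: the constant and variable names are made up for illustration, and "megabytes" is taken to mean 1024 * 1024 bytes, since that is what matches the rounded figures.

```python
# Rough size estimates for wbl_lexemes and wbl_lemmata.
# All names are illustrative; the inputs are the assumptions from the comment.

BYTES_PER_MEGABYTE = 1024 * 1024   # assumption: "megabytes" means MiB
LEXEME_COUNT = 1_000_000           # assumed number of Lexemes

ID_BYTES = len("Q49977198")        # latest Item ID, 9 bytes
LANG_CODE_BYTES = 5                # average language code, e.g. "en-gb"
LEMMA_CHARS = 15                   # average lemma length in characters
BYTES_PER_CHAR = 4                 # worst-case UTF-8 width
LEMMAS_PER_LEXEME = 2              # average lemmas per Lexeme


def to_megabytes(num_bytes):
    """Convert a byte count to megabytes (1024 * 1024 bytes)."""
    return num_bytes / BYTES_PER_MEGABYTE


# wbl_lexemes: 9 * 3 bytes per row, one row per Lexeme.
lexemes_row_bytes = ID_BYTES * 3
lexemes_total_bytes = lexemes_row_bytes * LEXEME_COUNT

# wbl_lemmata: Lexeme ID + language code + lemma text, one row per lemma.
lemma_row_bytes = ID_BYTES + LANG_CODE_BYTES + LEMMA_CHARS * BYTES_PER_CHAR
lemmata_total_bytes = lemma_row_bytes * LEMMAS_PER_LEXEME * LEXEME_COUNT

print(f"wbl_lexemes: {to_megabytes(lexemes_total_bytes):.0f} megabytes")
print(f"wbl_lemmata: {to_megabytes(lemmata_total_bytes):.0f} megabytes")
```

Changing LEXEME_COUNT or BYTES_PER_CHAR shows how sensitive the totals are to those two guesses.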
To: thiemowmde
Cc: daniel, Lucas_Werkmeister_WMDE, Ladsgroup, WMDE-leszek, thiemowmde, Aklapper, Lydia_Pintscher, Lahi, Gq86, Cinemantique, GoranSMilovanovic, QZanden, LawExplorer, Wikidata-bugs, aude, Darkdadaah, Mbch331
_______________________________________________ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs