Manybubbles added a comment.

My savings claims come from loading the first 300,000 entities in the dump. I'm not sure whether we'll do better or worse as we get further along. The entities further along are smaller, so we'll get more savings from not having to put them in the term dictionary, but we'll get less savings because they won't compress as well: their IDs are higher. Properties, on the other hand, will typically compress quite well, as their IDs are usually quite low.
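To illustrate why higher IDs compress less well, here is a minimal sketch. It assumes a simplified model in which a numeric ID is stored inline in the smallest signed fixed-width slot (1, 2, 4 or 8 bytes) that holds it. The function name `signed_inline_width` and the sample IDs are hypothetical, chosen only to show where the thresholds fall. This is not the actual encoding used by the store.

```python
def signed_inline_width(value: int) -> int:
    """Return the smallest of 1, 2, 4 or 8 bytes that can hold `value`
    as a signed two's-complement integer (simplified, assumed model)."""
    for width in (1, 2, 4, 8):
        bits = 8 * width
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return width
    raise ValueError("value does not fit in 8 bytes")


# Illustrative numeric IDs only: low property IDs vs. higher entity IDs.
for label, numeric_id in [("P31", 31), ("P1000", 1000), ("Q42", 42), ("Q300000", 300000)]:
    print(label, signed_inline_width(numeric_id), "byte(s)")
```

Under this model a low ID fits in a single byte, while an ID in the hundreds of thousands already needs four, which is the effect described above.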
It might also be worth looking at using an unsigned integer rather than a signed one. I implemented unsigned storage upstream in https://phabricator.wikimedia.org/T95904, and I suspect that'll be a decent win for us, as it doubles the effective range where you can use inline bytes and inline shorts.

TASK DETAIL
  https://phabricator.wikimedia.org/T95906