Manybubbles added a comment.

My savings claims come from loading the first 300,000 entities in the dump.  
I'm not sure whether we'll do better or worse as we get further along.  The 
entities further along are smaller, so we'll save more by not having to put 
them in the term dictionary, but we'll save less because they won't compress 
as well, since their IDs are higher.  Properties, though, will typically 
compress quite well, as their IDs are usually quite low.

It might also be worth storing the IDs as unsigned integers rather than 
signed ones. I implemented unsigned storage upstream in 
https://phabricator.wikimedia.org/T95904 and I suspect that'll be a decent win 
for us, as it doubles the effective range where you can use inline bytes and 
inline shorts.
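
To make the range argument concrete, here is a rough sketch in Python. It is 
not the upstream implementation: the function names and the 1/2/4/8-byte width 
ladder are assumptions made for illustration. It only picks the narrowest 
fixed-width integer that can hold a given numeric ID under each scheme.

```python
# Illustrative only: the helper names and the width ladder are assumptions,
# not the actual inline-value code discussed in the task.
WIDTHS = (1, 2, 4, 8)  # byte, short, int, long


def signed_width(value):
    """Smallest width in bytes whose signed range contains value."""
    for width in WIDTHS:
        bits = 8 * width
        if -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1:
            return width
    raise ValueError("value does not fit in 8 bytes")


def unsigned_width(value):
    """Smallest width in bytes whose unsigned range contains value."""
    if value < 0:
        raise ValueError("unsigned storage cannot hold negative values")
    for width in WIDTHS:
        if value <= (1 << (8 * width)) - 1:
            return width
    raise ValueError("value does not fit in 8 bytes")


if __name__ == "__main__":
    # Example numeric IDs, chosen to sit on either side of the boundaries.
    for numeric_id in (100, 200, 40000, 300000):
        print(numeric_id, signed_width(numeric_id), unsigned_width(numeric_id))
```

Under these assumptions, IDs from 128 to 255 drop from two bytes to one, and 
IDs from 32,768 to 65,535 drop from four bytes to two. Low IDs fit in one byte 
either way, and IDs above 65,535 need four bytes either way.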


TASK DETAIL
  https://phabricator.wikimedia.org/T95906

To: Manybubbles
Cc: Aklapper, Manybubbles, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, 
GWicke, daniel, JanZerebecki


