MZMcBride added a comment.

> Today I discovered that our `ips_site_page` field is too small to store all 
> page titles out there (e.g. https://www.wikidata.org/wiki/Q6703647 is affected 
> by that). The field is defined as `varbinary(255) NOT NULL` which would be 
> enough for the unqualified page titles (page_title is varbinary(255) 
> everywhere), but it's not large enough to hold the page titles plus their 
> namespaces (that's how we store the titles, unlike the `page` table which has 
> a separate int field for the namespace).


Is it a good idea for the `ips_site_page` field to store the namespace name as 
a string? Why is it doing that?

> To make that field fit for all pages on every Wikimedia wiki we should go for 
> a length of at the very least 300, as the longest page title I could find has 
> 255 chars and the longest non-talk namespace has a length of 39 (the longest 
> talk namespace has a length of 67!); we also need to store the colon. I'd 
> suggest going for a bit more than that, just to play it safe.


`page.page_title` is 255 bytes today, but I'm not sure it's a great idea to 
make other fields dependent on its length. Matching its length might be more 
reasonable. While I understand smaller fields are better for performance, I'd 
like it if `page.page_title` and similar fields followed the model now being used 
with the *_comment fields (cf. https://phabricator.wikimedia.org/T6715). That 
is, have a higher limit on the database side and enforce the limit at the 
application level.
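
As a rough illustration of that model, here is a minimal Python sketch, not 
MediaWiki or Wikibase code. The names `APP_TITLE_LIMIT_BYTES`, 
`DB_COLUMN_LIMIT_BYTES` and `validate_title` are hypothetical, and the 1024-byte 
column size is an assumption chosen only to be larger than the application limit.

```python
# Hypothetical sketch: a generous column size in the database, with the
# real limit enforced by the application before anything is written.
APP_TITLE_LIMIT_BYTES = 255    # limit the application enforces (page_title is 255 bytes today)
DB_COLUMN_LIMIT_BYTES = 1024   # assumed, deliberately larger column size


def validate_title(title: str) -> bytes:
    """Return the UTF-8 encoded title, or raise if it exceeds the application limit."""
    encoded = title.encode("utf-8")
    if len(encoded) > APP_TITLE_LIMIT_BYTES:
        raise ValueError(
            f"title is {len(encoded)} bytes; limit is {APP_TITLE_LIMIT_BYTES}"
        )
    return encoded
```

The check counts bytes rather than characters, since a varbinary column is 
sized in bytes and a non-ASCII title takes more than one byte per character.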

From my reading of the numbers here, 300 bytes wouldn't be sufficient in the 
current most pathological case (255 bytes + 67 bytes, plus one byte for the 
colon).
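
The arithmetic can be spelled out in a short Python sketch. The constants are 
the figures quoted above; treating the namespace lengths as byte counts is an 
assumption (they may have been measured in characters), and the function name 
`worst_case_bytes` is hypothetical.

```python
# Figures taken from the quoted comment; namespace lengths are assumed to be bytes.
MAX_TITLE_BYTES = 255          # page_title is varbinary(255)
LONGEST_NON_TALK_NAMESPACE = 39
LONGEST_TALK_NAMESPACE = 67
COLON = 1                      # separator between namespace and title


def worst_case_bytes(namespace_len: int) -> int:
    """Bytes needed to store 'Namespace:Title' for the longest possible title."""
    return namespace_len + COLON + MAX_TITLE_BYTES


print(worst_case_bytes(LONGEST_NON_TALK_NAMESPACE))  # 295
print(worst_case_bytes(LONGEST_TALK_NAMESPACE))      # 323
```

A 300-byte field covers the non-talk case but falls short of the talk case.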


TASK DETAIL
  https://phabricator.wikimedia.org/T99459


To: MZMcBride
Cc: MZMcBride, Krenair, jcrespo, Springle, Lydia_Pintscher, aude, daniel, 
Aklapper, hoo, Wikidata-bugs


