Re: [Wikitech-l] Case Insensitive Database Lookups

2013-10-02 Thread Sean Pringle
On Wed, Oct 2, 2013 at 10:22 AM, Matthew Walker mwal...@wikimedia.orgwrote:


 Given that I want to support case insensitive searching; does anyone have
 any thoughts or examples on how to go about doing it in a binary table?

 The only solution I can think of would be to change the collation/charset
 of the table in question to utf8.


For a single search field there is no need to change the whole table
charset or collation. Simply make the field a character type (VARCHAR /
CHAR / TEXT), utf8 if that's suitable, and choose a case-insenitive
collation. Eg:

ALTER TABLE tbl MODIFY col VARCHAR(N)
 CHARACTER SET utf8 COLLATE utf8_unicode_ci;

More info on string comparison:

http://dev.mysql.com/doc/refman/5.5/en/case-sensitivity.html

However I'd like to hear the historical reasons why MW does everything
binary too. There certainly were some character set issues in older MySQL
versions and afaik we still have to support MySQL 5.0.2 and up.

BR
Sean
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Updating Wikipedia based on Wikidata changes

2013-07-28 Thread Sean Pringle
On Tue, Jul 23, 2013 at 1:42 AM, Denny Vrandečić 
denny.vrande...@wikimedia.de wrote:


 * EntityUsage: one table per client. It has two columns, one with the
 pageId and one with the entityId, indexed on both columns (and one column
 with a pk, I guess, for OSC).


 * Subscriptions: one table on the client. It has two columns, one with the
 pageId and one with the siteId, indexed on both columns (and one column
 with a pk, I guess, for OSC).

 EntityUsage is a potentially big table (something like pagelinks-size).

 On a change on Wikidata, Wikidata consults the Subscriptions table, and
 based on that it dispatches the changes to all clients listed there for a
 given change. Then the client receives the changes and based on the
 EntityUsage table performs the necessary updates.

 We wanted to ask for input on this approach, and if you see problems or
 improvements that we should put in.


Sounds OK to me.

Will (or could) pageId/entityId and pageId/siteId have unique constraints?

BR
Sean
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l