On Tue, May 12, 2009 at 5:46 PM, Brion Vibber <br...@wikimedia.org> wrote:
> As a general issue we also need to consider managing paging through
> collation-sorted lists, since sort keys for different inputs may produce
> the same result. At the moment I think category lists are paged by
> offset (bad!) but we should ensure this is planned for.

Category lists use Pager, so they're paged by index offsets (the sort
key where the next page starts), not by LIMIT M, N.  Note that they
should probably be ordered by (cl_sortkey, cl_from) or something
instead of just (cl_sortkey) -- currently, equal sortkeys that straddle
a page boundary will cause problems.  But Pager doesn't support
multi-key sort right now.
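
To make the (cl_sortkey, cl_from) idea concrete, here is a rough sketch
in Python with sqlite3 rather than MediaWiki's PHP.  The column names
come from categorylinks, but fetch_page, the sample rows, and the
in-memory table are made up for illustration; this is not how Pager is
written today.

```python
import sqlite3

def fetch_page(conn, limit, after=None):
    """Return up to `limit` rows ordered by (cl_sortkey, cl_from).

    `after` is the (cl_sortkey, cl_from) pair of the last row already
    shown, or None for the first page.  (Hypothetical helper.)
    """
    if after is None:
        return conn.execute(
            "SELECT cl_sortkey, cl_from FROM categorylinks "
            "ORDER BY cl_sortkey, cl_from LIMIT ?",
            (limit,),
        ).fetchall()
    key, page_id = after
    # cl_from breaks ties, so rows with equal sortkeys are neither
    # skipped nor repeated across a page boundary.
    return conn.execute(
        "SELECT cl_sortkey, cl_from FROM categorylinks "
        "WHERE cl_sortkey > ? OR (cl_sortkey = ? AND cl_from > ?) "
        "ORDER BY cl_sortkey, cl_from LIMIT ?",
        (key, key, page_id, limit),
    ).fetchall()

# Made-up sample data: three pages share the sortkey "Apple".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categorylinks (cl_sortkey TEXT, cl_from INTEGER)")
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [("Apple", 1), ("Apple", 2), ("Apple", 3), ("Banana", 4)],
)

page1 = fetch_page(conn, 2)
page2 = fetch_page(conn, 2, after=page1[-1])
```

With a page size of 2 the "Apple" rows straddle the boundary, and the
second page still picks up at the third one.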

I'm not sure what you mean here, though.  What does "sort keys for
different inputs may produce the same result" mean?  You're just
talking about sort key conflicts?  In that case it seems best to just
disambiguate by whatever's handy, in this case cl_from (which is the
page_id and so not very meaningful).  If it's coming up often enough
to be a problem, the sort keys should be improved!

>> You don't need another column for categorylinks, you can use the
>> existing cl_sortkey, so that should be relatively easy to deploy.  It
>> doesn't help with non-category use cases, of course.
>
> You would if you need to store a processed sort key index that's not in
> the form of displayable characters. (eg, the output of the UCA)

Why?  cl_sortkey isn't ever displayed to the user, so I don't see why
it couldn't contain binary characters.  I guess it's in the URL of
links past the first page, but that's not a huge deal.  Although it is
a definite downside I didn't think of (it's nice to have
manually-editable URLs!).
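
If cl_sortkey did end up holding binary data, one way to keep the
paging URLs at least copy-pastable would be to hex-encode the offset.
This is only a sketch of that assumption: the function names and the
sample bytes are invented, and nothing here reflects what MediaWiki
actually does.

```python
import binascii

def sortkey_to_url_param(sortkey: bytes) -> str:
    """Encode a binary sort key as lowercase hex for use in a URL."""
    return binascii.hexlify(sortkey).decode("ascii")

def url_param_to_sortkey(param: str) -> bytes:
    """Decode the hex URL parameter back into the binary sort key."""
    return binascii.unhexlify(param)

# Made-up binary keys, not real collation output.
keys = [b"\x01\xff", b"\x01", b"\x02"]
encoded = [sortkey_to_url_param(k) for k in keys]

# Fixed-width hex keeps the same ordering as the raw bytes.
same_order = sorted(encoded) == [sortkey_to_url_param(k) for k in sorted(keys)]
```

The result is still not something you'd want to edit by hand, so the
downside stands.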

>>> It would also be possible to use a separate column for the collated
>>> sorting while using MySQL 4.1+'s native collations, if the uniqueness
>>> constraints are a problem, but this is still dependent on rolling out an
>>> upgrade from 4.0.
>>
>> In that case we may as well make it like cl_sortkey and populate it
>> ourselves, surely.
>
> For the unique case of categorylinks yes. For everything else,
> additional columns are not already present.

I was saying that if we were going to make extra columns, we may as
well roll our own sort keys instead of bothering with collations,
since it's not like we'd save a column.  But of course if rolling our
own would mean two extra columns instead of one, that would be a
definite downside.  Still, MySQL's collation support is unlikely to
ever extend to nearly as many languages as we support, and it can't
handle niceties like eliding initial "A" or "The" in English, say.  So
it doesn't seem like as good a solution.
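
As an illustration of the kind of nicety a home-grown sort key can
handle, here is a minimal Python sketch that drops a leading "A" or
"The" before sorting.  The function name and the sample titles are
made up, and a real version would need per-language rules.

```python
def english_sort_key(title: str) -> str:
    """Build a sort key that ignores a leading "A" or "The".

    Hypothetical helper; only handles the two English articles
    mentioned above.
    """
    parts = title.split(" ", 1)
    if len(parts) == 2 and parts[0].lower() in ("a", "the"):
        # Sort on the rest of the title, without the article.
        return parts[1].casefold()
    return title.casefold()

titles = ["The Beatles", "Aerosmith", "A Flock of Seagulls", "Cream"]
ordered = sorted(titles, key=english_sort_key)
```

Here "The Beatles" sorts under B and "A Flock of Seagulls" under F,
which a stock MySQL collation would not do for us.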

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
