On Sat, Feb 04, 2017 at 09:52:47PM -0600, boB Stepp wrote:
> Does the list sort() method (and other sort methods in Python) just go
> by the hex value assigned to each symbol to determine sort order in
> whichever Unicode encoding chart is being implemented?

Correct, except that there is only one Unicode encoding chart. You may
be thinking of the legacy Windows "code pages" system, where you can
change the code page to re-interpret the same byte as a different
character. E.g. ð in code page 1252 (Western European) becomes π in
code page 1253 (Greek).

Python supports encoding and decoding to and from legacy code page
forms, but Unicode itself does away with the idea of using separate code
pages. It effectively is a single, giant code page containing room for
over a million characters. It's also a superset of ASCII, so pure ASCII
text keeps the same code points in Unicode.

Anyhoo, since Unicode supports dozens of languages from all over the
world, it defines "collation rules" for sorting text in various
languages. For example, sorting in Austria is different from sorting in
Germany, despite both using the same alphabet. Even in English, sorting
rules can vary: some phone books sort Mc and Mac together, some don't.

However, Python doesn't directly support that. It just provides a single
basic lexicographic sort based on the ord() of each character in the
string.

> If yes, then
> my expectation would be that the French "á" would come after the "z"
> character.

Correct:

py> "á" > "z"
True
py> sorted('áz')
['z', 'á']

-- 
Steve
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
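For anyone who wants to poke at this in the interpreter, here is a small sketch (not part of the original reply) of the two points made about code pages and ordering. It assumes the standard "cp1252" and "cp1253" codecs that ship with Python. The same byte 0xF0 decodes to ð under one and π under the other, and plain sorted() gives the same order as an explicit key built from each character's ord() value. The names `raw`, `letters` and `by_code_point` are just illustrative.

```python
# Sketch: Python's default str ordering compares code points (ord values),
# not language-specific collation rules.

# The same byte decodes to different characters under different legacy
# Windows code pages; Unicode gives each character one code point.
raw = b"\xf0"
western = raw.decode("cp1252")  # Western European code page
greek = raw.decode("cp1253")    # Greek code page
print(western, hex(ord(western)))  # ð 0xf0
print(greek, hex(ord(greek)))      # π 0x3c0

# "\u00e1" is 'á' (code point 225); 'z' is 122 and 'a' is 97.
letters = ["\u00e1", "z", "a"]
print(sorted(letters))  # ['a', 'z', 'á']

# A key listing each character's code point gives the same order, since
# lists of ints also compare lexicographically.
by_code_point = sorted(letters, key=lambda s: [ord(c) for c in s])
print(by_code_point == sorted(letters))  # True
```

Writing "\u00e1" rather than a literal á avoids any doubt about whether the editor stored a precomposed character or an "a" followed by a combining accent, which would compare differently.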