I don't think the idea is that a codepage equals a language. Rather, a codepage corresponds to a writing system, which consists of one or more scripts (e.g., 6 scripts for ShiftJIS). As such, the codepage is a useful cue in choosing an appropriate font for rendering text. In the RichEdit edit engine, we use a generalization of the codepage called a CharRep, and we break Unicode plain text into runs, each characterized by a particular CharRep. We then bind these runs to appropriate fonts for rendering. There are many additional considerations, so unfortunately this isn't an easy task, but with enough refinements it works quite well.
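To make the idea of CharRep runs concrete, here is a minimal Python sketch. It is not RichEdit code: the labels (`ANSI`, `RUSSIAN`, `SHIFTJIS`, and so on) and the `char_rep` function are hypothetical stand-ins, and the classification looks only at a few Unicode ranges, one character at a time. A real engine also has to deal with neutral characters, context, and the other considerations mentioned above.

```python
from itertools import groupby

def char_rep(ch: str) -> str:
    """Map one character to a hypothetical CharRep label (illustration only)."""
    cp = ord(ch)
    if 0x0370 <= cp <= 0x03FF:
        return "GREEK"
    if 0x0400 <= cp <= 0x04FF:
        return "RUSSIAN"
    if 0x3040 <= cp <= 0x30FF:      # Hiragana and Katakana
        return "SHIFTJIS"
    if 0x4E00 <= cp <= 0x9FFF:      # Han ideographs: could be CHT, CHS, or Japanese
        return "CJK_AMBIGUOUS"
    if 0xAC00 <= cp <= 0xD7A3:      # Hangul syllables
        return "HANGUL"
    return "ANSI"                   # everything else, including ASCII and spaces

def char_rep_runs(text: str) -> list[tuple[str, str]]:
    """Split text into maximal runs of consecutive characters sharing a label."""
    return [(rep, "".join(chars)) for rep, chars in groupby(text, key=char_rep)]
```

Each resulting `(label, run)` pair is what would then be bound to a font. Note that Han ideographs stay ambiguous in this sketch; resolving them needs extra information such as the codepage, the keyboard language, or heuristics.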
The bottom line is that if text was generated using a particular codepage, it's likely that the creator of that text intended it to be rendered with a font that supports that codepage. For text that isn't tagged with a codepage, we do our best to translate the keyboard language to a CharRep and proceed as above. When neither keyboard nor codepage information is available, we use a set of heuristics to break the text into CharRep runs. Among the many heuristics used are 1) a string containing Kana is likely to have a Japanese CharRep, and 2) a CJK string that round-trips through CHT, CHS, or ShiftJIS may well belong to one of those CharReps. In particular, if a CJK string doesn't round-trip through CHT, it's probably not Traditional Chinese.
Murray
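
The two heuristics mentioned above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the RichEdit implementation: Python's standard `big5`, `gb2312`, and `shift_jis` codecs stand in for the CHT, CHS, and ShiftJIS codepages, and the function names are made up for this example.

```python
# Assumed stand-ins for the codepages; the real Windows codepages differ in detail.
CANDIDATES = {"CHT": "big5", "CHS": "gb2312", "ShiftJIS": "shift_jis"}

def round_trips(text: str, encoding: str) -> bool:
    """True if text survives Unicode -> codepage -> Unicode unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        # A character with no mapping in the codepage fails the round trip.
        return False

def contains_kana(text: str) -> bool:
    """True if any character is in the Hiragana or Katakana blocks."""
    return any(0x3040 <= ord(ch) <= 0x30FF for ch in text)

def candidate_char_reps(text: str) -> list[str]:
    """Return the plausible labels for a CJK string, per the two heuristics."""
    if contains_kana(text):
        return ["ShiftJIS"]  # heuristic 1: Kana suggests Japanese
    # heuristic 2: keep only the codepages the string round-trips through
    return [name for name, enc in CANDIDATES.items() if round_trips(text, enc)]
```

A string may round-trip through more than one codepage, so the result is a list of candidates rather than a single answer. The useful signal is often the negative one: a label missing from the list is probably ruled out.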