> On 3 Oct 2016, at 19:17, Jean-Denis Muys via swift-users <swift-users@swift.org> wrote:
>
> You are right: I don’t know much about asian languages.
>
> How would you go about counting consonants, vowels (and tone-marks?) in the most general way?

Iterate over unicodeScalars (in the most general case); Swift Characters are probably OK for European languages.

For each unicodeScalar (a.k.a. code point) you can use the ICU function:

    int8_t chrTyp = u_charType(codepoint);

This returns the general category value for the code point, something like U_OTHER_PUNCTUATION, U_MATH_SYMBOL, U_OTHER_LETTER etc. See enum UCharCategory in
<http://icu-project.org/apiref/icu4c-latest/uchar_8h.html>

In European languages ignore U_NON_SPACING_MARKs.

There is a compare:options: method for NSString (and probably something similar for Swift String) which can take the options NSCaseInsensitiveSearch and NSDiacriticInsensitiveSearch to treat ‘E’, ‘e’ and è, é, Ĕ etc. as equal.
That is: for each character (or unicodeScalar) compare to a, e, i, o, u with these options.

    import Foundation

    let str = "HaÁÅǺáXeëẽêèâàZ"
    // Swift 4 and later; in Swift 3 iterate over str.characters instead.
    for char in str {
        let vowel = isVowel(char)
        print("\(char) is \(vowel ? "vowel" : "consonant")")
    }

    // True if char equals a vowel letter, ignoring case and diacritics.
    func isVowel(_ char: Character) -> Bool {
        let s1 = "\(char)"
        let s2 = s1 as NSString
        let opt: NSString.CompareOptions = [.diacriticInsensitive, .caseInsensitive]
        // no idea how to do this with Strings:
        if s2.compare("a", options: opt) == .orderedSame { return true }
        if s2.compare("e", options: opt) == .orderedSame { return true }
        // …
        return false
    }

If you really want to handle Thai, then do NOT ignore U_NON_SPACING_MARKs, because some vowels are classified as such.
U+0E01 … U+0E2E are consonants; U+0E30 … U+0E39 and U+0E40 … U+0E44 are vowels.

But then: ‘อ’ is sometimes a (silent) consonant (อยาก), sometimes a vowel (บอ), sometimes part of a vowel (มือ), sometimes part of a diphthong (เบื่อ).
Similar for ย: normal consonant (ยาก), part of a vowel (ไทย) or of a diphthong (เมีย). In the latter case only ม is a consonant, the rest is one single diphthong, and ี is a U_NON_SPACING_MARK which really is a vowel.

Oh, and don't forget the ligatures ฤ, ฤๅ, ฦ, ฦๅ. These are both a consonant and a vowel.
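
A minimal sketch of the per-scalar approach for Thai, using only the code point ranges quoted above. It assumes Swift 5 or later, where Unicode.Scalar.Properties.generalCategory can stand in for ICU's u_charType. The names ThaiKind, thaiKind(of:) and countThai are made up for illustration and do not come from any library.

```swift
// Sketch only: classifies each Unicode scalar by the code point ranges quoted
// in the post. It knows nothing about context, so อ and ย always count as
// consonants, and ฤ / ฦ (which fall inside U+0E01 … U+0E2E) do too.

enum ThaiKind {            // hypothetical name, not from any library
    case consonant, vowel, other
}

func thaiKind(of scalar: Unicode.Scalar) -> ThaiKind {
    switch scalar.value {
    case 0x0E01...0x0E2E:
        return .consonant
    case 0x0E30...0x0E39, 0x0E40...0x0E44:
        return .vowel
    default:
        return .other
    }
}

// Counts consonants and vowels, and separately how many of the vowels are
// non-spacing marks (general category Mn), i.e. the ones that would be lost
// if such marks were skipped.
func countThai(_ text: String) -> (consonants: Int, vowels: Int, vowelMarks: Int) {
    var result = (consonants: 0, vowels: 0, vowelMarks: 0)
    for scalar in text.unicodeScalars {
        switch thaiKind(of: scalar) {
        case .consonant:
            result.consonants += 1
        case .vowel:
            result.vowels += 1
            if scalar.properties.generalCategory == .nonspacingMark {
                result.vowelMarks += 1
            }
        case .other:
            break
        }
    }
    return result
}

// เมีย, spelled out as scalars: เ ม ี ย
let counts = countThai("\u{0E40}\u{0E21}\u{0E35}\u{0E22}")
print(counts)
```

For เมีย this naive count reports two consonants (ม and ย), whereas only ม really is one. A per-scalar table like this can only be a starting point; the context-dependent cases need rules that look at neighbouring characters.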
ำ likewise combines the two: it is not a ligature, but a vowel + consonant.

But to talk about German: what about diphthongs? “neu” has one consonant + one vowel sound (but 2 vowel characters).
What if some silly users don’t know how to type umlauts and write “ueber” (instead of the correct “über”)? The “ue” here is really one vowel (+ diaeresis).
But beware: “aktuell” is definitely not a misspelling of “aktüll”; its “ue” is two separate vowels.

Gerriet.