Yes, this is totally confusing. CharacterSet and Set<Character> are completely different things with different semantics.
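To make the difference concrete, here is a minimal sketch. It assumes Swift 4 or later (where a String is itself a collection of Character) and Foundation on an Apple platform; the variable names are only for illustration. The flags are written as scalar escapes so the example does not depend on how the emoji survive the mail encoding.

```swift
import Foundation

// Each flag is ONE Character built from TWO Unicode scalars
// (regional indicator symbols).
let andorra = "\u{1F1E6}\u{1F1E9}"    // 🇦🇩 = A + D
let australia = "\u{1F1E6}\u{1F1FA}"  // 🇦🇺 = A + U

// Set<Character> compares whole grapheme clusters, so the two flags
// have nothing in common.
let characterOverlap = Set(andorra).intersection(Set(australia))

// CharacterSet stores individual Unicode scalars, so the shared
// regional indicator "A" (U+1F1E6) ends up in both sets.
let sharedScalar: Unicode.Scalar = "\u{1F1E6}"
let scalarsA = CharacterSet(charactersIn: andorra)
let scalarsB = CharacterSet(charactersIn: australia)
let scalarOverlap = scalarsA.intersection(scalarsB)

print(andorra.count)                         // number of Characters
print(andorra.unicodeScalars.count)          // number of Unicode scalars
print(characterOverlap.isEmpty)              // Character-level overlap
print(scalarOverlap.contains(sharedScalar))  // scalar-level overlap
```

The point is that the overlap exists only at the scalar level, which is not what the name CharacterSet suggests to someone used to Swift's Character.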
I don't know the history, but does CharacterSet exist simply to give Swift an equivalent of NSCharacterSet? That seems to be the case, but since Swift redefined characters in a better way, the type should be removed or renamed to avoid confusion. A type name shouldn't use 'character' in a sense that diverges from the definition in the rest of the language, forcing you to qualify what you mean.

On Thu, 29 Sep 2016 at 04:48 Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

> On Wed, Sep 28, 2016 at 10:34 PM, Xiaodi Wu <xiaodi...@gmail.com> wrote:
>
>> On Wed, Sep 28, 2016 at 10:23 PM, Charles Srstka via swift-evolution <swift-evolution@swift.org> wrote:
>>
>>> On Sep 28, 2016, at 9:57 PM, Erica Sadun via swift-evolution <swift-evolution@swift.org> wrote:
>>>
>>> D'erp. I missed that. And that's an unambiguous answer.
>>>
>>> So let me move on to part B of the pitch: I think CharacterSets are broken.
>>>
>>> Xiaodi Wu: "isn't the problem you're presenting really an argument that the type should be fleshed out to handle characters (grapheme clusters) containing more than one Unicode scalar?"
>>>
>>> It seems that it already does handle such characters:
>>>
>>> (done in Objective-C so we can log the length of the range as a count of UTF-16 code units)
>>>
>>> #import <Foundation/Foundation.h>
>>>
>>> int main(int argc, char *argv[]) {
>>>     @autoreleasepool {
>>>         NSCharacterSet *bikeSet = [NSCharacterSet characterSetWithCharactersInString:@"🚲"];
>>>         NSString *str = @"foo🚲bar";
>>>
>>>         NSRange range = [str rangeOfCharacterFromSet:bikeSet];
>>>
>>>         NSLog(@"location: %lu length: %lu", range.location, range.length);
>>>     }
>>> }
>>>
>>> - - - - - - -
>>>
>>> *2016-09-28 22:20:00.622471 test[15577:2433912] location: 3 length: 2*
>>> *Program ended with exit code: 0*
>>>
>>> - - - - - - -
>>>
>>> As we can see, the character from the set is recognized as consisting of two code units. There are a few bugs in the system, though. See the cocoa-dev thread “Where is my bicycle?” from about a year ago:
>>> http://prod.lists.apple.com/archives/cocoa-dev/2015/Apr/msg00074.html
>>
>> The bike emoji might be two code units, but it is one Unicode scalar (U+1F6B2). However, the Canadian flag emoji, for instance, is two Unicode scalars (U+1F1E8 U+1F1E6) but nonetheless one character.
>
> To illustrate in code how CharacterSet doesn't actually handle characters made up of multiple Unicode scalars:
>
> ```
> import Foundation
>
> let str1 = "🇦🇩"
> let first = CharacterSet(charactersIn: str1) // this actually crashes corelibs-foundation
> let str2 = "🇦🇺"
> let second = CharacterSet(charactersIn: str2)
> let intersection = first.intersection(second)
> print(intersection.isEmpty)
> // actual output: false
> // obviously, if we were really dealing with characters, the intersection should be empty
> ```
>
>>> Charles
>>>
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution@swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution