I’m with you on a C interop API that supports taking a non-null-terminated string. I often work with APIs that support efficient parsing by providing a pointer to a global buffer plus a length to report parsed strings.
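For illustration, here is a minimal sketch of both cases: the pointer-plus-length case described here, and the fixed-size `char name[16]` field from Félix's request quoted below. It assumes a parser that reports each token as a `CChar` pointer and a length. It uses `String(decoding:as:)`, an initializer that later shipped with Swift 4 and accepts any collection of code units, so no NUL terminator is required. The helper names `makeString(start:count:)` and `makeString(fixedField:)` and the `CName16` alias are hypothetical.

```swift
// Hypothetical helper: build a String from the (pointer, length) pair a parser
// reports. Exactly `count` bytes are read; no NUL terminator is needed.
func makeString(start: UnsafePointer<CChar>, count: Int) -> String {
    // CChar may be signed or unsigned depending on the platform, so view the
    // same bytes as UInt8 UTF-8 code units before decoding.
    return start.withMemoryRebound(to: UInt8.self, capacity: count) { bytes in
        String(decoding: UnsafeBufferPointer(start: bytes, count: count), as: UTF8.self)
    }
}

// A C member declared as `char name[16]` is imported into Swift as a
// homogeneous tuple of 16 CChar values.
typealias CName16 = (CChar, CChar, CChar, CChar, CChar, CChar, CChar, CChar,
                     CChar, CChar, CChar, CChar, CChar, CChar, CChar, CChar)

// Hypothetical helper: decode a fixed-size field up to the first NUL or the
// end of the field, whichever comes first.
func makeString(fixedField: CName16) -> String {
    var field = fixedField  // the inout overload of withUnsafeBytes(of:_:) needs a variable
    return withUnsafeBytes(of: &field) { raw in
        String(decoding: raw.prefix(while: { $0 != 0 }), as: UTF8.self)
    }
}

// A token in the middle of a larger buffer: "/index.html" starts at offset 4
// and is 11 bytes long; the byte right after it is a space, not a NUL.
let request = "GET /index.html HTTP/1.1".utf8CString
let path = request.withUnsafeBufferPointer { buffer in
    makeString(start: buffer.baseAddress! + 4, count: 11)
}

// A segment name padded with NULs, and a full-width name with no NUL at all.
let padded: CName16 = (0x5F, 0x5F, 0x54, 0x45, 0x58, 0x54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
let full: CName16 = (0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48,
                     0x49, 0x4A, 0x4B, 0x4C, 0x4D, 0x4E, 0x4F, 0x50)

print(path)                           // /index.html
print(makeString(fixedField: padded)) // __TEXT
print(makeString(fixedField: full))   // ABCDEFGHIJKLMNOP
```

Both helpers copy the bytes into a new String, so the result stays valid after the parser reuses its buffer.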
Without a way to create a Swift string from a buffer and a length, interop with such APIs will be difficult for no good reason, as Swift strings don’t even have to be null-terminated.

> On Mar 30, 2017, at 18:35, Félix Cloutier via swift-evolution <swift-evolution@swift.org> wrote:
>
> I don't have many non-nitpick issues that I greatly care about; I'm in favor
> of this.
>
> My only request: it's currently painful to create a String from a fixed-size
> C array. For instance, if I have a pointer to a `struct foo { char name[16]; }`
> in Swift where the last character doesn't have to be a NUL, it's hard to
> create a String from it. Real-world examples of this are Mach-O LC_SEGMENT
> and LC_SEGMENT_64 commands.
>
> The generally-accepted wisdom <http://stackoverflow.com/a/27456220/251153> is
> that you take a pointer to the CChar tuple that represents the fixed-size
> array, but this still requires the string to be NUL-terminated. What do we
> think of an additional init(cString:) overload that takes an
> UnsafeBufferPointer and reads up to the first NUL or the end of the buffer,
> whichever comes first?
>
>> On Mar 30, 2017, at 02:48, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org> wrote:
>>
>>> On Mar 29, 2017, at 5:32 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org> wrote:
>>>
>>> Hi Swift Evolution,
>>>
>>> Below is a pitch for the first part of the String revision. This covers a
>>> number of changes that would allow the basic internals to be overhauled.
>>>
>>> Online version here:
>>> https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md
>>
>> Really great stuff, guys. Thanks for your work on this!
>>
>>> In order to be able to write extensions across both String and Substring,
>>> a new Unicode protocol to which the two types will conform will be
>>> introduced. For the purposes of this proposal, Unicode will be defined as a
>>> protocol to be used whenever you would previously extend String. It should
>>> be possible to substitute extension Unicode { ... } in Swift 4 wherever
>>> extension String { ... } was written in Swift 3, with one exception: any
>>> passing of self into an API that takes a concrete String will need to be
>>> rewritten as String(self). If Self is a String then this should effectively
>>> optimize to a no-op, whereas if Self is a Substring then this will force a
>>> copy, helping to avoid the “memory leak” problems described above.
>>
>> I continue to feel that `Unicode` is the wrong name for this protocol,
>> essentially because it sounds like a protocol for, say, a version of Unicode
>> or some kind of encoding machinery instead of a Unicode string. I won't
>> rehash that argument since I made it already in the manifesto thread, but I
>> would like to make a couple of new suggestions in this area.
>>
>> Later on, you note that it would be nice to namespace many of these types:
>>
>>> Several of the types related to String, such as the encodings, would
>>> ideally reside inside a namespace rather than live at the top level of the
>>> standard library. The best namespace for this is probably Unicode, but this
>>> is also the name of the protocol. At some point if we gain the ability to
>>> nest enums and types inside protocols, they should be moved there. Putting
>>> them inside String or some other enum namespace is probably not worthwhile
>>> in the meantime.
>>
>> Perhaps we should use an empty enum to create a `Unicode` namespace and then
>> nest the protocol within it via typealias. If we do that, we can consider
>> names like `Unicode.Collection` or even `Unicode.String` which would shadow
>> existing types if they were top-level.
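Brent's namespace suggestion can be sketched as follows (Swift 4 or later). All names are hypothetical: `TextNamespace` stands in for the suggested `Unicode` namespace, because the standard library that shipped with Swift 4 does use a caseless `enum Unicode` as a namespace (for types such as `Unicode.Scalar`), so reusing that name would shadow it; `_UnicodeCollection` stands in for the protocol, which eventually shipped under the name `StringProtocol`.

```swift
// Swift did not allow a protocol to be declared inside another type at the
// time, so the protocol lives at the top level under an unwieldy name and the
// namespace re-exports it through a typealias.
protocol _UnicodeCollection: BidirectionalCollection where Element == Character {}

// A caseless enum has no values, so it can only act as a namespace.
enum TextNamespace {
    // The spelling clients would actually use: TextNamespace.Collection.
    typealias Collection = _UnicodeCollection
}

// String and Substring already meet the requirements, so the conformances are empty.
extension String: TextNamespace.Collection {}
extension Substring: TextNamespace.Collection {}

// Generic code written once against the namespaced name accepts both types.
func vowelCount<T: TextNamespace.Collection>(_ text: T) -> Int {
    return text.filter { "aeiou".contains($0) }.count
}

let word = "education"
print(vowelCount(word), vowelCount(word.dropFirst()))  // 5 4
```

Inside `TextNamespace`, the alias named `Collection` shadows `Swift.Collection`, which is exactly the kind of name reuse the namespace makes possible without clashing at the top level.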
>>
>> If not, then given this:
>>
>>> The exact nature of the protocol – such as which methods should be protocol
>>> requirements vs which can be implemented as protocol extensions, are
>>> considered implementation details and so not covered in this proposal.
>>
>> We may simply want to wait to choose a name. As the protocol develops, we
>> may discover a theme in its requirements which would suggest a good name.
>> For instance, we may realize that the core of what the protocol abstracts is
>> grouping code units into characters, which might suggest a name like
>> `Characters`, or `Unicode.Characters`, or `CharacterCollection`, or
>> what-have-you.
>>
>> (By the way, I hope that the eventual protocol requirements will be put
>> through the review process, if only as an amendment, once they're
>> determined.)
>>
>>> Unicode will conform to BidirectionalCollection. RangeReplaceableCollection
>>> conformance will be added directly onto the String and Substring types, as
>>> it is possible future Unicode-conforming types might not be
>>> range-replaceable (e.g. an immutable type that wraps a const char *).
>>
>> I'm a little worried about this because it seems to imply that the protocol
>> cannot include any mutation operations that aren't in
>> `RangeReplaceableCollection`. For instance, it won't be possible to include
>> an in-place `applyTransform` method in the protocol. Do you anticipate that
>> being an issue? Might it be a good idea to define a parallel `Mutable` or
>> `RangeReplaceable` protocol?
>>
>>> The C string interop methods will be updated to those described here: a
>>> single withCString operation and two init(cString:) constructors, one for
>>> UTF8 and one for arbitrary encodings.
>>
>> Sorry if I'm repeating something that was already discussed, but is there a
>> reason you don't include a `withCString` variant for arbitrary encodings? It
>> seems like an odd asymmetry.
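For comparison, the standard library as it eventually shipped (Swift 4 and later) provides both directions for arbitrary encodings: `withCString(encodedAs:_:)` and `init(decodingCString:as:)`. The short round trip below only illustrates those two calls next to the UTF-8 pair; the sample string is arbitrary.

```swift
let original = "Grüße, 世界"

// UTF-8, the common case: borrow a NUL-terminated CChar buffer, then copy it back.
let viaUTF8 = original.withCString { cString in
    String(cString: cString)
}

// An arbitrary encoding (UTF-16 here): borrow NUL-terminated UTF-16 code units,
// then decode them back into a String.
let viaUTF16 = original.withCString(encodedAs: UTF16.self) { codeUnits in
    String(decodingCString: codeUnits, as: UTF16.self)
}

print(viaUTF8 == original, viaUTF16 == original)  // true true
```

The pointer passed to each closure is only valid for the duration of that closure, which is why both examples copy the contents into a new String before returning.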
>>
>>> The standard library currently lacks a Latin1 codec, so an enum Latin1:
>>> UnicodeEncoding type will be added.
>>
>> Nice. I wrote one of those once; I'll enjoy deleting it.
>>
>>> A new protocol, UnicodeEncoding, will be added to replace the current
>>> UnicodeCodec protocol:
>>>
>>> public enum UnicodeParseResult<T, Index> {
>>
>> Either `T` should be given a more specific name, or the enum should be given
>> a less specific one, becoming `ParseResult` and being oriented towards
>> incremental parsing of anything from any kind of collection.
>>
>>>   /// Indicates valid input was recognized.
>>>   ///
>>>   /// `resumptionPoint` is the end of the parsed region
>>>   case valid(T, resumptionPoint: Index) // FIXME: should these be reordered?
>>
>> No, I think this is the right order. The thing that's valid is the code
>> point.
>>
>>>   /// Indicates invalid input was recognized.
>>>   ///
>>>   /// `resumptionPoint` is the next position at which to continue parsing after
>>>   /// the invalid input is repaired.
>>>   case error(resumptionPoint: Index)
>>
>> I know this is abbreviated documentation, but I hope the full version
>> includes a good usage example demonstrating, among other things, how to
>> detect partial characters and defer processing of them instead of rejecting
>> them as erroneous.
>>
>>> /// An encoding for text with UnicodeScalar as a common currency type
>>> public protocol UnicodeEncoding {
>>>   /// The maximum number of code units in an encoded unicode scalar value
>>>   static var maxLengthOfEncodedScalar: Int { get }
>>>
>>>   /// A type that can represent a single UnicodeScalar as it is encoded in this
>>>   /// encoding.
>>>   associatedtype EncodedScalar : EncodedScalarProtocol
>>
>> There's an `EncodedScalarProtocol`-shaped hole in this proposal. What does
>> it do? What are its semantics? How does `EncodedScalar` relate to the old
>> `CodeUnit`?
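The kind of usage example Brent asks for above (detecting a partial character and deferring it) could look like the following self-contained sketch. It does not use the proposed `UnicodeEncoding` protocol: `ScalarParse`, `parseUTF8Scalar(_:at:)`, and `StreamingUTF8Decoder` are hypothetical stand-ins modeled on the quoted `UnicodeParseResult`, with an extra `incomplete` case so that a scalar split across two input chunks is held back rather than reported as an error. The decoder is deliberately simplified: it checks lead and continuation bytes, overlong forms, and the scalar range, but it does not follow the Unicode "maximal subpart" convention for how many U+FFFD replacements to emit.

```swift
// Hypothetical result type modeled on the quoted UnicodeParseResult.
enum ScalarParse {
    case valid(Unicode.Scalar, resumptionPoint: Int)
    case error(resumptionPoint: Int)
    case incomplete   // input ends partway through a scalar: wait for more bytes
    case emptyInput
}

// Parse one UTF-8 encoded scalar starting at index `i` of `bytes`.
func parseUTF8Scalar(_ bytes: [UInt8], at i: Int) -> ScalarParse {
    guard i < bytes.count else { return .emptyInput }
    let lead = bytes[i]
    let length: Int
    switch lead {
    case 0x00...0x7F: return .valid(Unicode.Scalar(lead), resumptionPoint: i + 1)
    case 0xC2...0xDF: length = 2
    case 0xE0...0xEF: length = 3
    case 0xF0...0xF4: length = 4
    default: return .error(resumptionPoint: i + 1)   // not a valid lead byte
    }
    // Keep only the payload bits of the lead byte.
    var value = UInt32(lead) & (UInt32(0xFF) >> UInt32(length + 1))
    for j in 1..<length {
        guard i + j < bytes.count else { return .incomplete }
        let byte = bytes[i + j]
        guard byte & 0xC0 == 0x80 else { return .error(resumptionPoint: i + j) }
        value = value << 6 | UInt32(byte & 0x3F)
    }
    // Reject overlong encodings; Unicode.Scalar's failable init rejects
    // surrogates and values above U+10FFFF.
    let minimum: UInt32 = length == 2 ? 0x80 : (length == 3 ? 0x800 : 0x10000)
    guard value >= minimum, let scalar = Unicode.Scalar(value) else {
        return .error(resumptionPoint: i + length)
    }
    return .valid(scalar, resumptionPoint: i + length)
}

// Streaming use: bytes arrive in chunks, and a scalar may be split between them.
struct StreamingUTF8Decoder {
    private var pending: [UInt8] = []   // trailing bytes of a not-yet-complete scalar
    private(set) var decoded = ""

    mutating func feed(_ chunk: [UInt8]) {
        pending += chunk
        var i = 0
        parsing: while true {
            switch parseUTF8Scalar(pending, at: i) {
            case let .valid(scalar, resumptionPoint: next):
                decoded.unicodeScalars.append(scalar)
                i = next
            case let .error(resumptionPoint: next):
                decoded.unicodeScalars.append("\u{FFFD}")   // repair with U+FFFD
                i = next
            case .incomplete, .emptyInput:
                break parsing   // keep any partial scalar for the next chunk
            }
        }
        pending.removeFirst(i)
    }
}

var decoder = StreamingUTF8Decoder()
decoder.feed([0x63, 0x61, 0x66, 0xC3])   // "caf" plus the first byte of "é"
decoder.feed([0xA9])                     // the rest of "é"
print(decoder.decoded)                   // café
```

Separating `incomplete` from `error` is what lets the caller distinguish "wait for more input" from "this input is ill-formed".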
>>
>>>   @discardableResult
>>>   public static func parseForward<C: Collection>(
>>>     _ input: C,
>>>     repairingIllFormedSequences makeRepairs: Bool = true,
>>>     into output: (EncodedScalar) throws -> Void
>>>   ) rethrows -> (remainder: C.SubSequence, errorCount: Int)
>>>
>>>   @discardableResult
>>>   public static func parseReverse<C: BidirectionalCollection>(
>>>     _ input: C,
>>>     repairingIllFormedSequences makeRepairs: Bool = true,
>>>     into output: (EncodedScalar) throws -> Void
>>>   ) rethrows -> (remainder: C.SubSequence, errorCount: Int)
>>>   where C.SubSequence : BidirectionalCollection,
>>>         C.SubSequence.SubSequence == C.SubSequence,
>>>         C.SubSequence.Iterator.Element == EncodedScalar.Iterator.Element
>>> }
>>
>> Are there constraints missing on `parseForward`?
>>
>> What do these do if `makeRepairs` is false? Would it be clearer if we made
>> an enum that described the behaviors and changed the label to something like
>> `ifIllFormed:`?
>>
>>> Due to the change in internal implementation, this means that these
>>> operations will be O(n) rather than O(1). This is not expected to be a
>>> major concern, based on experiences from a similar change made to Java, but
>>> projects will be able to work around performance issues without upgrading
>>> to Swift 4 by explicitly typing slices as Substring, which will call the
>>> Swift 4 variant, and which will be available but not invoked by default in
>>> Swift 3 mode.
>>
>> Will there be a way to make this also work with a real Swift 3 compiler? For
>> instance, can you define `typealias Substring = String` in such a way that
>> real Swift 3 will parse and use it, but Swift 4 in Swift 3 mode will ignore
>> it?
>>
>>> This proposal does not yet introduce an implicit conversion from Substring
>>> to String. The decision on whether to add this will be deferred pending
>>> feedback on the initial implementation.
>>> The intention is to make a preview toolchain available for feedback,
>>> including on whether this implicit conversion is necessary, prior to the
>>> release of Swift 4.
>>
>> This is a sensible approach.
>>
>> Thank you for developing this into a full proposal. I discussed the plans
>> for Swift 4 with a local group of programmers recently, and everyone was
>> pleased to hear that `String` would get an overhaul, that the `characters`
>> view would be integrated into the string, etc. We even talked a little about
>> `Substring` and people thought it was a good idea. This proposal is shaping
>> up to impact a lot of people, but in a good way!
>>
>> --
>> Brent Royal-Gordon
>> Architechies
>>
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution@swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution