One of Swift’s major advantages as a language is the ease of bridging from 
Swift code to C. This ease makes it possible to utilise the vast body of 
existing code to bootstrap projects, rather than reinventing the world in Swift 
every time we have a problem. 

The String type in Swift has some affordances for this use case. The 
withCString method, the utf8CString property, and the cString(using:) method 
all handle the most common case well: a NULL-terminated string suitable for 
passing into most libc functions. However, using any of these affordances will 
always incur a memory copy, as Swift needs not just to ensure that the bytes 
making up the String are in contiguous memory, but also to append a NULL byte 
to them for C safety.
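
For reference, here is a minimal sketch of the three affordances named above. 
The variable names are mine, and cString(using:) assumes Foundation is 
available; treat it as an illustration rather than a statement about how any 
of these are implemented.

```swift
import Foundation

let greeting = "h\u{E9}llo"   // 5 characters, 6 bytes of UTF-8

// withCString: the pointer is only valid inside the closure, and carries
// no length, so strlen has to walk the bytes to find the terminator.
let lengthFromStrlen = greeting.withCString { pointer in
    strlen(pointer)
}

// utf8CString: a ContiguousArray<CChar> that includes the trailing NULL.
let terminated = greeting.utf8CString

// cString(using:): provided by Foundation, returns an optional [CChar].
let viaFoundation = greeting.cString(using: .utf8)
```

Note that utf8CString is one element longer than the UTF-8 byte count, 
because of the terminator.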

This is a bit frustrating when working with C libraries that accept strings in 
the form of pointer + length, and so do not require NULL-termination, such as 
libicu. In these cases we are always required to incur the overhead of a memory 
copy, even in situations when the underlying String representation is 
contiguous, all in the name of appending a NULL byte we don’t actually need. 
Worse, the pointers provided by those methods are not BufferPointers, so they 
don’t carry their length around with them, requiring another function call 
(such as strlen) to determine the length of the string.
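
To make the problem concrete, here is a rough sketch of what calling a 
pointer + length API looks like today. sumBytes is a hypothetical stand-in I 
wrote for a C function with that shape; it is not a real libicu entry point.

```swift
import Foundation

// Hypothetical stand-in for a C function that takes (pointer, length)
// and never looks for a NULL terminator.
func sumBytes(_ bytes: UnsafePointer<CChar>, _ length: Int) -> Int {
    var total = 0
    for index in 0..<length {
        // Reinterpret each C char as an unsigned byte before summing.
        total += Int(UInt8(truncatingIfNeeded: bytes[index]))
    }
    return total
}

let text = "abc"
let total = text.withCString { pointer -> Int in
    // Extra pass over the string just to recover a length we never
    // needed a terminator for in the first place.
    let length = strlen(pointer)
    return sumBytes(pointer, length)
}
```

The callee only wants the bytes and their count, yet the caller still goes 
through a NULL-terminated pointer and a separate strlen call.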

It would be convenient to have one or more additional functions that allow us 
to get access to a contiguous representation of bytes making up the string 
without appending a NULL byte, as a BufferPointer. The guarantees of these 
functions would be:

1. If the underlying string is stored in contiguous memory; AND
2. It is stored in the encoding the user has requested; THEN
3. An UnsafeBufferPointer will be returned that points to the underlying 
storage, without NULL-termination; OTHERWISE
4. A new contiguous buffer will be allocated and the string will be copied into 
it, with no NULL-termination.

Of course, I’ve used the word “return” here, but in practice all of these 
functions would be best used as with* style functions that accept trailing 
non-escaping closures.
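
As a rough sketch of the shape I have in mind, covering only the UTF-8 case: 
the method name withUnterminatedUTF8 is hypothetical, and the sketch assumes a 
standard library recent enough to offer withContiguousStorageIfAvailable on 
String.UTF8View. It is meant to illustrate the four guarantees above, not to 
be a finished design.

```swift
extension String {
    /// Hypothetical API: calls `body` with the string's UTF-8 bytes and
    /// no NULL terminator, borrowing the existing storage when possible.
    func withUnterminatedUTF8<R>(
        _ body: (UnsafeBufferPointer<UInt8>) throws -> R
    ) rethrows -> R {
        // Guarantees 1-3: contiguous UTF-8 storage is passed straight
        // through, with no copy.
        if let result = try utf8.withContiguousStorageIfAvailable(body) {
            return result
        }
        // Guarantee 4: otherwise copy into a fresh contiguous buffer,
        // still without appending a NULL byte.
        let copy = ContiguousArray(utf8)
        return try copy.withUnsafeBufferPointer(body)
    }
}

let sample = "h\u{E9}llo"
let byteCount = sample.withUnterminatedUTF8 { buffer in buffer.count }
let lastByte = sample.withUnterminatedUTF8 { buffer in buffer.last }
```

The buffer carries its own count, so the closure can hand baseAddress and 
count directly to a pointer + length C function.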

The advantage of these functions is that they avoid unnecessary copying of 
memory in circumstances where the internal String representation is already 
suitable for passing to the C library. In the case of libraries like libicu, 
this halves the number of memory accesses in common cases (e.g. passing a UTF-8 
string), which can provide substantial improvements to both performance and 
memory usage on hot code paths.

Does this seem like it’s of interest to anyone else?

Cory
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution