> On Dec 20, 2015, at 1:01 PM, Daniel Dunbar <[email protected]> wrote:
>
> Hi Nadav,
Hi Daniel,
>
> One thing you didn't call out here was any connection to visibility.
This is a great point!
Apps usually don’t export too many symbols, but they do import symbols from
external shared objects and need to keep the long names around. I created a
basic Swift one-page iOS App using the Xcode wizard and the generated binary
had a string-table of 70K**. The same app in Obj-C had a 12K linkedit section
("size -m" can print this info).
>
> I would expect that in practice it is quite common for most of the symbols to
> not be visible outside the linkage unit. Inside the linkage unit, the
> compiler can already rename many of the symbols in any way it likes. Wouldn't
> one tact for solving this problem simply be to encourage more pervasive use
> of whole module optimization, and support compiler renaming of non-exported
> symbols?
I don’t think that Swift Access Control (public/internal/private) is affected
by WMO, but I will check.
Thanks,
Nadav
** This is one of the names that the sample app has:
__TTSf4n_s_s_n___TTSg5GVSs12_ArrayBufferPSs9AnyObject__GS_PS0___Ss16_ArrayBufferTypeSs_GVSs15CollectionOfOnePS0___GS2_PS0___Ss14CollectionTypeSs_PS0___GVSs17IndexingGeneratorGS_PS0____GS4_GS_PS0____Ss13GeneratorTypeSs_PS0___PS0___GVSs12_SliceBufferPS0___GS6_PS0___Ss23_CollectionDefaultsTypeSsGS6_PS0___Ss32_CollectionGeneratorDefaultsTypeSs_GS4_GS6_PS0____GS4_GS6_PS0____S5_Ss_PS0___SiSiSs16ForwardIndexTypeSs_SiSiSs18_SignedIntegerTypeSs_SiSiSs33_BuiltinIntegerLiteralConvertibleSs_Si_PS0___GVSs14GeneratorOfOnePS0___GS12_PS0___S5_Ss_OSs3BitS13_S9_Ss_SiSiS10_Ss_SiSiS11_Ss_VSs20_DisabledRangeIndex__PS0___GVSs12_prext_SliceGS2_PS0____GS15_GS2_PS0____S7_SsGS15_GS2_PS0____S8_Ss_GS4_GS15_GS2_PS0_____GS4_GS15_GS2_PS0_____S5_Ss_PS0___S13_S13_S9_Ss_SiSiS10_Ss_SiSiS11_Ss_S14__PS0_____TFSs28_arrayNonSliceInPlaceReplaceu0_Rq_Ss16_ArrayBufferTypeq0_Ss14CollectionTypezqq_S_7Elementqqq0_Ss12SequenceType9GeneratorSs13GeneratorType7Elementzqq_Ss23_CollectionDefaultsType5IndexSizqqq_S3_5IndexSs16ForwardIndexType8DistanceSizqqq_S3_5IndexS4_19_DisabledRangeIndexSizqqqq_S3_5IndexS4_8DistanceSs25IntegerLiteralConvertible18IntegerLiteralTypeSi_FTRq_GVSs5RangeSi_Siq0__T_
>
> - Daniel
>
>> On Dec 18, 2015, at 4:42 PM, Nadav Rotem via swift-dev <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> Hi Swift-Dev,
>>
>> We care deeply about the size of binaries that are generated by the Swift
>> compiler and make an effort to optimize and shrink the generated binaries.
>> One of the problems that we have today is that swift symbols are mangled
>> into extremely long strings. This is especially a problem for libraries, and
>> almost half of the size of libswiftCore.dylib (the swift runtime library on
>> x86_64 OSX) is string tables. On MacOSX you can use the command ’size -m
>> file.dylib’ to read the size of the string table. C++ also suffers from the
>> problem of long names, but since we control the Swift ABI we can do better
>> than C++.
>>
>> Here are two names that I found in our libraries:
>>
>>
>> __TIF14StdlibUnittest13checkSequenceu0_Rxs14CollectionType_s12SequenceTypeWx9Generator7Element_zW_9GeneratorS3__rFTxq_KT_SS9showFrameSb10stackTraceVS_14SourceLocStack4fileSS4lineSu16resiliencyChecksVS_32CollectionMisuseResiliencyChecks9sameValueFTWxS2_S3__WxS2_S3___Sb_T_A6_
>>
>>
>> __TTSg5VSS13CharacterViewS_s14CollectionTypes_GVs17IndexingGeneratorS__GS1_S__s13GeneratorTypes_VS_5IndexS3_s16ForwardIndexTypes_SiSis18_SignedIntegerTypes_SiSis33_BuiltinIntegerLiteralConvertibles_Vs20_DisabledRangeIndex__S_S_s9IndexablesS_s12SequenceTypes_GS1_S__GS1_S__S2_s_Vs9Character_S3_S3_S4_s_SiSiS5_s_SiSiS6_s_S7__S__S10__S10____TFFSS12replaceRangeuRxs14CollectionTypeWx9Generator7Element_zVs9CharacterrFTGVs5RangeVVSS13CharacterView5Index_4withx_T_U_FRS4_T_
>>
>> One way to solve this problem is to implement a better string table format
>> that will include some kind of compression or tree encoding into MachO, ELF
>> and PE. The work on MachO, ELF and PE is beyond the scope of the Swift
>> project and I’d like to focus on improving things on the Swift side.
>>
>> I see two independent tasks that we can do to shorten the length of string
>> symbols. First, we can improve the existing mangling. We can do things like
>> use non-decimal length specifiers, etc. The second task, which is the
>> subject of the rest of this email, is the implementation of a compression
>> layer on top of the existing mangling scheme.
>>
>> Compressing long symbols
>>
>> The Swift compiler is free to encode Swift names in whatever way it likes,
>> but we need to follow a few basic rules. First, the character encoding space
>> needs to be [_a-zA-Z0-9]. We can’t use non-ascii characters to encode Swift
>> names because this is the character space that ELF supports. Second, names
>> need to be unique and independent. We can’t just compress the whole string
>> table or create names that are a running counter because we need to be able
>> to inspect each name independently. We need to be able to decode the name
>> independently and extract the information that’s stored in the name. We
>> also need to be able to decode the name to be able to present something
>> useful in the debugger. This means that a simple one directional hashing of
>> the name is not an option.
>>
>> With these restrictions in mind, I believe that the problem that we need to
>> solve is independent compression of names. I suggest that we add a
>> post-processing string compression pass. The input to the compress method
>> would be a string mangled name, like the two names I showed above, and the
>> output would be a shorter string. We would also need a decompression method
>> that would return the original string.
>>
>> Obviously, gzipping each string independently won’t be effective. I believe
>> that we need to implement our own compression algorithm here that will work
>> with the restrictions that I mentioned above (being textual) and with the
>> prior knowledge we have.
>>
>>
>> Example of a basic compression algorithm
>>
>> In this section I describe a trivial compression algorithm that can reduce
>> the size of the string tables by 30%. This algorithm is not optimal but it
>> is a good starting point for our discussion. One example of "prior
>> knowledge” that I mentioned above is that we know what are the common
>> sub-strings that are used in Swift names. Some of the most common
>> substrings in Swift mangled names are:
>>
>> 1. "S_S_S_S"
>> 2. "ollectio”
>> 3. “Type”
>> 4. “Index”
>> 5. “erator”
>> 6. “7Element"
>> 7. “able"
>>
>> We can use this prior knowledge about names of Swift types to compress our
>> mangling! Consider the following encoding:
>>
>> A string like this:
>>
>> __TTSg5VSS13CharacterView
>>
>> Would be translated into something like:
>>
>> __TTSg5<reference to word #67>13<reference to word #43>View
>>
>> Commonly used strings can be encoded as references to global tables. We need
>> to have some escape character (that needs to be ascii-printable), and use
>> that character to encode the reference.
>> One interesting bit of information is that the character ‘Y’ is only used 4
>> times in the entire standard library! The letter J, and a few other letters
>> are also not used very frequently. We can use these letters as escape
>> characters.
>>
>> The basic encoding that I experimented with used the following rules:
>>
>> 1. Use two escape characters that are not frequently used in names (Y and
>> Z). These characters are escape character and can’t be used without
>> escaping. ‘Y’ will be encoded as ‘YY’, and ‘Z’ would be encoded as ‘YZ’.
>>
>> 2. The most commonly used sub-strings (calculated as length of substring
>> times number of occurrences) would be encoded with a single escape character
>> and a single index character. The table of highly repetitive substrings can
>> only contain 61 entries (a-zA-Z0-9_, minus two escape characters).
>>
>> A reference to the very frequent substrings would be encoded as “Yx”, where
>> x is the character that’s translated into a numeric index. This is two chars
>> per substring.
>>
>> 3. The less commonly used substrings are encoded as three-character
>> sequence. “Zxx”, where xx is the numeric index in the large table (that can
>> hold 63*63 substrings).
>>
>> It is obvious how to reverse this compression using the same string table
>> used to compress the names.
>>
>>
>> With this encoding scheme the name
>> “__TwxxV14StdlibUnittest24MinimalForwardCollection” becomes
>> “Y5wxxJ0UJ1HJ3_Jyn4J5VJ6SdY9on” and the name “__TMPVs15ContiguousArray”
>> becomes “Y5MPJ3r5J5EJ82”.
>>
>> I benchmarked this basic technique by using 5 dylibs as the training set
>> (SwiftCore, Simd, unittests, etc) and compressed the string tables of these
>> binaries. The result was a ~31% reduction in size.
>>
>> <PastedGraphic-2.png>
>>
>>
>> On the Swift dylib itself (training set and input to the compressor) the
>> wins are about ~25%.
>>
>> This encoding makes the symbol name a lot less intuitive for humans, but at
>> the SIL-level we already print human readable strings above function names
>> as comments. I believe that the debugger would just work as-is since the
>> internal APIs won’t change.
>>
>> One of the disadvantages of using a constant string table is that once we
>> rename the APIs in the standard library the string table would be less
>> effective in compressing it and code that uses it.
>>
>> What’s next?
>>
>> The small experiment I described above showed that compressing the names in
>> the string table has a huge potential for reducing the size of swift
>> binaries. I’d like for us (swift-developers) to talk about the implications
>> of this change and start working on the two tasks of tightening our existing
>> mangling format and on implementing a new compression layer on top.
>>
>>
>> Nadav
>>
>> _______________________________________________
>> swift-dev mailing list
>> [email protected] <mailto:[email protected]>
>> https://lists.swift.org/mailman/listinfo/swift-dev
>
_______________________________________________
swift-dev mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-dev