https://bugs.documentfoundation.org/show_bug.cgi?id=158067
Bug ID: 158067 Summary: Replace string literals with custom O(U)String literals in code Product: LibreOffice Version: unspecified Hardware: All OS: All Status: UNCONFIRMED Keywords: difficultyBeginner, easyHack, skillCpp, topicCleanup Severity: normal Priority: medium Component: LibreOffice Assignee: libreoffice-bugs@lists.freedesktop.org Reporter: mikekagan...@hotmail.com CC: sberg...@redhat.com Throughout the codebase, there are hundreds of initializations of O(U)Strings with string literals, like OUString foo = "abc"; OString bar("def"); std::vector<OUString> baz = {"xyz1", "xyz2", "xyz3"}; Every time such an initialization appears, a string constructor is called at runtime, which allocates memory, and copies strings. This is because O(U)String is not a plain character array, but a class with a pointer to a structure rtl_(u)String that holds information about reference count, size, and the actual character array. To avoid overhead of such construction, several techniques were introduced over time; with C++17 adoption, we used an updated O(U)StringLiteral, which is a templated structure, with a layout compatible with rtl_(u)String; such O(U)StringLiterals are created at compile time (typically as static inline constexpr objects), and creation of OUStrings from those became a trivial operation. But that required a bloat of such helper objects, which is inconvenient. Instead of a clear for (;;) { // ... OUString foo = "abc"; // Allocating memory in a loop // doing something with the string... } we had static constexpr OUStringLiteral a_abc("abc"); // compile-time constant // ... for (;;) { // ... OUString foo(a_abc); // Trivial construction without allocation // doing something with the string... } Does the trick, but not clean. Introduction of C++20 support in the codebase (commit 1eef07805021b7ca26a1a8894809b6d995747ba1 Bump baseline to C++20, 2023-09-22) pawed a way to use custom literals; and Stephan came with a solution that allows to (mostly) get rid of use of the intermediate O(U)StringLiteral objects, and have the benefir of compile-time creation of the string objects: commit 27d1f3ac016d77d3c907cebedca558308f366855 (O[U]String literals (unusable for now, C++20 only), 2023-07-14) introduced the operator ""_ostr() to both OString and OUString; and commit e83e62fe376a91f7270435e06ee7f6864c48fb4b (Work around MSVC bug with "..."_ostr vs. u"..."_ostr, 2023-07-19) renamed it in OUString into operator ""_ustr(). These operators still use O(U)StringLiterals internally, but hide it from the programmer. Now it is possible to write for (;;) { // ... OUString foo = u"abc"_ustr; // trivial initialization using a compile-time object // doing something with the string... } While the u"abc"_ustr or "abc"_ostr syntax is a bit bulkier than simple "abc", it is inline, is unambiguous, and provides the wanted optimization using much better developer experience than the previous O(U)StringLiteral solution. The easy hack is to replace uses of O(U)StringLiteral in codebase with static constexpr O(U)String, using the user-defined literals. It is possible in most cases; an exception is where the string content is then used at compile time for construction of other constexpr/consteval objects, like in basic/source/classes/sb.cxx, where pCountStr and friends are used later to calculate constexpr hashes. Due to the specifics of rtl_(u)String, where the buffer is declared as a single-character array, but is actually as large as needed to hold all the string, the trick makes compile-time use of the characters in the compile-time-constructed O(U)String an UB. Indeed, the unit tests specifically designed to test O(U)StringLiterals should be kept intact. -- You are receiving this mail because: You are the assignee for the bug.