https://bugs.documentfoundation.org/show_bug.cgi?id=158067

            Bug ID: 158067
           Summary: Replace string literals with custom O(U)String
                    literals in code
           Product: LibreOffice
           Version: unspecified
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Keywords: difficultyBeginner, easyHack, skillCpp, topicCleanup
          Severity: normal
          Priority: medium
         Component: LibreOffice
          Assignee: libreoffice-bugs@lists.freedesktop.org
          Reporter: mikekagan...@hotmail.com
                CC: sberg...@redhat.com

Throughout the codebase, there are hundreds of initializations of O(U)Strings
with string literals, like

    OUString foo = "abc";
    OString bar("def");
    std::vector<OUString> baz = {"xyz1", "xyz2", "xyz3"};

Every time such an initialization appears, a string constructor is called at
runtime, which allocates memory, and copies strings. This is because O(U)String
is not a plain character array, but a class with a pointer to a structure
rtl_(u)String that holds information about reference count, size, and the
actual character array.

To avoid overhead of such construction, several techniques were introduced over
time; with C++17 adoption, we used an updated O(U)StringLiteral, which is a
templated structure, with a layout compatible with rtl_(u)String; such
O(U)StringLiterals are created at compile time (typically as static inline
constexpr objects), and creation of OUStrings from those became a trivial
operation. But that required a bloat of such helper objects, which is
inconvenient. Instead of a clear

    for (;;)
    {
        // ...
        OUString foo = "abc"; // Allocating memory in a loop
        // doing something with the string...
    }

we had 

    static constexpr OUStringLiteral a_abc("abc"); // compile-time constant
    // ...
    for (;;)
    {
        // ...
        OUString foo(a_abc); // Trivial construction without allocation
        // doing something with the string...
    }

Does the trick, but not clean.

Introduction of C++20 support in the codebase (commit
1eef07805021b7ca26a1a8894809b6d995747ba1 Bump baseline to C++20, 2023-09-22)
pawed a way to use custom literals; and Stephan came with a solution that
allows to (mostly) get rid of use of the intermediate O(U)StringLiteral
objects, and have the benefir of compile-time creation of the string objects:
commit 27d1f3ac016d77d3c907cebedca558308f366855 (O[U]String literals (unusable
for now, C++20 only), 2023-07-14) introduced the operator ""_ostr() to both
OString and OUString; and commit e83e62fe376a91f7270435e06ee7f6864c48fb4b (Work
around MSVC bug with "..."_ostr vs. u"..."_ostr, 2023-07-19) renamed it in
OUString into operator ""_ustr(). These operators still use O(U)StringLiterals
internally, but hide it from the programmer.

Now it is possible to write

    for (;;)
    {
        // ...
        OUString foo = u"abc"_ustr; // trivial initialization using a
compile-time object
        // doing something with the string...
    }

While the u"abc"_ustr or "abc"_ostr syntax is a bit bulkier than simple "abc",
it is inline, is unambiguous, and provides the wanted optimization using much
better developer experience than the previous O(U)StringLiteral solution.

The easy hack is to replace uses of O(U)StringLiteral in codebase with static
constexpr O(U)String, using the user-defined literals. It is possible in most
cases; an exception is where the string content is then used at compile time
for construction of other constexpr/consteval objects, like in
basic/source/classes/sb.cxx, where pCountStr and friends are used later to
calculate constexpr hashes. Due to the specifics of rtl_(u)String, where the
buffer is declared as a single-character array, but is actually as large as
needed to hold all the string, the trick makes compile-time use of the
characters in the compile-time-constructed O(U)String an UB.

Indeed, the unit tests specifically designed to test O(U)StringLiterals should
be kept intact.

-- 
You are receiving this mail because:
You are the assignee for the bug.

Reply via email to