Title: [277234] trunk/Source/WTF
Revision
277234
Author
da...@apple.com
Date
2021-05-08 17:59:28 -0700 (Sat, 08 May 2021)

Log Message

Rename toIntegralType to parseInteger and prepare to use it to replace all our integer-parsing functions
https://bugs.webkit.org/show_bug.cgi?id=225560

Reviewed by Sam Weinig.

I have a much larger patch that replaces all the many toInt functions with these parseInteger
function templates. The parseInteger ones take StringView, which means we can pass almost any
type of string or character/length pair including a substring without allocating a new string,
return Optional rather than using an out argument to report failure, take the integer type as
a template parameter so can be used for precisely the desired type at each call site, and make
the "allow trailing junk" feature explicit, rather than implicitly including it in the shortest
named functions, which I hope will discourage people from using that junk-ignoring mode
when it's not desirable.

Also includes adoption of parseInteger within WTF itself, outside the string classes.

My plan is to land the rest of adoption of this in chunks so we can review carefully and spot
mistakes as we go. Then return to files like WTFString.h and remove the many old functions
this replaces, including the String::toInt and charactersToInt families of functions.

* wtf/URL.cpp:
(WTF::URL::port const): Use parseInteger<uint16_t>.
(WTF::URL::setHostAndPort): Ditto.

* wtf/text/StringToIntegerConversion.h: Refactored the existing code to tighten things up
a bit, and got rid of overloads that take pointer and length and the the generic "any string
type" feature, since StringView already works well for that. Renamed toIntegralType to
parseInteger, leaving the old name behind to ease the transition while adopting parseInteger.

* wtf/text/WTFString.cpp:
(WTF::toDoubleType): Use the TrailingJunkPolicy enumeration from the
StringToIntegerConversion.h header.
(WTF::charactersToDouble): Ditto.
(WTF::charactersToFloat): Ditto.

Modified Paths

Diff

Modified: trunk/Source/WTF/ChangeLog (277233 => 277234)


--- trunk/Source/WTF/ChangeLog	2021-05-09 00:51:46 UTC (rev 277233)
+++ trunk/Source/WTF/ChangeLog	2021-05-09 00:59:28 UTC (rev 277234)
@@ -1,3 +1,40 @@
+2021-05-08  Darin Adler  <da...@apple.com>
+
+        Rename toIntegralType to parseInteger and prepare to use it to replace all our integer-parsing functions
+        https://bugs.webkit.org/show_bug.cgi?id=225560
+
+        Reviewed by Sam Weinig.
+
+        I have a much larger patch that replaces all the many toInt functions with these parseInteger
+        function templates. The parseInteger ones take StringView, which means we can pass almost any
+        type of string or character/length pair including a substring without allocating a new string,
+        return Optional rather than using an out argument to report failure, take the integer type as
+        a template parameter so can be used for precisely the desired type at each call site, and make
+        the "allow trailing junk" feature explicit, rather than implicitly including it in the shortest
+        named functions, which I hope will discourage people from using that junk-ignoring mode
+        when it's not desirable.
+
+        Also includes adoption of parseInteger within WTF itself, outside the string classes.
+
+        My plan is to land the rest of adoption of this in chunks so we can review carefully and spot
+        mistakes as we go. Then return to files like WTFString.h and remove the many old functions
+        this replaces, including the String::toInt and charactersToInt families of functions.
+
+        * wtf/URL.cpp:
+        (WTF::URL::port const): Use parseInteger<uint16_t>.
+        (WTF::URL::setHostAndPort): Ditto.
+
+        * wtf/text/StringToIntegerConversion.h: Refactored the existing code to tighten things up
+        a bit, and got rid of overloads that take pointer and length and the the generic "any string
+        type" feature, since StringView already works well for that. Renamed toIntegralType to
+        parseInteger, leaving the old name behind to ease the transition while adopting parseInteger.
+
+        * wtf/text/WTFString.cpp:
+        (WTF::toDoubleType): Use the TrailingJunkPolicy enumeration from the
+        StringToIntegerConversion.h header.
+        (WTF::charactersToDouble): Ditto.
+        (WTF::charactersToFloat): Ditto.
+
 2021-05-08  Chris Dumez  <cdu...@apple.com>
 
         Port Filesystem::pathByAppendingComponent() & Filesystem:: pathByAppendingComponents() to std::filesystem

Modified: trunk/Source/WTF/wtf/URL.cpp (277233 => 277234)


--- trunk/Source/WTF/wtf/URL.cpp	2021-05-09 00:51:46 UTC (rev 277233)
+++ trunk/Source/WTF/wtf/URL.cpp	2021-05-09 00:59:28 UTC (rev 277234)
@@ -38,6 +38,7 @@
 #include <wtf/text/CString.h>
 #include <wtf/text/StringBuilder.h>
 #include <wtf/text/StringHash.h>
+#include <wtf/text/StringToIntegerConversion.h>
 #include <wtf/text/TextStream.h>
 
 namespace WTF {
@@ -133,7 +134,7 @@
 
 Optional<uint16_t> URL::port() const
 {
-    return m_portLength ? parseUInt16(StringView(m_string).substring(m_hostEnd + 1, m_portLength - 1)) : WTF::nullopt;
+    return m_portLength ? parseInteger<uint16_t>(StringView(m_string).substring(m_hostEnd + 1, m_portLength - 1)) : WTF::nullopt;
 }
 
 String URL::hostAndPort() const
@@ -503,7 +504,7 @@
         // Multiple colons are acceptable only in case of IPv6.
         if (hostName.contains(':') && !hostName.startsWith('['))
             return;
-        if (!parseUInt16(portString))
+        if (!parseInteger<uint16_t>(portString))
             portString = { };
     }
     if (hostName.isEmpty()) {

Modified: trunk/Source/WTF/wtf/text/StringToIntegerConversion.h (277233 => 277234)


--- trunk/Source/WTF/wtf/text/StringToIntegerConversion.h	2021-05-09 00:51:46 UTC (rev 277233)
+++ trunk/Source/WTF/wtf/text/StringToIntegerConversion.h	2021-05-09 00:59:28 UTC (rev 277234)
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2018-2019 Apple Inc. All rights reserved.
+ * Copyright (C) 2018-2021 Apple Inc. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -25,46 +25,35 @@
 
 #pragma once
 
-#include <unicode/utypes.h>
-#include <wtf/ASCIICType.h>
 #include <wtf/CheckedArithmetic.h>
+#include <wtf/text/StringView.h>
 
 namespace WTF {
 
-inline bool isCharacterAllowedInBase(UChar c, int base)
-{
-    if (c > 0x7F)
-        return false;
-    if (isASCIIDigit(c))
-        return c - '0' < base;
-    if (isASCIIAlpha(c)) {
-        if (base > 36)
-            base = 36;
-        return (c >= 'a' && c < 'a' + base - 10)
-            || (c >= 'A' && c < 'A' + base - 10);
-    }
-    return false;
-}
+// The parseInteger function template allows leading and trailing spaces as defined by isASCIISpace, and, after the leading spaces, allows a single leading "+".
+// The parseIntegerAllowingTrailingJunk function template is like parseInteger, but allows any characters after the integer.
 
-template<typename IntegralType, typename CharacterType>
-inline IntegralType toIntegralType(const CharacterType* data, size_t length, bool* ok, int base = 10)
-{
-    static constexpr bool isSigned = std::numeric_limits<IntegralType>::is_signed;
+// FIXME: Should we add a version that does not allow "+"?
+// FIXME: Should we add a version that allows other definitions of spaces, like isHTMLSpace or isHTTPSpace?
+// FIXME: Should we add a version that does not allow leading and trailing spaces?
 
-    Checked<IntegralType, RecordOverflow> value;
-    bool isOk = false;
-    bool isNegative = false;
+template<typename IntegralType> Optional<IntegralType> parseInteger(StringView, uint8_t base = 10);
+template<typename IntegralType> Optional<IntegralType> parseIntegerAllowingTrailingJunk(StringView, uint8_t base = 10);
 
+enum class TrailingJunkPolicy { Disallow, Allow };
+
+template<typename IntegralType, typename CharacterType> Optional<IntegralType> parseInteger(const CharacterType* data, size_t length, uint8_t base, TrailingJunkPolicy policy)
+{
     if (!data)
-        goto bye;
+        return WTF::nullopt;
 
-    // skip leading whitespace
-    while (length && isSpaceOrNewline(*data)) {
+    while (length && isASCIISpace(*data)) {
         --length;
         ++data;
     }
 
-    if (isSigned && length && *data == '-') {
+    bool isNegative = false;
+    if (std::is_signed_v<IntegralType> && length && *data == '-') {
         --length;
         ++data;
         isNegative = true;
@@ -73,73 +62,83 @@
         ++data;
     }
 
-    if (!length || !isCharacterAllowedInBase(*data, base))
-        goto bye;
+    auto isCharacterAllowedInBase = [] (auto character, auto base) {
+        if (isASCIIDigit(character))
+            return character - '0' < base;
+        return toASCIILowerUnchecked(character) >= 'a' && toASCIILowerUnchecked(character) < 'a' + std::min(base - 10, 26);
+    };
 
-    while (length && isCharacterAllowedInBase(*data, base)) {
-        --length;
-        IntegralType digitValue;
-        auto c = *data;
-        if (isASCIIDigit(c))
-            digitValue = c - '0';
-        else if (c >= 'a')
-            digitValue = c - 'a' + 10;
-        else
-            digitValue = c - 'A' + 10;
+    if (!(length && isCharacterAllowedInBase(*data, base)))
+        return WTF::nullopt;
 
+    Checked<IntegralType, RecordOverflow> value;
+    do {
+        IntegralType digitValue = isASCIIDigit(*data) ? *data - '0' : toASCIILowerUnchecked(*data) - 'a' + 10;
         value *= static_cast<IntegralType>(base);
         if (isNegative)
             value -= digitValue;
         else
             value += digitValue;
-        ++data;
-    }
+    } while (--length && isCharacterAllowedInBase(*++data, base));
 
     if (UNLIKELY(value.hasOverflowed()))
-        goto bye;
+        return WTF::nullopt;
 
-    // skip trailing space
-    while (length && isSpaceOrNewline(*data)) {
-        --length;
-        ++data;
+    if (policy == TrailingJunkPolicy::Disallow) {
+        while (length && isASCIISpace(*data)) {
+            --length;
+            ++data;
+        }
+        if (length)
+            return WTF::nullopt;
     }
 
-    if (!length)
-        isOk = true;
-bye:
+    return value.unsafeGet();
+}
+
+template<typename IntegralType> Optional<IntegralType> parseInteger(StringView string, uint8_t base)
+{
+    if (string.is8Bit())
+        return parseInteger<IntegralType>(string.characters8(), string.length(), base, TrailingJunkPolicy::Disallow);
+    return parseInteger<IntegralType>(string.characters16(), string.length(), base, TrailingJunkPolicy::Disallow);
+}
+
+template<typename IntegralType> Optional<IntegralType> parseIntegerAllowingTrailingJunk(StringView string, uint8_t base)
+{
+    if (string.is8Bit())
+        return parseInteger<IntegralType>(string.characters8(), string.length(), base, TrailingJunkPolicy::Allow);
+    return parseInteger<IntegralType>(string.characters16(), string.length(), base, TrailingJunkPolicy::Allow);
+}
+
+// FIXME: Deprecated. Remove toIntegralType entirely once we get move all callers to parseInteger.
+template<typename IntegralType, typename StringOrStringView> IntegralType toIntegralType(const StringOrStringView& stringView, bool* ok, int base = 10)
+{
+    auto result = parseInteger<IntegralType>(stringView, base);
     if (ok)
-        *ok = isOk;
-    return isOk ? value.unsafeGet() : 0;
+        *ok = result.hasValue();
+    return result.valueOr(0);
 }
 
-template<typename IntegralType, typename StringOrStringView>
-inline IntegralType toIntegralType(const StringOrStringView& stringView, bool* ok, int base = 10)
+// FIXME: Deprecated. Remove toIntegralType entirely once we get move all callers to parseInteger.
+template<typename IntegralType, typename CharacterType> IntegralType toIntegralType(const CharacterType* data, unsigned length, bool* ok, int base = 10)
 {
-    if (stringView.is8Bit())
-        return toIntegralType<IntegralType, LChar>(stringView.characters8(), stringView.length(), ok, base);
-    return toIntegralType<IntegralType, UChar>(stringView.characters16(), stringView.length(), ok, base);
+    return toIntegralType<IntegralType>(StringView { data, length }, ok, base);
 }
 
-template<typename IntegralType, typename CharacterType>
-inline Optional<IntegralType> toIntegralType(const CharacterType* data, size_t length, int base = 10)
+// FIXME: Deprecated. Remove toIntegralType entirely once we get move all callers to parseInteger.
+template<typename IntegralType, typename StringOrStringView> Optional<IntegralType> toIntegralType(const StringOrStringView& stringView, int base = 10)
 {
-    bool ok = false;
-    IntegralType value = toIntegralType<IntegralType>(data, length, &ok, base);
-    if (!ok)
-        return WTF::nullopt;
-    return value;
+    return parseInteger<IntegralType>(stringView, base);
 }
 
-template<typename IntegralType, typename StringOrStringView>
-inline Optional<IntegralType> toIntegralType(const StringOrStringView& stringView, int base = 10)
+// FIXME: Deprecated. Remove toIntegralType entirely once we get move all callers to parseInteger.
+template<typename IntegralType, typename CharacterType> Optional<IntegralType> toIntegralType(const CharacterType* data, unsigned length, int base = 10)
 {
-    bool ok = false;
-    IntegralType value = toIntegralType<IntegralType>(stringView, &ok, base);
-    if (!ok)
-        return WTF::nullopt;
-    return value;
+    return parseInteger<IntegralType>(StringView { data, length }, base);
 }
 
 }
 
+using WTF::parseInteger;
+using WTF::parseIntegerAllowingTrailingJunk;
 using WTF::toIntegralType;

Modified: trunk/Source/WTF/wtf/text/WTFString.cpp (277233 => 277234)


--- trunk/Source/WTF/wtf/text/WTFString.cpp	2021-05-09 00:51:46 UTC (rev 277233)
+++ trunk/Source/WTF/wtf/text/WTFString.cpp	2021-05-09 00:59:28 UTC (rev 277234)
@@ -1016,8 +1016,6 @@
     return toIntegralType<intptr_t>(data, lengthOfCharactersAsInteger<UChar>(data, length), ok, 10);
 }
 
-enum TrailingJunkPolicy { DisallowTrailingJunk, AllowTrailingJunk };
-
 template<typename CharacterType, TrailingJunkPolicy policy>
 static inline double toDoubleType(const CharacterType* data, size_t length, bool* ok, size_t& parsedLength)
 {
@@ -1034,7 +1032,7 @@
 
     parsedLength += leadingSpacesLength;
     if (ok)
-        *ok = policy == AllowTrailingJunk || parsedLength == length;
+        *ok = policy == TrailingJunkPolicy::Allow || parsedLength == length;
     return number;
 }
 
@@ -1041,13 +1039,13 @@
 double charactersToDouble(const LChar* data, size_t length, bool* ok)
 {
     size_t parsedLength;
-    return toDoubleType<LChar, DisallowTrailingJunk>(data, length, ok, parsedLength);
+    return toDoubleType<LChar, TrailingJunkPolicy::Disallow>(data, length, ok, parsedLength);
 }
 
 double charactersToDouble(const UChar* data, size_t length, bool* ok)
 {
     size_t parsedLength;
-    return toDoubleType<UChar, DisallowTrailingJunk>(data, length, ok, parsedLength);
+    return toDoubleType<UChar, TrailingJunkPolicy::Disallow>(data, length, ok, parsedLength);
 }
 
 float charactersToFloat(const LChar* data, size_t length, bool* ok)
@@ -1054,7 +1052,7 @@
 {
     // FIXME: This will return ok even when the string fits into a double but not a float.
     size_t parsedLength;
-    return static_cast<float>(toDoubleType<LChar, DisallowTrailingJunk>(data, length, ok, parsedLength));
+    return static_cast<float>(toDoubleType<LChar, TrailingJunkPolicy::Disallow>(data, length, ok, parsedLength));
 }
 
 float charactersToFloat(const UChar* data, size_t length, bool* ok)
@@ -1061,19 +1059,19 @@
 {
     // FIXME: This will return ok even when the string fits into a double but not a float.
     size_t parsedLength;
-    return static_cast<float>(toDoubleType<UChar, DisallowTrailingJunk>(data, length, ok, parsedLength));
+    return static_cast<float>(toDoubleType<UChar, TrailingJunkPolicy::Disallow>(data, length, ok, parsedLength));
 }
 
 float charactersToFloat(const LChar* data, size_t length, size_t& parsedLength)
 {
     // FIXME: This will return ok even when the string fits into a double but not a float.
-    return static_cast<float>(toDoubleType<LChar, AllowTrailingJunk>(data, length, nullptr, parsedLength));
+    return static_cast<float>(toDoubleType<LChar, TrailingJunkPolicy::Allow>(data, length, nullptr, parsedLength));
 }
 
 float charactersToFloat(const UChar* data, size_t length, size_t& parsedLength)
 {
     // FIXME: This will return ok even when the string fits into a double but not a float.
-    return static_cast<float>(toDoubleType<UChar, AllowTrailingJunk>(data, length, nullptr, parsedLength));
+    return static_cast<float>(toDoubleType<UChar, TrailingJunkPolicy::Allow>(data, length, nullptr, parsedLength));
 }
 
 const String& emptyString()
_______________________________________________
webkit-changes mailing list
webkit-changes@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-changes

Reply via email to