Branch: refs/heads/main
Home: https://github.com/WebKit/WebKit
Commit: 2c9a03174b0862e47d4c1a6f9320a1ab90a4a478
https://github.com/WebKit/WebKit/commit/2c9a03174b0862e47d4c1a6f9320a1ab90a4a478
Author: Wenson Hsieh <[email protected]>
Date: 2026-03-16 (Mon, 16 Mar 2026)
Changed paths:
A LayoutTests/fast/text-extraction/debug-text-extraction-word-limit-with-links-expected.txt
A LayoutTests/fast/text-extraction/debug-text-extraction-word-limit-with-links.html
M Source/WebKit/Shared/TextExtractionToStringConversion.cpp
M Source/WebKit/Shared/TextExtractionToStringConversion.h
M Source/WebKit/UIProcess/API/Cocoa/WKWebView.mm
Log Message:
-----------
[AutoFill Debugging] Make `maxWordsPerParagraph` keep several words of context before and after links in each paragraph
https://bugs.webkit.org/show_bug.cgi?id=310047
rdar://172423469
Reviewed by Megan Gardner.
Currently, when `maxWordsPerParagraph` is specified, for each paragraph (`TextItemData`) that
contains more words than the given limit, we preserve only the **first** `maxWordsPerParagraph`
words and truncate everything else. However, this sometimes results in links being embedded in
extraction output with insufficient context. To mitigate this, we adjust the `maxWordsPerParagraph`
truncation heuristic, such that it preserves several (arbitrarily: 5) words before and after links.
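
The heuristic described above can be sketched roughly as follows. This is a minimal standalone
illustration and not the code in `TextExtractionToStringConversion.cpp`: the `CharacterRange`
struct, the `truncateKeepingLinkContext` name, splitting words on ASCII spaces, and the `...`
marker for omitted runs are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type for this sketch; not taken from the WebKit sources.
struct CharacterRange {
    size_t location;
    size_t length;
};

// Splits on ASCII spaces and records each word's [start, end) character offsets.
static std::vector<std::pair<size_t, size_t>> wordBoundaries(const std::string& text)
{
    std::vector<std::pair<size_t, size_t>> words;
    size_t index = 0;
    while (index < text.size()) {
        while (index < text.size() && text[index] == ' ')
            ++index;
        size_t start = index;
        while (index < text.size() && text[index] != ' ')
            ++index;
        if (index > start)
            words.emplace_back(start, index);
    }
    return words;
}

// Hypothetical function: keeps the first maxWords words, plus every word that
// overlaps a link and up to contextWords words on either side of it.
std::string truncateKeepingLinkContext(const std::string& text, const std::vector<CharacterRange>& links, size_t maxWords, size_t contextWords = 5)
{
    auto words = wordBoundaries(text);
    if (words.size() <= maxWords)
        return text;

    // Start from the previous behavior: only the first maxWords words survive.
    std::vector<bool> keep(words.size(), false);
    for (size_t i = 0; i < maxWords; ++i)
        keep[i] = true;

    // Then mark link words and their surrounding context as kept.
    for (const auto& link : links) {
        size_t linkEnd = link.location + link.length;
        for (size_t i = 0; i < words.size(); ++i) {
            bool overlapsLink = words[i].first < linkEnd && words[i].second > link.location;
            if (!overlapsLink)
                continue;
            size_t first = i >= contextWords ? i - contextWords : 0;
            size_t last = std::min(words.size() - 1, i + contextWords);
            for (size_t j = first; j <= last; ++j)
                keep[j] = true;
        }
    }

    // Join the kept words; each omitted run collapses to one "..." marker (an assumption).
    std::string result;
    bool lastWasOmitted = false;
    for (size_t i = 0; i < words.size(); ++i) {
        if (!keep[i]) {
            lastWasOmitted = true;
            continue;
        }
        if (!result.empty())
            result += ' ';
        if (lastWasOmitted)
            result += "... ";
        result.append(text, words[i].first, words[i].second - words[i].first);
        lastWasOmitted = false;
    }
    if (lastWasOmitted)
        result += result.empty() ? "..." : " ...";
    return result;
}
```

For example, with a word limit of 3, a context of 2, and a link covering the word `p` in
`a b c d e f g h i j k l m n o p q r s t`, this sketch yields `a b c ... n o p q r ...`, so the
link keeps its neighbors instead of being dropped with the rest of the paragraph.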
Test: fast/text-extraction/debug-text-extraction-word-limit-with-links.html
* LayoutTests/fast/text-extraction/debug-text-extraction-word-limit-with-links-expected.txt: Added.
* LayoutTests/fast/text-extraction/debug-text-extraction-word-limit-with-links.html: Added.
Add a layout test to exercise this new behavior.
* Source/WebKit/Shared/TextExtractionToStringConversion.cpp:
(WebKit::characterRangesFromLinks):
(WebKit::truncateByWordCount):
(WebKit::TextExtractionAggregator::truncateTextByWordLimitIfNeeded):
(WebKit::addJSONTextContent):
(WebKit::addPartsForText):
(WebKit::addPartsForItem):
(WebKit::addTextRepresentationRecursive):
* Source/WebKit/Shared/TextExtractionToStringConversion.h:
(WebKit::TextExtractionOptions::TextExtractionOptions):
* Source/WebKit/UIProcess/API/Cocoa/WKWebView.mm:
(-[WKWebView _extractDebugTextWithConfigurationWithoutUpdatingFilterRules:assertionScope:completionHandler:]):
Pull text truncation logic out of the platform-specific `WKWebView`, and into shared logic in
`TextExtractionToStringConversion.cpp` (where, importantly, we also have more context about adjacent
link items).
Canonical link: https://commits.webkit.org/309355@main