Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: 2c9a03174b0862e47d4c1a6f9320a1ab90a4a478
      https://github.com/WebKit/WebKit/commit/2c9a03174b0862e47d4c1a6f9320a1ab90a4a478
  Author: Wenson Hsieh <[email protected]>
  Date:   2026-03-16 (Mon, 16 Mar 2026)

  Changed paths:
    A LayoutTests/fast/text-extraction/debug-text-extraction-word-limit-with-links-expected.txt
    A LayoutTests/fast/text-extraction/debug-text-extraction-word-limit-with-links.html
    M Source/WebKit/Shared/TextExtractionToStringConversion.cpp
    M Source/WebKit/Shared/TextExtractionToStringConversion.h
    M Source/WebKit/UIProcess/API/Cocoa/WKWebView.mm

  Log Message:
  -----------
  [AutoFill Debugging] Make `maxWordsPerParagraph` keep several words of context before and after links in each paragraph
https://bugs.webkit.org/show_bug.cgi?id=310047
rdar://172423469

Reviewed by Megan Gardner.

Currently, when `maxWordsPerParagraph` is specified, for each paragraph (`TextItemData`) that
contains more words than the given limit, we preserve only the **first** `maxWordsPerParagraph`
words and truncate everything else. However, this sometimes results in links being embedded in
extraction output with insufficient context. To mitigate this, we adjust the `maxWordsPerParagraph`
truncation heuristic, such that it preserves several (arbitrarily: 5) words before and after links.
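
As a rough illustration of the heuristic described above, the following is a minimal standalone
sketch and not the code from this commit. The function name `truncateKeepingLinkContext`, the
`WordRange` type, the pre-split word vector, and the `...` marker for elided runs are all
assumptions made for the example; the actual change operates on `TextItemData` inside
`TextExtractionToStringConversion.cpp` and may differ in how it counts and joins words.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: half-open range [first, last) of word indices covered by one link.
using WordRange = std::pair<std::size_t, std::size_t>;

// Hypothetical helper. Keeps the first `maxWords` words, plus `contextWords`
// words on either side of every link (and the link's own words). Each run of
// dropped words is replaced by a single "..." marker (an assumed convention).
std::string truncateKeepingLinkContext(const std::vector<std::string>& words,
    const std::vector<WordRange>& linkRanges, std::size_t maxWords,
    std::size_t contextWords = 5)
{
    std::vector<bool> keep(words.size(), false);

    // Original behavior: the leading `maxWords` words always survive.
    for (std::size_t i = 0; i < std::min(maxWords, words.size()); ++i)
        keep[i] = true;

    // New behavior: also keep a window of context around each link.
    for (const auto& link : linkRanges) {
        assert(link.first <= link.second && link.second <= words.size());
        std::size_t start = link.first > contextWords ? link.first - contextWords : 0;
        std::size_t end = std::min(words.size(), link.second + contextWords);
        for (std::size_t i = start; i < end; ++i)
            keep[i] = true;
    }

    std::string result;
    bool inGap = false;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (keep[i]) {
            if (!result.empty())
                result += ' ';
            result += words[i];
            inGap = false;
        } else if (!inGap) {
            // Emit one marker per run of dropped words.
            if (!result.empty())
                result += ' ';
            result += "...";
            inGap = true;
        }
    }
    return result;
}
```

In this sketch, a paragraph shorter than the limit passes through unchanged, and a link that falls
past the limit still appears with its neighboring words rather than being cut off or left without
surrounding text.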

Test: fast/text-extraction/debug-text-extraction-word-limit-with-links.html

* LayoutTests/fast/text-extraction/debug-text-extraction-word-limit-with-links-expected.txt: Added.
* LayoutTests/fast/text-extraction/debug-text-extraction-word-limit-with-links.html: Added.

Add a layout test to exercise this new behavior.

* Source/WebKit/Shared/TextExtractionToStringConversion.cpp:
(WebKit::characterRangesFromLinks):
(WebKit::truncateByWordCount):
(WebKit::TextExtractionAggregator::truncateTextByWordLimitIfNeeded):
(WebKit::addJSONTextContent):
(WebKit::addPartsForText):
(WebKit::addPartsForItem):
(WebKit::addTextRepresentationRecursive):
* Source/WebKit/Shared/TextExtractionToStringConversion.h:
(WebKit::TextExtractionOptions::TextExtractionOptions):
* Source/WebKit/UIProcess/API/Cocoa/WKWebView.mm:
(-[WKWebView _extractDebugTextWithConfigurationWithoutUpdatingFilterRules:assertionScope:completionHandler:]):

Pull text truncation logic out of the platform-specific `WKWebView`, and into shared logic in
`TextExtractionToStringConversion.cpp` (where, importantly, we also have more context about adjacent
link items).

Canonical link: https://commits.webkit.org/309355@main


