[ https://issues.apache.org/jira/browse/NUTCH-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel resolved NUTCH-2699.
------------------------------------
    Resolution: Fixed

> Protocol-okhttp: needless loops to increment requested bytes counter when more content is already buffered
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2699
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2699
>             Project: Nutch
>          Issue Type: Bug
>          Components: protocol
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.16
>
>
> The okhttp library used by the plugin protocol-okhttp buffers content
> internally and has often already buffered more content than has been requested.
> The plugin should immediately set the request count to the size of the
> buffered content to avoid needless loops when the buffered size comes close
> to the content limit (near the limit the increment steps are too small, e.g.
> 294 bytes in the log below):
> {noformat}
> 2019-03-11 14:56:36,642 DEBUG okhttp.OkHttpResponse - http://localhost/large.pdf - http/1.1 200 OK
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 8192, buffered = 16088
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 16384, buffered = 24280
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 24576, buffered = 32472
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 32768, buffered = 40664
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 40960, buffered = 48856
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 49152, buffered = 57048
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 57344, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 57638, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 57932, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 58226, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 58520, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 58814, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 59108, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 59402, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 59696, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 59990, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 60284, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 60578, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 60872, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 61166, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 61460, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 61754, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 62048, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 62342, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 62636, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 62930, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 63224, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 63518, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 63812, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 64106, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 64400, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 64694, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 64988, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 65282, buffered = 73432
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - content limit reached
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - copied 65534 bytes out of 73432 buffered, remaining buffer contains 7898 bytes
> 2019-03-11 14:56:36,645 DEBUG okhttp.OkHttpResponse - HTTP content truncated to 65534 bytes (reason: LENGTH)
> 2019-03-11 14:56:36,661 INFO parse.ParseSegment - http://localhost/large.pdf skipped. Content of size 366578 was truncated to 65534
> 2019-03-11 14:56:36,661 WARN parse.ParserChecker - Content is truncated, parse may fail!
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
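The effect described in the issue (counting up in small steps although the buffer is already ahead) can be illustrated with a small, self-contained Java model. This is a sketch under stated assumptions, not the code committed for NUTCH-2699, and it does not use the okhttp or okio API: `SimulatedSource`, `naiveLoop` and `improvedLoop` are hypothetical names, and the 8192-byte chunk size, the fixed 294-byte step and the 65536-byte limit are assumptions loosely based on the log above (the real plugin only falls back to small steps near the limit). The model counts loop iterations needed to reach the limit when the request counter only grows by the step, versus when it jumps to the size of the already buffered content.

```java
/**
 * Hypothetical sketch (not Nutch's OkHttpResponse and not the okio API):
 * compares a request loop that grows the counter by a fixed step with one
 * that jumps the counter to the amount of content already buffered.
 */
public class RequestLoopSketch {

  /** Assumed stand-in for a buffered source: request(n) reads fixed-size
   *  chunks until at least n bytes are buffered or the body is exhausted. */
  static final class SimulatedSource {
    private final long bodyLength;
    private final long chunkSize;
    private long buffered = 0;

    SimulatedSource(long bodyLength, long chunkSize) {
      this.bodyLength = bodyLength;
      this.chunkSize = chunkSize;
    }

    /** Returns true if at least byteCount bytes are buffered afterwards. */
    boolean request(long byteCount) {
      while (buffered < byteCount && buffered < bodyLength) {
        buffered = Math.min(buffered + chunkSize, bodyLength);
      }
      return buffered >= byteCount;
    }

    long bufferedSize() {
      return buffered;
    }
  }

  /** Counter grows by `step` only, even if more content is already buffered. */
  static int naiveLoop(SimulatedSource src, long limit, long step) {
    long requested = 0;
    int iterations = 0;
    while (requested < limit) {
      requested = Math.min(requested + step, limit);
      iterations++;
      if (!src.request(requested)) {
        break; // body is shorter than the limit
      }
    }
    return iterations;
  }

  /** After each request, the counter is set to the buffered size (capped at the limit). */
  static int improvedLoop(SimulatedSource src, long limit, long step) {
    long requested = 0;
    int iterations = 0;
    while (requested < limit) {
      requested = Math.min(requested + step, limit);
      iterations++;
      if (!src.request(requested)) {
        break; // body is shorter than the limit
      }
      // Everything already buffered counts as requested: skip ahead.
      requested = Math.min(Math.max(requested, src.bufferedSize()), limit);
    }
    return iterations;
  }

  public static void main(String[] args) {
    long body = 366578; // content size reported in the log
    long limit = 65536; // assumed content limit
    long chunk = 8192;  // assumed read chunk size
    long step = 294;    // small step as seen near the limit in the log
    System.out.println("naive:    " + naiveLoop(new SimulatedSource(body, chunk), limit, step));
    System.out.println("improved: " + improvedLoop(new SimulatedSource(body, chunk), limit, step));
  }
}
```

With these assumed parameters the naive loop needs 223 iterations, while the improved loop needs 8 (one per chunk), because after the jump every further request forces a new chunk to be read instead of re-requesting bytes that are already in the buffer.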