[ 
https://issues.apache.org/jira/browse/NUTCH-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-2699 started by Sebastian Nagel.
----------------------------------------------
> Protocol-okhttp: needless loops to increment requested bytes counter when 
> more content is already buffered
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2699
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2699
>             Project: Nutch
>          Issue Type: Bug
>          Components: protocol
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.16
>
>
> The okhttp library used by the plugin protocol-okhttp buffers content 
> internally and has often already buffered more content than was requested. 
> The plugin should immediately set the request count to the size of the 
> buffered content. This avoids needless loop iterations when the buffered size 
> comes close to the content limit and the increment steps become too small:
> {noformat}
> 2019-03-11 14:56:36,642 DEBUG okhttp.OkHttpResponse - 
> http://localhost/large.pdf - http/1.1 200 OK
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 8192, buffered = 16088
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 16384, buffered = 24280
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 24576, buffered = 32472
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 32768, buffered = 40664
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 40960, buffered = 48856
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 49152, buffered = 57048
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 57344, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 57638, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 57932, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 58226, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 58520, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 58814, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 59108, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 59402, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 59696, buffered = 65240
> 2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 59990, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 60284, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 60578, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 60872, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 61166, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 61460, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 61754, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 62048, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 62342, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 62636, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 62930, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 63224, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 63518, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 63812, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 64106, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 64400, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 64694, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 64988, buffered = 65240
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
> 65282, buffered = 73432
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - content limit reached
> 2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - copied 65534 bytes out 
> of 73432 buffered, remaining buffer contains 7898 bytes
> 2019-03-11 14:56:36,645 DEBUG okhttp.OkHttpResponse - HTTP content truncated 
> to 65534 bytes (reason: LENGTH)
> 2019-03-11 14:56:36,661 INFO parse.ParseSegment - http://localhost/large.pdf 
> skipped. Content of size 366578 was truncated to 65534
> 2019-03-11 14:56:36,661 WARN parse.ParserChecker - Content is truncated, 
> parse may fail!
> {noformat}
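
The effect described above can be illustrated with a small, self-contained
simulation. This is only a sketch under stated assumptions, not the plugin's
code: FakeSource is a hypothetical stand-in for the buffering source (it is
not the okhttp/okio API), and the loop in countIterations is an assumed model
of the read loop. The segment size (8192), the 7896 prefetched bytes and the
content limit (65534) are chosen so that the simulated counters follow the
same sequence as the log above.

```java
public class RequestCounterSketch {

  /** Hypothetical stand-in for a buffering source; NOT the okhttp/okio API. */
  static final class FakeSource {
    private final long total;   // bytes the simulated server can deliver
    private final int segment;  // bytes added to the buffer per refill
    private long buffered;      // bytes currently held in the buffer

    FakeSource(long total, int segment, long prefetched) {
      this.total = total;
      this.segment = segment;
      this.buffered = Math.min(total, prefetched);
    }

    /** Refills in whole segments until byteCount bytes are buffered. */
    boolean request(long byteCount) {
      while (buffered < byteCount && buffered < total) {
        buffered = Math.min(total, buffered + segment);
      }
      return buffered >= byteCount; // false: stream ended first
    }

    long size() {
      return buffered;
    }
  }

  /**
   * Counts loop iterations until the buffer exceeds the limit or the stream
   * ends. With forward == true the request counter is set to the buffered
   * size after every request, as the issue proposes.
   */
  static int countIterations(FakeSource source, int limit, int step,
      boolean forward) {
    long requested = 0;
    int iterations = 0;
    while (source.size() <= limit) {
      iterations++;
      long remaining = limit - source.size();
      // assumed increment rule: the step shrinks close to the limit
      requested += Math.max(1, Math.min(step, remaining));
      if (!source.request(requested)) {
        break; // stream ended before the limit was reached
      }
      if (forward) {
        requested = source.size(); // skip what is already buffered
      }
    }
    return iterations;
  }

  public static void main(String[] args) {
    int limit = 65534;
    int step = 8192;
    long total = 366578; // size of large.pdf according to the log
    System.out.println("without forwarding: " + countIterations(
        new FakeSource(total, step, 7896), limit, step, false));
    System.out.println("with forwarding:    " + countIterations(
        new FakeSource(total, step, 7896), limit, step, true));
  }
}
```

Under these assumptions the loop runs 34 times without forwarding the counter
(one iteration per "total bytes requested" line in the log) and 8 times with
it. The exact numbers depend on the assumed increment rule and buffer sizes.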



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
