[jira] [Commented] (NUTCH-2584) Upgrade parse-tika to use Tika 1.18

2018-05-29 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493169#comment-16493169 ] Sebastian Nagel commented on NUTCH-2584: Hi [~Bl4ck1c3], I've tried

[jira] [Comment Edited] (NUTCH-2584) Upgrade parse-tika to use Tika 1.18

2018-05-29 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493169#comment-16493169 ] Sebastian Nagel edited comment on NUTCH-2584 at 5/29/18 7:2

[jira] [Commented] (NUTCH-2587) Tests do not pass

2018-05-29 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493223#comment-16493223 ] Sebastian Nagel commented on NUTCH-2587: On current master? {noformat} %

[jira] [Commented] (NUTCH-2588) Getting status code x01 (unfetched) on more than 80% crawled urls

2018-05-29 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493253#comment-16493253 ] Sebastian Nagel commented on NUTCH-2588: Nutch needs to find URLs as outl

[jira] [Resolved] (NUTCH-2588) Getting status code x01 (unfetched) on more than 80% crawled urls

2018-05-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2588. Resolution: Not A Problem Hi [~usama_], please subscribe to the [Nutch user mailing list

[jira] [Commented] (NUTCH-2589) HTML redirections are not followed when using parse-tika

2018-05-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495035#comment-16495035 ] Sebastian Nagel commented on NUTCH-2589: Hi [~gbouchar], confirmed. Thanks!

[jira] [Commented] (NUTCH-2589) HTML redirections are not followed when using parse-tika

2018-05-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495281#comment-16495281 ] Sebastian Nagel commented on NUTCH-2589: [PR #336|https://github.com/ap

[jira] [Created] (NUTCH-2590) SegmentReader -get fails

2018-05-31 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2590: -- Summary: SegmentReader -get fails Key: NUTCH-2590 URL: https://issues.apache.org/jira/browse/NUTCH-2590 Project: Nutch Issue Type: Bug

[jira] [Commented] (NUTCH-2590) SegmentReader -get fails

2018-05-31 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496741#comment-16496741 ] Sebastian Nagel commented on NUTCH-2590: The old mapred SequenceFileOutputFo

[jira] [Updated] (NUTCH-2590) SegmentReader -get fails

2018-05-31 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2590: --- Description: SegmentReader {{\-get}} fails in local and (pseudo-)distributed mode: {noformat

[jira] [Commented] (NUTCH-2583) Upgrading Nutch's dependencies

2018-05-31 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496768#comment-16496768 ] Sebastian Nagel commented on NUTCH-2583: Successfully tested the ove

[jira] [Resolved] (NUTCH-2591) Can not import org.json.simple.JSONObject in Nutch 2.3.1

2018-06-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2591. Resolution: Not A Problem Hi [~usama_], this is a bug tracker not a support forum. Please

[jira] [Commented] (NUTCH-2585) NPE in TrieStringMatcher

2018-06-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498278#comment-16498278 ] Sebastian Nagel commented on NUTCH-2585: The only issue I can see from the s

[jira] [Resolved] (NUTCH-1480) SolrIndexer to write to multiple servers.

2018-06-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1480. Resolution: Fixed Fix Version/s: 1.15 Merged [PR #218|https://github.com/apache

[jira] [Resolved] (NUTCH-2580) Improvements for Rabbitmq support

2018-06-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2580. Resolution: Implemented Merged [PR #335|https://github.com/apache/nutch/pull/335]. Thanks

[jira] [Resolved] (NUTCH-2583) Upgrading Nutch's dependencies

2018-06-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2583. Resolution: Implemented Changes applied as part of [PR #336|https://github.com/apache

[jira] [Resolved] (NUTCH-2584) Upgrade parse-tika to use Tika 1.18

2018-06-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2584. Resolution: Fixed Implemented with merge of [PR #336|https://github.com/apache/nutch/pull

[jira] [Updated] (NUTCH-2589) HTML redirections are not followed when using parse-tika

2018-06-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2589: --- Affects Version/s: 1.14 > HTML redirections are not followed when using parse-t

[jira] [Updated] (NUTCH-2589) HTML redirections are not followed when using parse-tika

2018-06-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2589: --- Fix Version/s: 1.15 > HTML redirections are not followed when using parse-t

[jira] [Resolved] (NUTCH-2589) HTML redirections are not followed when using parse-tika

2018-06-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2589. Resolution: Fixed Fixed as part of [PR #336|https://github.com/apache/nutch/pull/336

[jira] [Updated] (NUTCH-2589) HTML redirections are not followed when using parse-tika

2018-06-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2589: --- Component/s: plugin parser > HTML redirections are not followed when us

[jira] [Resolved] (NUTCH-2562) protocol-http fails to read large chunked HTTP responses

2018-06-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2562. Resolution: Fixed Fixed/merged. Thanks, [~gbouchar]! > protocol-http fails to read la

[jira] [Resolved] (NUTCH-2590) SegmentReader -get fails

2018-06-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2590. Resolution: Fixed > SegmentReader -get fa

[jira] [Created] (NUTCH-2592) Fetcher to log reason of failed fetches

2018-06-04 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2592: -- Summary: Fetcher to log reason of failed fetches Key: NUTCH-2592 URL: https://issues.apache.org/jira/browse/NUTCH-2592 Project: Nutch Issue Type: Bug

[jira] [Resolved] (NUTCH-2593) Single mode doesn't work in RabbitMQ indexer

2018-06-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2593. Resolution: Fixed Merged. Thanks, [~roannel]! > Single mode doesn't work in

[jira] [Resolved] (NUTCH-2592) Fetcher to log reason of failed fetches

2018-06-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2592. Resolution: Fixed Assignee: Sebastian Nagel Trivial fix. Merged. > Fetcher to

[jira] [Resolved] (NUTCH-2416) Fetcher to log thread ID

2018-06-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2416. Resolution: Fixed This has been solved with NUTCH-2375 (commit c93d908bb). Fetcher logs

[jira] [Updated] (NUTCH-2512) Nutch does not build under JDK9

2018-06-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2512: --- Summary: Nutch does not build under JDK9 (was: Nutch 1.14 does not work under JDK9

[jira] [Commented] (NUTCH-2512) Nutch does not build under JDK9

2018-06-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503033#comment-16503033 ] Sebastian Nagel commented on NUTCH-2512: Hi [~Bl4ck1c3], I've now even

[jira] [Updated] (NUTCH-2574) Generator: hostCount >= maxCount comparison wrong

2018-06-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2574: --- Summary: Generator: hostCount >= maxCount comparison wrong (was: hostCount >= ma

[jira] [Commented] (NUTCH-2512) Nutch does not build under JDK9

2018-06-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503963#comment-16503963 ] Sebastian Nagel commented on NUTCH-2512: The Solr URL used for indexing sh

[jira] [Assigned] (NUTCH-2505) nutch does not delete the .locked file, when the generator partition got an exception

2018-06-07 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2505: -- Assignee: Sebastian Nagel > nutch does not delete the .locked file, when

[jira] [Updated] (NUTCH-2505) nutch does not delete the .locked file, when the generator partition got an exception

2018-06-07 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2505: --- Fix Version/s: 1.15 > nutch does not delete the .locked file, when the generator partit

[jira] [Updated] (NUTCH-2505) nutch does not delete the .locked file, when the generator partition got an exception

2018-06-07 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2505: --- Affects Version/s: 1.14 > nutch does not delete the .locked file, when the genera

[jira] [Resolved] (NUTCH-2505) nutch does not delete the .locked file, when the generator partition got an exception

2018-06-07 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2505. Resolution: Fixed Fixed/merged. Thanks, [~ajoylian]! > nutch does not delete the .loc

[jira] [Assigned] (NUTCH-2530) Rename property db.max.anchor.length > linkdb.max.anchor.length

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2530: -- Assignee: Sebastian Nagel > Rename property db.max.anchor.len

[jira] [Resolved] (NUTCH-2530) Rename property db.max.anchor.length > linkdb.max.anchor.length

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2530. Resolution: Fixed Fixed/merged. > Rename property db.max.anchor.len

[jira] [Resolved] (NUTCH-2581) Caching of redirected robots.txt may overwrite correct robots.txt rules

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2581. Resolution: Fixed Fixed in 1.x and 2.x. Although this usually rarely happens, it

[jira] [Created] (NUTCH-2595) Upgrade crawler-commons dependency to 0.10

2018-06-08 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2595: -- Summary: Upgrade crawler-commons dependency to 0.10 Key: NUTCH-2595 URL: https://issues.apache.org/jira/browse/NUTCH-2595 Project: Nutch Issue Type

[jira] [Updated] (NUTCH-2561) protocol-http can be made to read arbitrarily large HTTP responses

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2561: --- Affects Version/s: 1.14 > protocol-http can be made to read arbitrarily large HTTP respon

[jira] [Updated] (NUTCH-2561) protocol-http can be made to read arbitrarily large HTTP responses

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2561: --- Fix Version/s: 1.15 > protocol-http can be made to read arbitrarily large HTTP respon

[jira] [Updated] (NUTCH-2557) protocol-http fails to follow redirections when an HTTP response body is invalid

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2557: --- Fix Version/s: 1.15 > protocol-http fails to follow redirections when an HTTP response b

[jira] [Updated] (NUTCH-2557) protocol-http fails to follow redirections when an HTTP response body is invalid

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2557: --- Affects Version/s: 1.14 > protocol-http fails to follow redirections when an HTTP respo

[jira] [Resolved] (NUTCH-2424) Mirror git repository to gitlab.com

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2424. Resolution: Won't Fix > Mirror git repository to gi

[jira] [Resolved] (NUTCH-2257) apache-nutch-2.3.1-src.tar.gz can not be built

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2257. Resolution: Cannot Reproduce > apache-nutch-2.3.1-src.tar.gz can not be bu

[jira] [Updated] (NUTCH-2040) Upgrade to recent version of Crawler-Commons

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2040: --- Fix Version/s: 2.4 > Upgrade to recent version of Crawler-Comm

[jira] [Updated] (NUTCH-2555) URL normalization problem: path not starting with a '/'

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2555: --- Affects Version/s: 1.14 > URL normalization problem: path not starting wit

[jira] [Updated] (NUTCH-2558) protocol-http cannot handle a missing HTTP status line

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2558: --- Fix Version/s: 1.15 > protocol-http cannot handle a missing HTTP status l

[jira] [Updated] (NUTCH-2556) protocol-http makes invalid HTTP/1.0 requests

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2556: --- Fix Version/s: 1.15 > protocol-http makes invalid HTTP/1.0 reque

[jira] [Updated] (NUTCH-2555) URL normalization problem: path not starting with a '/'

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2555: --- Fix Version/s: 1.15 > URL normalization problem: path not starting wit

[jira] [Updated] (NUTCH-2556) protocol-http makes invalid HTTP/1.0 requests

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2556: --- Affects Version/s: 1.14 > protocol-http makes invalid HTTP/1.0 reque

[jira] [Updated] (NUTCH-2558) protocol-http cannot handle a missing HTTP status line

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2558: --- Affects Version/s: 1.14 > protocol-http cannot handle a missing HTTP status l

[jira] [Updated] (NUTCH-2559) protocol-http cannot handle colons after the HTTP status code

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2559: --- Affects Version/s: 1.14 > protocol-http cannot handle colons after the HTTP status c

[jira] [Updated] (NUTCH-2563) HTTP header spellchecking issues

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2563: --- Affects Version/s: 1.14 > HTTP header spellchecking iss

[jira] [Updated] (NUTCH-2560) protocol-http throws an error when an http header spans over multiple lines

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2560: --- Fix Version/s: 1.15 > protocol-http throws an error when an http header spans over multi

[jira] [Updated] (NUTCH-2560) protocol-http throws an error when an http header spans over multiple lines

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2560: --- Affects Version/s: 1.14 > protocol-http throws an error when an http header spans o

[jira] [Updated] (NUTCH-2563) HTTP header spellchecking issues

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2563: --- Fix Version/s: 1.15 > HTTP header spellchecking iss

[jira] [Updated] (NUTCH-2549) protocol-http does not behave the same as browsers

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2549: --- Affects Version/s: 1.14 > protocol-http does not behave the same as brows

[jira] [Updated] (NUTCH-2549) protocol-http does not behave the same as browsers

2018-06-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2549: --- Fix Version/s: 1.15 > protocol-http does not behave the same as brows

[jira] [Updated] (NUTCH-2559) protocol-http cannot handle colons after the HTTP status code

2018-06-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2559: --- Fix Version/s: 1.15 > protocol-http cannot handle colons after the HTTP status c

[jira] [Assigned] (NUTCH-2549) protocol-http does not behave the same as browsers

2018-06-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2549: -- Assignee: Sebastian Nagel > protocol-http does not behave the same as brows

[jira] [Created] (NUTCH-2596) Upgrade from org.mortbay.jetty to org.eclipse.jetty

2018-06-11 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2596: -- Summary: Upgrade from org.mortbay.jetty to org.eclipse.jetty Key: NUTCH-2596 URL: https://issues.apache.org/jira/browse/NUTCH-2596 Project: Nutch Issue

[jira] [Commented] (NUTCH-2512) Nutch does not build under JDK9

2018-06-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507984#comment-16507984 ] Sebastian Nagel commented on NUTCH-2512: Opened NUTCH-2596 after a failed t

[jira] [Commented] (NUTCH-2549) protocol-http does not behave the same as browsers

2018-06-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508006#comment-16508006 ] Sebastian Nagel commented on NUTCH-2549: Hi [~gbouchar], PR is open to fix

[jira] [Commented] (NUTCH-2557) protocol-http fails to follow redirections when an HTTP response body is invalid

2018-06-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508019#comment-16508019 ] Sebastian Nagel commented on NUTCH-2557: Hi [~omkar20895], hi [~gbouchar],

[jira] [Commented] (NUTCH-2560) protocol-http throws an error when an http header spans over multiple lines

2018-06-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508028#comment-16508028 ] Sebastian Nagel commented on NUTCH-2560: See [RFC 7230, section 3.2.4|h

[jira] [Commented] (NUTCH-2565) MergeDB incorrectly handles unfetched CrawlDatums

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509596#comment-16509596 ] Sebastian Nagel commented on NUTCH-2565: I thought first about making

[jira] [Resolved] (NUTCH-2595) Upgrade crawler-commons dependency to 0.10

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2595. Resolution: Implemented > Upgrade crawler-commons dependency to 0

[jira] [Assigned] (NUTCH-2595) Upgrade crawler-commons dependency to 0.10

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2595: -- Assignee: Sebastian Nagel > Upgrade crawler-commons dependency to 0

[jira] [Work started] (NUTCH-2576) HTTP protocol plugin based on okhttp

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2576 started by Sebastian Nagel. -- > HTTP protocol plugin based on okh

[jira] [Assigned] (NUTCH-2576) HTTP protocol plugin based on okhttp

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2576: -- Assignee: Sebastian Nagel > HTTP protocol plugin based on okh

[jira] [Resolved] (NUTCH-2576) HTTP protocol plugin based on okhttp

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2576. Resolution: Implemented > HTTP protocol plugin based on okh

[jira] [Resolved] (NUTCH-2040) Upgrade to recent version of Crawler-Commons

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2040. Resolution: Implemented > Upgrade to recent version of Crawler-Comm

[jira] [Resolved] (NUTCH-2555) URL normalization problem: path not starting with a '/'

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2555. Resolution: Fixed > URL normalization problem: path not starting wit

[jira] [Resolved] (NUTCH-2556) protocol-http makes invalid HTTP/1.0 requests

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2556. Resolution: Fixed HTTP/1.1 is now the default for protocol-http but setting http.useHttp11

[jira] [Resolved] (NUTCH-2558) protocol-http cannot handle a missing HTTP status line

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2558. Resolution: Fixed > protocol-http cannot handle a missing HTTP status l

[jira] [Resolved] (NUTCH-2559) protocol-http cannot handle colons after the HTTP status code

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2559. Resolution: Fixed > protocol-http cannot handle colons after the HTTP status c

[jira] [Resolved] (NUTCH-2561) protocol-http can be made to read arbitrarily large HTTP responses

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2561. Resolution: Fixed Thanks, [~gbouchar], esp. for the idea for the unit test server

[jira] [Resolved] (NUTCH-2557) protocol-http fails to follow redirections when an HTTP response body is invalid

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2557. Resolution: Fixed Thanks, [~gbouchar] and [~omkar20895]! > protocol-http fails to fol

[jira] [Resolved] (NUTCH-2563) HTTP header spellchecking issues

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2563. Resolution: Fixed > HTTP header spellchecking iss

[jira] [Resolved] (NUTCH-2549) protocol-http does not behave the same as browsers

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2549. Resolution: Fixed Thanks, [~gbouchar] for the careful analysis! > protocol-http does

[jira] [Updated] (NUTCH-2512) Nutch does not build under JDK9

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2512: --- Fix Version/s: (was: 1.15) 1.16 > Nutch does not build under J

[jira] [Resolved] (NUTCH-2560) protocol-http throws an error when an http header spans over multiple lines

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2560. Resolution: Cannot Reproduce Thanks, [~gbouchar]. There is now a unit test for multi-line

[jira] [Resolved] (NUTCH-2564) protocol-http throws an error when the content-length header is not a number

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2564. Resolution: Fixed > protocol-http throws an error when the content-length header is no

[jira] [Updated] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2292: --- Fix Version/s: (was: 1.15) 1.16 > Mavenize the build for nutch-c

[jira] [Updated] (NUTCH-2030) ParseZip plugin is not able to extract language from zip document,this could solve that problem.

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2030: --- Fix Version/s: (was: 1.15) 1.16 > ParseZip plugin is not able

[jira] [Updated] (NUTCH-2334) Extension point for schedulers

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2334: --- Fix Version/s: (was: 1.15) 1.16 > Extension point for schedul

[jira] [Commented] (NUTCH-2030) ParseZip plugin is not able to extract language from zip document,this could solve that problem.

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510069#comment-16510069 ] Sebastian Nagel commented on NUTCH-2030: So, it's about parse-zip or

[jira] [Updated] (NUTCH-2032) Plugin to index the raw content of a readable document.

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2032: --- Fix Version/s: (was: 1.15) > Plugin to index the raw content of a readable docum

[jira] [Commented] (NUTCH-2140) Atomic update and optimistic concurrency update using Solr

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510075#comment-16510075 ] Sebastian Nagel commented on NUTCH-2140: Hi [~roannel], is this sti

[jira] [Updated] (NUTCH-2369) Create a new GraphGenerator Tool for writing Nutch Records as a Full Web Graph

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2369: --- Fix Version/s: (was: 1.15) > Create a new GraphGenerator Tool for writing Nutch Reco

[jira] [Updated] (NUTCH-2267) Solr indexer fails at the end of the job with a java error message

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2267: --- Fix Version/s: (was: 1.15) > Solr indexer fails at the end of the job with a java er

[jira] [Resolved] (NUTCH-2267) Solr indexer fails at the end of the job with a java error message

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2267. Resolution: Done PR has been merged. Closing this for now. Thanks to everyone involved

[jira] [Resolved] (NUTCH-2312) Support PhantomJS as a WebDriver in protocol-selenium

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2312. Resolution: Incomplete Fix Version/s: (was: 1.15) No patch/PR provided so far

[jira] [Updated] (NUTCH-2382) indexer-hbase Nutch 1.x branch

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2382: --- Fix Version/s: (was: 1.15) 1.16 > indexer-hbase Nutch 1.x bra

[jira] [Commented] (NUTCH-2382) indexer-hbase Nutch 1.x branch

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510086#comment-16510086 ] Sebastian Nagel commented on NUTCH-2382: After NUTCH-1480 the patch needs t

[jira] [Resolved] (NUTCH-2251) Make CommonCrawlFormatJackson instance reusable by properly handling object state

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2251. Resolution: Duplicate Fix Version/s: (was: 1.15) > M

[jira] [Updated] (NUTCH-2239) Selenium Handlers for Ajax Patterns from Student submissions

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2239: --- Fix Version/s: (was: 1.15) > Selenium Handlers for Ajax Patterns from Stud

[jira] [Commented] (NUTCH-2239) Selenium Handlers for Ajax Patterns from Student submissions

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510090#comment-16510090 ] Sebastian Nagel commented on NUTCH-2239: Hi [~chrismattmann], still in prog

[jira] [Updated] (NUTCH-2265) Write A Test Package for Scoring Similarity

2018-06-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2265: --- Fix Version/s: (was: 1.15) > Write A Test Package for Scoring Similar
