unable to build 2.x

2013-05-22 Thread Tejas Patil
Hi nutch-dev, I took a *fresh* checkout of 2.x and tried to build it (ant clean runtime). I get a lot of compilation errors. When I first saw them on the terminal, I said to my laptop: Are you kidding me? I retried it two more times and the same thing still happens. I am checking the reason

[jira] [Updated] (NUTCH-356) Plugin repository cache can lead to memory leak

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-356: -- Fix Version/s: 1.8 Plugin repository cache can lead to memory leak

[jira] [Updated] (NUTCH-840) Port tests from parse-html to parse-tika

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-840: -- Fix Version/s: 1.8 Port tests from parse-html to parse-tika

[jira] [Updated] (NUTCH-410) Faster RegexNormalize with more features

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-410: -- Fix Version/s: 1.8 Faster RegexNormalize with more features

[jira] [Updated] (NUTCH-1253) Incompatible neko and xerces versions

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1253: --- Fix Version/s: 1.8 Incompatible neko and xerces versions

[jira] [Updated] (NUTCH-1190) MoreIndexingFilter refactor: move data formats used to parse lastModified to a config file.

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1190: --- Fix Version/s: 1.8 MoreIndexingFilter refactor: move data formats used to parse

[jira] [Updated] (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-797: -- Fix Version/s: 1.8 parse-tika is not properly constructing URLs when the target begins

[jira] [Updated] (NUTCH-566) Sun's URL class has bug in creation of relative query URLs

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-566: -- Fix Version/s: 1.8 Sun's URL class has bug in creation of relative query URLs

[jira] [Updated] (NUTCH-1250) parse-html does not parse links with empty anchor

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1250: --- Fix Version/s: 1.8 parse-html does not parse links with empty anchor

[jira] [Updated] (NUTCH-1562) Order of execution for scoring filters

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1562: --- Fix Version/s: 1.8 Order of execution for scoring filters

[jira] [Updated] (NUTCH-409) Add short circuit notion to filters to speedup mixed site/subsite crawling

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-409: -- Fix Version/s: 1.8 Add short circuit notion to filters to speedup mixed site/subsite

[jira] [Updated] (NUTCH-945) Indexing to multiple SOLR Servers

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-945: -- Fix Version/s: 1.8 Indexing to multiple SOLR Servers -

[jira] [Updated] (NUTCH-1531) URL filtering takes long time for very long URLs

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1531: --- Fix Version/s: 1.8 URL filtering takes long time for very long URLs

[jira] [Updated] (NUTCH-351) Protocol forward proxy

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-351: -- Fix Version/s: 1.8 Protocol forward proxy -- Key:

[jira] [Updated] (NUTCH-490) Extension point with filters for Neko HTML parser (with patch)

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-490: -- Fix Version/s: 1.8 Extension point with filters for Neko HTML parser (with patch)

fix version 1.7 removed in Jira

2013-05-22 Thread Sebastian Nagel
Hi, please take care not to remove the fix version when applying bulk changes, e.g., 2.2 => 2.3. Alternative fix versions (1.7) are not kept. Luckily Jira is quite powerful; I restored the 1.x fix version using this awful filter: project = NUTCH AND fixVersion in (2.3) AND status = Open AND

[jira] [Updated] (NUTCH-1483) Can't crawl filesystem with protocol-file plugin

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1483: --- Priority: Critical (was: Major) Can't crawl filesystem with protocol-file plugin

[jira] [Updated] (NUTCH-1483) Can't crawl filesystem with protocol-file plugin

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1483: --- Fix Version/s: 1.7 Can't crawl filesystem with protocol-file plugin

[jira] [Resolved] (NUTCH-1249) Resolve all issues flagged up by adding javac -Xlint argument

2013-05-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1249. Resolution: Fixed Fix Version/s: 2.2 Assignee: Tejas Patil (was: Lewis John

[jira] [Resolved] (NUTCH-1275) Fix [unchecked] javac warnings

2013-05-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1275. Resolution: Fixed Fix Version/s: 2.2 Got resolved with NUTCH-1249 Fix

[jira] [Commented] (NUTCH-1249) Resolve all issues flagged up by adding javac -Xlint argument

2013-05-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663973#comment-13663973 ] Hudson commented on NUTCH-1249: --- Integrated in Nutch-nutchgora #614 (See

[jira] [Commented] (NUTCH-1569) Upgrade 2.x to Gora 0.3

2013-05-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663974#comment-13663974 ] Hudson commented on NUTCH-1569: --- Integrated in Nutch-nutchgora #614 (See

Build failed in Jenkins: Nutch-nutchgora #614

2013-05-22 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-nutchgora/614/changes Changes: [tejasp] NUTCH-1249 and NUTCH-1275 : Resolve all issues flagged up by adding javac -Xlint argument [lewismc] NUTCH-1569 Upgrade 2.x to Gora 0.3 -- [...truncated 1674 lines...]

[jira] [Created] (NUTCH-1575) support solr authentication in nutch 2.x

2013-05-22 Thread lufeng (JIRA)
lufeng created NUTCH-1575: - Summary: support solr authentication in nutch 2.x Key: NUTCH-1575 URL: https://issues.apache.org/jira/browse/NUTCH-1575 Project: Nutch Issue Type: Improvement

[jira] [Work started] (NUTCH-1575) support solr authentication in nutch 2.x

2013-05-22 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1575 started by lufeng. support solr authentication in nutch 2.x Key: NUTCH-1575

[jira] [Updated] (NUTCH-1575) support solr authentication in nutch 2.x

2013-05-22 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1575: -- Attachment: NUTCH-1575.patch add solr authentication support solr authentication in nutch

[jira] [Commented] (NUTCH-1563) FetchSchedule#getFields is never used by GeneraterJob

2013-05-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13664408#comment-13664408 ] Tejas Patil commented on NUTCH-1563: I think this is relevant to only 2.x and

[jira] [Updated] (NUTCH-1566) bin/nutch to allow whitespace in paths

2013-05-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1566: --- Attachment: NUTCH-1566-v2-trunk.patch New patch including [~tejas.patil]'s suggestions. Also

[jira] [Reopened] (NUTCH-1569) Upgrade 2.x to Gora 0.3

2013-05-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reopened NUTCH-1569: - Upgrade 2.x to Gora 0.3 --- Key: