[jira] [Commented] (NUTCH-1751) Empty anchors should not index

2014-04-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964876#comment-13964876 ] Hudson commented on NUTCH-1751: --- SUCCESS: Integrated in Nutch-nutchgora #981 (See [https://

[jira] [Commented] (NUTCH-1733) parse-html to support HTML5 charset definitions

2014-04-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964818#comment-13964818 ] Hudson commented on NUTCH-1733: --- SUCCESS: Integrated in Nutch-nutchgora #980 (See [https://

[jira] [Resolved] (NUTCH-1751) Empty anchors should not index

2014-04-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1751. - Resolution: Fixed Committed @revision 1586175 in trunk > Empty anchors should no

[jira] [Created] (NUTCH-1755) Project name bug in build.xml

2014-04-09 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1755: --- Summary: Project name bug in build.xml Key: NUTCH-1755 URL: https://issues.apache.org/jira/browse/NUTCH-1755 Project: Nutch Issue Type: Bug

[jira] [Resolved] (NUTCH-1733) parse-html to support HTML5 charset definitions

2014-04-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1733. Resolution: Fixed Committed to 2.x r1586162. Opened NUTCH-1754 to remove the leading BOM in

[jira] [Created] (NUTCH-1754) remove BOM from extracted plain text

2014-04-09 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-1754: -- Summary: remove BOM from extracted plain text Key: NUTCH-1754 URL: https://issues.apache.org/jira/browse/NUTCH-1754 Project: Nutch Issue Type: Bug

[jira] [Commented] (NUTCH-1751) Empty anchors should not index

2014-04-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964672#comment-13964672 ] Sebastian Nagel commented on NUTCH-1751: +1 Trunk is not affected: Inlinks.getAnch

Re: Creating newbie tag for Nutch Jira

2014-04-09 Thread Lewis John Mcgibbney
I've created a tag as 'nutchNewbie'. Thanks Lewis On Wed, Apr 9, 2014 at 2:35 PM, Lewis John Mcgibbney < lewis.mcgibb...@gmail.com> wrote: > Hi Folks, > I've just presented on Nutch at ApacheCon NA. Good turnout and some > questions. Also nice to hear that some of the people are using both 1.X >

Creating newbie tag for Nutch Jira

2014-04-09 Thread Lewis John Mcgibbney
Hi Folks, I've just presented on Nutch at ApacheCon NA. Good turnout and some questions. Also nice to hear that some of the people are using both 1.X trunk and 2.X branch. I've had some people coming to me asking to contribute to Nutch... but that they honestly don't really know where to start. I'm

[jira] [Commented] (NUTCH-1752) cache robots.txt rules per protocol:host:port

2014-04-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964632#comment-13964632 ] Sebastian Nagel commented on NUTCH-1752: Yep: Apache httpd and Tomcat on same host

[jira] [Commented] (NUTCH-1731) Better cmd line parsing for NutchServer

2014-04-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964563#comment-13964563 ] Lewis John McGibbney commented on NUTCH-1731: - Sounds excellent. > Better cmd

[jira] [Comment Edited] (NUTCH-1731) Better cmd line parsing for NutchServer

2014-04-09 Thread Fjodor Vershinin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964512#comment-13964512 ] Fjodor Vershinin edited comment on NUTCH-1731 at 4/9/14 6:29 PM: ---

[jira] [Commented] (NUTCH-1731) Better cmd line parsing for NutchServer

2014-04-09 Thread Fjodor Vershinin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964512#comment-13964512 ] Fjodor Vershinin commented on NUTCH-1731: - I made some investigations about this i

[jira] [Commented] (NUTCH-1752) cache robots.txt rules per protocol:host:port

2014-04-09 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964219#comment-13964219 ] lufeng commented on NUTCH-1752: --- Do you mean different port with same protocol and host has

[jira] [Commented] (NUTCH-710) Support for rel="canonical" attribute

2014-04-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964202#comment-13964202 ] Sebastian Nagel commented on NUTCH-710: --- Thanks, [~Sertac Turkel]! My comments: * ev

[jira] [Commented] (NUTCH-1748) urlfilter-validator to allow .. (two dots) inside file names (path elements)

2014-04-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964127#comment-13964127 ] Sebastian Nagel commented on NUTCH-1748: Hi [~alexmc], you're absolutely right: the

[jira] [Updated] (NUTCH-1748) urlfilter-validator to allow .. (two dots) inside file names (path elements)

2014-04-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1748: --- Summary: urlfilter-validator to allow .. (two dots) inside file names (path elements) (was:

[jira] [Commented] (NUTCH-1748) Despite Unix systems accept files containing two dots. Urlfilter-validator rejects such path names.

2014-04-09 Thread Alex McLintock (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964069#comment-13964069 ] Alex McLintock commented on NUTCH-1748: --- FYI "The similarity to unix and other disk

[jira] [Updated] (NUTCH-710) Support for rel="canonical" attribute

2014-04-09 Thread Sertac TURKEL (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sertac TURKEL updated NUTCH-710: Attachment: NUTCH-710.patch Hi [~lewismc] , I prepared a patch file to solve this issue for 2.x bra

[jira] [Created] (NUTCH-1753) Eclipse dependecy problem for 2.x

2014-04-09 Thread Talat UYARER (JIRA)
Talat UYARER created NUTCH-1753: --- Summary: Eclipse dependecy problem for 2.x Key: NUTCH-1753 URL: https://issues.apache.org/jira/browse/NUTCH-1753 Project: Nutch Issue Type: Bug Affects Ver

[jira] [Updated] (NUTCH-1753) Eclipse dependecy problem for 2.x

2014-04-09 Thread Talat UYARER (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Talat UYARER updated NUTCH-1753: Attachment: NUTCH-1753.patch This patch can solve it. > Eclipse dependecy problem for 2.x > -

[jira] [Commented] (NUTCH-1750) Improvement of Fetcher's reportStatus

2014-04-09 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963996#comment-13963996 ] Hudson commented on NUTCH-1750: --- SUCCESS: Integrated in Nutch-trunk #2597 (See [https://bui

Jenkins build is back to normal : Nutch-trunk #2597

2014-04-09 Thread Apache Jenkins Server

[jira] [Updated] (NUTCH-1752) cache robots.txt rules per protocol:host:port

2014-04-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1752: --- Attachment: NUTCH-1752-v1.patch Patch for trunk and 2.x > cache robots.txt rules per protoco

[jira] [Created] (NUTCH-1752) cache robots.txt rules per protocol:host:port

2014-04-09 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-1752: -- Summary: cache robots.txt rules per protocol:host:port Key: NUTCH-1752 URL: https://issues.apache.org/jira/browse/NUTCH-1752 Project: Nutch Issue Type: B

[jira] [Closed] (NUTCH-1750) Improvement of Fetcher's reportStatus

2014-04-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-1750. > Improvement of Fetcher's reportStatus > - > > Key

[jira] [Resolved] (NUTCH-1750) Improvement of Fetcher's reportStatus

2014-04-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1750. -- Resolution: Fixed Thanks Sebastian Committed revision 1585905. > Improvement of Fetcher's re

[jira] [Commented] (NUTCH-1750) Improvement of Fetcher's reportStatus

2014-04-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963921#comment-13963921 ] Sebastian Nagel commented on NUTCH-1750: +1 > Improvement of Fetcher's reportStat