[jira] [Updated] (NUTCH-1793) HttpRobotRulesParser not configured properly => "http.robots.403.allow" property is not read

2014-06-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1793: - Attachment: NUTCH-1793.patch Will commit shortly unless someone has an objection > HttpRobotRule

[jira] [Commented] (NUTCH-1793) HttpRobotRulesParser not configured properly => "http.robots.403.allow" property is not read

2014-06-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033548#comment-14033548 ] Markus Jelsma commented on NUTCH-1793: -- +1, thanks! > HttpRobotRulesParser not confi

[jira] [Resolved] (NUTCH-1793) HttpRobotRulesParser not configured properly => "http.robots.403.allow" property is not read

2014-06-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1793. -- Resolution: Fixed Fix Version/s: 1.9 Trunk => Committed revision 1603094. Thanks Markus

[jira] [Assigned] (NUTCH-1269) Generate main problems

2014-06-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reassigned NUTCH-1269: Assignee: Julien Nioche > Generate main problems > -- > >

[jira] [Updated] (NUTCH-1269) Improve distribution of URLS with multi-segment generation

2014-06-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1269: - Summary: Improve distribution of URLS with multi-segment generation (was: Generate main problems

[jira] [Commented] (NUTCH-1793) HttpRobotRulesParser not configured properly => "http.robots.403.allow" property is not read

2014-06-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033595#comment-14033595 ] Hudson commented on NUTCH-1793: --- SUCCESS: Integrated in Nutch-trunk #2660 (See [https://bui

[jira] [Updated] (NUTCH-1492) Support gora-dynamodb in Nutch 2.x

2014-06-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1492: - Component/s: (was: build) > Support gora-dynamodb in Nutch 2.x >

[jira] [Commented] (NUTCH-1633) slf4j is provided by hadoop and should not be included in the job file.

2014-06-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033665#comment-14033665 ] Julien Nioche commented on NUTCH-1633: -- We all seem to have missed this one, sorry Ka

[jira] [Updated] (NUTCH-1220) Upgrade Solr deps

2014-06-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1220: - Component/s: indexer > Upgrade Solr deps > - > > Key: NUTCH-1220

[jira] [Resolved] (NUTCH-1285) Debian Packaging for Nutch

2014-06-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1285. -- Resolution: Won't Fix IMHO does not make much sense for the distributed mode as it touches on t

[jira] [Commented] (NUTCH-1590) [SECURITY] Frame injection vulnerability in published Javadoc

2014-06-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033670#comment-14033670 ] Julien Nioche commented on NUTCH-1590: -- Ok how do we deal with this one? Would it imp

[jira] [Commented] (NUTCH-1590) [SECURITY] Frame injection vulnerability in published Javadoc

2014-06-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033686#comment-14033686 ] Markus Jelsma commented on NUTCH-1590: -- Can we not just force Javadoc build with Java

[jira] [Created] (NUTCH-1794) IndexingFilterChecker to optionally dumpText

2014-06-17 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-1794: Summary: IndexingFilterChecker to optionally dumpText Key: NUTCH-1794 URL: https://issues.apache.org/jira/browse/NUTCH-1794 Project: Nutch Issue Type: Impro

[jira] [Updated] (NUTCH-1794) IndexingFilterChecker to optionally dumpText

2014-06-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1794: - Attachment: NUTCH-1794-trunk.patch Patch for trunk. Use the -dumpText option. > IndexingFilterCh

[jira] [Commented] (NUTCH-1794) IndexingFilterChecker to optionally dumpText

2014-06-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033710#comment-14033710 ] Markus Jelsma commented on NUTCH-1794: -- Ah, seems to work. Will commit this trivial o

[jira] [Updated] (NUTCH-1590) [SECURITY] Frame injection vulnerability in published Javadoc

2014-06-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1590: - Attachment: NUTCH-1590.patch What about doing it like this? Haven't tested exhaustively but seems

[jira] [Commented] (NUTCH-1794) IndexingFilterChecker to optionally dumpText

2014-06-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033791#comment-14033791 ] Julien Nioche commented on NUTCH-1794: -- haven't tested it but looks fine.+1 > Indexi

[jira] [Commented] (NUTCH-1590) [SECURITY] Frame injection vulnerability in published Javadoc

2014-06-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033811#comment-14033811 ] Markus Jelsma commented on NUTCH-1590: -- Yes! Looks good! I see you have included the

[jira] [Resolved] (NUTCH-1590) [SECURITY] Frame injection vulnerability in published Javadoc

2014-06-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1590. -- Resolution: Fixed Good catch! Thanks Markus Trunk : committed revision 1603179. 2.x (for what i

Version of Java in Jenkins

2014-06-17 Thread Julien Nioche
Lewis, https://issues.apache.org/jira/browse/NUTCH-1590 requires Java 1.7 for building the Javadoc. Does something need changing in Jenkins? BTW is there a WIKI page somewhere on how to configure Jenkins? Thanks Julien -- Open Source Solutions for Text Engineering http://digitalpebble.blogsp

[jira] [Resolved] (NUTCH-1794) IndexingFilterChecker to optionally dumpText

2014-06-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1794. -- Resolution: Fixed Thanks! Committed revision for trunk in rev. 1603185. > IndexingFilterCheck

[jira] [Commented] (NUTCH-1422) reset signature for redirects

2014-06-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033850#comment-14033850 ] Markus Jelsma commented on NUTCH-1422: -- Yes, agreed. Although right now we don't have

[jira] [Assigned] (NUTCH-1776) Log incorrect plugin.folder file path

2014-06-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-1776: Assignee: Markus Jelsma > Log incorrect plugin.folder file path > -

[jira] [Commented] (NUTCH-1590) [SECURITY] Frame injection vulnerability in published Javadoc

2014-06-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033871#comment-14033871 ] Hudson commented on NUTCH-1590: --- FAILURE: Integrated in Nutch-nutchgora #1046 (See [https:/

Build failed in Jenkins: Nutch-nutchgora #1046

2014-06-17 Thread Apache Jenkins Server
See Changes: [jnioche] NUTCH-1590 [SECURITY] Frame injection vulnerability in published Javadoc (jnioche) -- [...truncated 3068 lines...] clean-lib: resolve-default: [ivy:resolve] :: loading s

[jira] [Commented] (NUTCH-1590) [SECURITY] Frame injection vulnerability in published Javadoc

2014-06-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033883#comment-14033883 ] Hudson commented on NUTCH-1590: --- SUCCESS: Integrated in Nutch-trunk #2661 (See [https://bui

[jira] [Commented] (NUTCH-1794) IndexingFilterChecker to optionally dumpText

2014-06-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033882#comment-14033882 ] Hudson commented on NUTCH-1794: --- SUCCESS: Integrated in Nutch-trunk #2661 (See [https://bui

Nutch Extension for realtime processing

2014-06-17 Thread Jake Dodd
Hi all, My organization is mulling the creation of a Nutch Extension Point that would enable realtime processing of Nutch documents as they’re fetched. We have the desire to pass Nutch-fetched documents to a realtime framework such as Storm or Spark. Currently, it’s trivial to implement a custo

RE: Nutch Extension for realtime processing

2014-06-17 Thread Markus Jelsma
Hi Jake, It would be more pluggable if you just implement an indexer backend plugin for your target (Storm, Spark) so you can use the existing indexing filtering framework and plugins to enrich the data. If you then couple the indexing logic to FetcherOutputFormat, you can skip the parse (becau

Re: Nutch Extension for realtime processing

2014-06-17 Thread Mattmann, Chris A (3980)
Jake, I am totally interested in this. Contributing to Nutch (and more generally to Apache projects) is described really well (by Dennis Kubes) here: http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer Looking forward to seeing your contributions!

Re: Nutch Extension for realtime processing

2014-06-17 Thread Jake Dodd
Markus: The indexer plugin idea definitely works if the goal is only to pass Nutch-collected data to realtime frameworks. However, there are some cool things that you can do in “real” realtime (heh), as opposed to the batch nature of Nutch’s indexing plugins and the FetcherOutputFormat. Moreover

Build failed in Jenkins: Nutch-nutchgora #1047

2014-06-17 Thread Apache Jenkins Server
See -- [...truncated 3068 lines...] init-plugin: clean-lib: resolve-default: [ivy:resolve] :: loading settings :: file = comp