Build failed in Jenkins: Nutch-nutchgora #1048

2014-06-18 Thread Apache Jenkins Server
See -- [...truncated 3126 lines...] init-plugin: clean-lib: resolve-default: [ivy:resolve] :: loading settings :: file = comp

[jira] [Closed] (NUTCH-1590) [SECURITY] Frame injection vulnerability in published Javadoc

2014-06-18 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-1590. --- Thanks for fixing, folks. Great work. > [SECURITY] Frame injection vulnerability in publi

[jira] [Updated] (NUTCH-1590) [SECURITY] Frame injection vulnerability in published Javadoc

2014-06-18 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1590: Fix Version/s: 2.3 > [SECURITY] Frame injection vulnerability in published Javadoc

Re: #nutch on IRC

2014-06-18 Thread Lewis John Mcgibbney
UPDATE: We are on #nutchbot. Someone took #nutch already! See you there. On Wed, Jun 18, 2014 at 10:34 AM, Lewis John Mcgibbney < lewis.mcgibb...@gmail.com> wrote: > Hi Folks, > I've opened a channel on IRC for Nutch. > It's at #nutch > For those of you interested in joining the room via browse

Re: Nutch Extension for realtime processing

2014-06-18 Thread Jake Dodd
Hi Julien, Yep, you’re correct about the generation step being a limiting factor in getting new content in realtime—i.e. nearly as soon as it appears on the web. But that isn’t quite what I meant, so I’ll clarify what I mean by “realtime”, in the context of Nutch as it exists today: getting acc

Fixing Nutch 2.x Build on Jenkins

2014-06-18 Thread Lewis John Mcgibbney
Hi Folks, A while ago, somewhere, we broke the 2.x build! I've described this in NUTCH-1792. Here is the paste log, which somewhere includes the commit which broke the build. Does anyone have a clue why the TestImageMetadata test for parse-tika is fa

Re: Version of Java in Jenkins

2014-06-18 Thread Lewis John Mcgibbney
Hi Julien, On Tue, Jun 17, 2014 at 10:20 AM, Julien Nioche < lists.digitalpeb...@gmail.com> wrote: > Lewis, > > https://issues.apache.org/jira/browse/NUTCH-1590 requires Java 1.7 for > building the Javadoc. Does something need changing in Jenkins? > Do we want to shift the entire project to use

Re: #nutch on IRC

2014-06-18 Thread Lewis John Mcgibbney
Yep. Do you fancy making your first commit to the new CMS? ;) On Wed, Jun 18, 2014 at 10:51 AM, Markus Jelsma wrote: > Cool Lewis. If this is there to stay, shouldn't we advertise it on our > homepage? > > Markus > > On Wednesday, June 18, 2014 10:34:13 AM Lewis John Mcgibbney wrote: > > Hi Folk

[jira] [Updated] (NUTCH-1084) ReadDB url throws exception

2014-06-18 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1084: - Attachment: NUTCH-1084.patch Could do the export only for the actions that need it (i.e. the ones

[jira] [Comment Edited] (NUTCH-1084) ReadDB url throws exception

2014-06-18 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035797#comment-14035797 ] Julien Nioche edited comment on NUTCH-1084 at 6/18/14 3:18 PM: -

[jira] [Commented] (NUTCH-1084) ReadDB url throws exception

2014-06-18 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035797#comment-14035797 ] Julien Nioche commented on NUTCH-1084: -- See [Hadoop In Action|http://books.google.co

#nutch on IRC

2014-06-18 Thread Lewis John Mcgibbney
Hi Folks, I've opened a channel on IRC for Nutch. It's at #nutch. For those of you interested in joining the room via browser, you can do so here: http://webchat.freenode.net/ Thanks, Lewis

[Nutch Wiki] Update of "bin/nutch inject" by JulienNioche

2014-06-18 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "bin/nutch inject" page has been changed by JulienNioche: https://wiki.apache.org/nutch/bin/nutch%20inject?action=diff&rev1=2&rev2=3 '': The directory containing our seed li

[jira] [Updated] (NUTCH-1692) SegmentReader broken in distributed mode

2014-06-18 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1692: - Affects Version/s: 1.8 (was: 1.7) > SegmentReader broken in distribute

Re: Nutch Extension for realtime processing

2014-06-18 Thread Julien Nioche
Hi Jake, Great to hear about your ideas. Sounds like what you are proposing would be only "near" realtime, as much would depend on the generation, which is a batch step. How / when would the update step be called? Would this be a fetcher only, i.e. one that does not recursively discover links? If so why not go