Re: Jump to 3.X WAS [RELEASE] Apache Nutch 1.9

2014-09-01 Thread Julien Nioche
Hi chaps, -1 from me. IMHO moving the trunk code to 3.x does not really solve the issue. I'd rather make it more explicit that the standard Nutch (1.x) and Nutch-GORA (2.x) are two separate beasts for instance by referring to 2.x as Nutch-GORA in the artifacts we release. This way users won't assu

Re: Jump to 3.X WAS [RELEASE] Apache Nutch 1.9

2014-09-01 Thread Julien Nioche
> between 1.x with 2.x > > "Changing to 3.x would imply a major change of architecture or > functionality, which certainly won't be the case for the next release of > the trunk. " I agree with Julien. > > IMHO Opinion We do not need any changes. > > Tala

Re: Jump to 3.X WAS [RELEASE] Apache Nutch 1.9

2014-09-01 Thread Julien Nioche
iginal Message- > From: Julien Nioche > Reply-To: "dev@nutch.apache.org" > Date: Monday, September 1, 2014 2:23 AM > To: "dev@nutch.apache.org" > Cc: Chris Mattmann > Subject: Re: Jump to 3.X WAS [RELEASE] Apache Nutch 1.9 > > >Hi chaps, > >

Re: Nutch won't fetch the whole page if the Transfer Encoding is chunked

2014-09-17 Thread Julien Nioche
Hi, isn't that an effect of http.content.limit (value 65536)? From its description: The length limit for downloaded content using the http:// protocol, in bytes. If this value is nonnegative (>=0), content longer than it will be truncated; otherwise, no truncation at all. Do not confuse this setting with the file.content.limit
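The rule quoted from the http.content.limit description can be illustrated with a small sketch. It only mirrors the stated semantics (a non-negative limit truncates to that many bytes, a negative one disables truncation); the class and method names are hypothetical and this is not Nutch's protocol-http code. To fetch whole pages, one would typically override the property in nutch-site.xml with a larger or negative value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch of the truncation rule described for http.content.limit.
 * Not taken from Nutch; it only reproduces the documented behaviour.
 */
final class ContentLimitSketch {

  /** Returns at most {@code limit} bytes of {@code content}; a negative limit keeps everything. */
  static byte[] applyLimit(byte[] content, int limit) {
    if (limit < 0 || content.length <= limit) {
      return content; // no truncation needed or truncation disabled
    }
    return Arrays.copyOf(content, limit); // keep only the first `limit` bytes
  }

  public static void main(String[] args) {
    byte[] page = "<html><body>long page</body></html>".getBytes(StandardCharsets.UTF_8);
    System.out.println(applyLimit(page, 12).length); // 12
    System.out.println(applyLimit(page, -1).length); // 35
  }
}
```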

Re: Generic xsl parser plugin

2014-09-25 Thread Julien Nioche
Hi Albin, You don't have to have a separate plugin for each html structure you want to parse. You can have a single plugin with multiple HTMLParseFilters. Having a generic extractor with the extraction logic configured in an external file is definitely a good idea and would make a great contribut
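The idea of a single generic extractor whose logic lives in an external file can be sketched as below. Everything here is an assumption for illustration: the class name, the properties-style rule format (field name = XPath expression) and the requirement that pages be well-formed XHTML. It does not use Nutch's HTMLParseFilter API; in a real plugin, logic like this would be wired into such a filter.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/**
 * Hypothetical rule-driven extractor: rules are read from text
 * (field=XPath, one per line), so new layouts need a config change, not code.
 * Assumes well-formed XHTML input.
 */
final class RuleBasedExtractor {
  private final Map<String, String> rules = new LinkedHashMap<>(); // field -> XPath

  RuleBasedExtractor(String rulesText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(rulesText)); // e.g. "title=//h1"
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    for (String field : props.stringPropertyNames()) {
      rules.put(field, props.getProperty(field));
    }
  }

  /** Applies every rule to the page and keeps the non-empty string results. */
  Map<String, String> extract(String xhtml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xhtml)));
      XPath xpath = XPathFactory.newInstance().newXPath();
      Map<String, String> fields = new LinkedHashMap<>();
      for (Map.Entry<String, String> rule : rules.entrySet()) {
        // evaluate(...) returns the string value of the first matching node
        String value = xpath.evaluate(rule.getValue(), doc).trim();
        if (!value.isEmpty()) {
          fields.put(rule.getKey(), value);
        }
      }
      return fields;
    } catch (Exception e) {
      throw new IllegalArgumentException("could not parse the page or apply a rule", e);
    }
  }

  public static void main(String[] args) {
    RuleBasedExtractor extractor =
        new RuleBasedExtractor("title=//h1\nprice=//span[@class='price']\n");
    System.out.println(extractor.extract(
        "<html><body><h1>Widget</h1><span class=\"price\">9.99</span></body></html>"));
  }
}
```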

Re: Generic xsl parser plugin

2014-09-26 Thread Julien Nioche
xtraction-with-apache-nutch/ Is >> the nutch community going to use this? >> >> >> >> On Thu, Sep 25, 2014 at 5:49 AM, Julien Nioche < >> lists.digitalpeb...@gmail.com> wrote: >> >>> Hi Albin, >>> >>> You don't have to hav

Re: GSoC 2015

2015-02-04 Thread Julien Nioche
Moving to Hadoop 2.x ? On 4 February 2015 at 14:42, Lewis John Mcgibbney wrote: > Hi Folks, > Does anyone have any good ideas for GSoC? > Seb mentioned moving Nutch towards Spark so potentially a pluggable > runtime execution engine abstraction? > I am currently working on a lot of security and

Re: [ANNOUNCE] New Nutch committer and PMC - Jorge Luis Betancourt Gonzalez

2015-02-19 Thread Julien Nioche
Congratulations and welcome Jorge! Great to have you with us Julien On 19 February 2015 at 17:20, Sebastian Nagel wrote: > Dear all, > > on behalf of the Nutch PMC it is my pleasure to announce that > Jorge Luis Betancourt Gonzalez has been voted in as committer > and member of the Nutch PMC. J

Re: Unsubscribe

2015-02-26 Thread Julien Nioche
Massimo, http://nutch.apache.org/mailing_lists.html => dev-unsubscr...@nutch.apache.org Thanks On 26 February 2015 at 19:11, Massimo Miccoli wrote: > > > Massimo > > > Il giorno 26/feb/2015, alle ore 19:31, lewi...@apache.org ha scritto: > > > > Author: lewismc > > Date: Thu Feb 26 18:31:39 2

Re: Review Request 31579: Patch for NUTCH-1949: Dump out the Nutch data into the Common Crawl format

2015-03-02 Thread Julien Nioche
as an extension of IndexWriter? See [https://issues.apache.org/jira/browse/NUTCH-1949?focusedCommentId=14336272&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14336272] - Julien Nioche On March 2, 2015, 5:58 p.m., Giuseppe Tot

Re: [ANNOUNCE] New Nutch committer and PMC - Mo Omer

2015-03-23 Thread Julien Nioche
Welcome Mo! On 22 March 2015 at 19:31, Markus Jelsma wrote: > Welcome Mohammad! > > -Original message- > From: Mohammed Omer > Sent: Sunday 22nd March 2015 18:55 > To: u...@nutch.apache.org > Cc: dev@nutch.apache.org > Subject: Re: [ANNOUNCE] New Nutch committer and PMC - Mo Omer > > Hel

Re: [ANNOUNCE] New Nutch committer and PMC - Giuseppe Totaro

2015-04-25 Thread Julien Nioche
Congrats and welcome Giuseppe! On 25 April 2015 at 22:43, Giuseppe Totaro wrote: > Thanks a lot Sebastian. > I am very proud to be part of this project as committer and member of the > Nutch PMC. > > I am working on Information Retrieval at scale under the supervision of > Professor Chris Mattma

Re: [VOTE] Release Apache Nutch 1.10

2015-04-30 Thread Julien Nioche
Thanks Lewis +1 : compiled on Linux + ran a small crawl and indexed with ES j On 29 April 2015 at 22:54, Lewis John Mcgibbney wrote: > > Hi user@ & dev@,This thread is a VOTE for releasing Apache Nutch 1.10. The > release candidate comprises the following components.* A staging repository >

crawler-commons 0.6 released

2015-06-11 Thread Julien Nioche
[Apologies for cross posting] crawler-commons 0.6 is released. We are glad to announce the 0.6 release of Crawler Commons. See the CHANGES.txt file included with the release for a full list of details. We suggest

Re: [DISCUSS] Release Nutch trunk 1.11

2015-08-26 Thread Julien Nioche
Hi Lewis, I'd love to see https://issues.apache.org/jira/browse/NUTCH-1517 being part of 1.11. It is a separate indexing plugin which should not impact any existing code. It's been reviewed by Jorge and I'll commit it soon unless someone objects. Thanks J. On 26 August 2015 at 03:23, Lewis Jo

Re: [DISCUSS] Release Nutch trunk 1.11

2015-08-26 Thread Julien Nioche
Done. Thanks Markus On 26 August 2015 at 13:08, Markus Jelsma wrote: > Yes Julien, please commit. I do think > https://issues.apache.org/jira/browse/NUTCH-2064 should also be included. > But i have my hands full atm. > > -Original message- > From: Julien Nioche >

Re: [ANNOUNCE] New Nutch committer and PMC - Asitang Mishra

2015-09-10 Thread Julien Nioche
Congratulations Asitang and welcome! Julien On 9 September 2015 at 23:01, Sebastian Nagel wrote: > Dear all, > > on behalf of the Nutch PMC it is my pleasure to announce > that Asitang Mishra has joined the Nutch team as committer > and PMC member. Asitang, please feel free to introduce > yours

Fwd: Job Opening at Common Crawl - Crawl Engineer / Data Scientist

2015-09-18 Thread Julien Nioche
Nutch people, Just in case you missed the announcement below. As you probably know CC use Nutch for their crawls, this is a fantastic opportunity to put your Nutch skills to great use! Julien -- Forwarded message -- From: Sara Crouse Date: 17 September 2015 at 22:51 Subject: Job

Tutorial : Index the web with AWS CloudSearch

2015-09-23 Thread Julien Nioche
Hi everyone, Just to let you know that we've just published a new tutorial on how to use Nutch (and StormCrawler) to crawl and index documents into AWS CloudSearch. This is related to the recent addition of NUTCH-1517 in the trunk codebase. The t

Webcast : Apache Nutch on EMR

2015-09-23 Thread Julien Nioche
Hi again, I have uploaded a webcast explaining how to run Nutch on AWS Elastic Map Reduce https://www.youtube.com/watch?v=v9zjcTjjjyU Please excuse the sound quality, hesitations and stuttering. I hope you find it useful nonetheless. Julien -- *Open Source Solutions for Text Engineering* h

Re: Nutch not recognizing html pages/images retrieved via php

2015-10-05 Thread Julien Nioche
Hi, what happens is that parse-tika is used by default but doesn't know what to do with that mime type. You can edit parse-plugins.xml and add an entry to map the mime type to the html parser. Obviously you'll need parse-html to be a
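The fallback described above (explicit mime-type mappings first, parse-tika otherwise) can be modelled with a simple lookup. This is a hypothetical sketch of the idea, not the code Nutch uses to read parse-plugins.xml; the mime type in the example is a placeholder because the one from the original thread is not shown.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical model of a mime-type to parser mapping with parse-tika
 * as the default for anything unmapped.
 */
final class ParserMappingSketch {
  private static final List<String> DEFAULT_PARSERS = List.of("parse-tika");
  private final Map<String, List<String>> parsersByMimeType = new HashMap<>();

  /** Maps a mime type to one or more parser plugin ids, in order of preference. */
  void map(String mimeType, String... pluginIds) {
    parsersByMimeType.put(normalize(mimeType), List.of(pluginIds));
  }

  /** Returns the parsers to try for a mime type, or the default if unmapped. */
  List<String> parsersFor(String mimeType) {
    return parsersByMimeType.getOrDefault(normalize(mimeType), DEFAULT_PARSERS);
  }

  private static String normalize(String mimeType) {
    return mimeType.trim().toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    ParserMappingSketch mapping = new ParserMappingSketch();
    // Placeholder mime type (assumption): the actual one is not in the thread snippet.
    mapping.map("application/x-httpd-php", "parse-html");
    System.out.println(mapping.parsersFor("application/x-httpd-php")); // [parse-html]
    System.out.println(mapping.parsersFor("application/pdf"));         // [parse-tika]
  }
}
```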

Re: [VOTE] Apache Nutch 1.11 Release Candidate #1

2015-10-26 Thread Julien Nioche
Chris, -1. We usually release tar.gz as well as zip. More importantly we need to release the sources as well as the binary. We can't even test that it compiles OK. Since you released Tika, why don't we include it before cutting 1.11? Thanks Julien On 26 October 2015 at 05:53, Mattmann, Chris A

Re: [VOTE] Release Apache Nutch 1.11 RC#2

2015-12-05 Thread Julien Nioche
+1 Thanks Lewis On 4 December 2015 at 18:03, Lewis John Mcgibbney wrote: > Hi Folks, > > A second candidate for the Nutch 1.11 release is available at: > > https://dist.apache.org/repos/dist/dev/nutch/1.11rc2/ > > The release candidate consists of zip and tar binaries as well as zip and > tar

Re: [RELEASE] Apache Nutch 1.11

2015-12-08 Thread Julien Nioche
Thanks Lewis for taking care of the release and everyone involved. Julien On 8 December 2015 at 01:34, lewis john mcgibbney wrote: > Hello Folks, > > 07 December 2015 - Nutch 1.11 Release > > The Apache Nutch PMC are pleased to announce the immediate release of > Apache Nutch v1.11, we advise a

Re: [VOTE] Moving to Git

2016-01-08 Thread Julien Nioche
+1 to move to Git Note : I don't think Dennis is on the PMC anymore Ju On 8 January 2016 at 08:46, Chris Mattmann wrote: > Hi Everyone, > > I proposed this earlier, and we said we’d wait until after the > 1.11 release. So it’s time to VOTE to move Nutch to Git. So > far, the following people h

Re: [VOTE] Release Apache Nutch 1.12

2016-06-15 Thread Julien Nioche
+1 Thanks Lewis and team! On 15 June 2016 at 06:14, lewis john mcgibbney wrote: > Hi Folks, > > A first candidate for the Nutch 1.12 release is available at: > > https://dist.apache.org/repos/dist/dev/nutch/1.12/ > > The release candidate is a zip and tar archive of the sources tag available >

ApacheCon EU Sevilla

2016-06-29 Thread Julien Nioche
Hi, Sorry for cross posting. As you are probably aware, the ApacheCon Europe, and Apache Big Data conferences will take place in Seville, Spain, November 14-18, 2016. http://events.linuxfoundation.org/events/apache-big-data-europe/ I just submitted a talk on StormCrawler

Crawler-Commons 0.7 released

2016-11-24 Thread Julien Nioche
Apologies for cross-posting. The Crawler-Commons project is pleased to announce its 0.7 release. https://github.com/crawler-commons/crawler-commons#24th-november-2016crawler-commons-07-released The list of changes can be found here

Re: [VOTE] Release Apache Nutch 1.13 RC#1

2017-03-29 Thread Julien Nioche
Hi Lewis +1 compiled from source and ran a small crawl in local mode. All good! Thanks Julien On 29 March 2017 at 05:20, lewis john mcgibbney wrote: > Hi Folks, > > A first candidate for the Nutch 1.13 release is available at: > > https://dist.apache.org/repos/dist/dev/nutch/1.13/ > > The r

Crawler-Commons 0.8 released

2017-06-09 Thread Julien Nioche
Apologies for cross-posting. The Crawler-Commons project is pleased to announce its 0.8 release. https://github.com/crawler-commons/crawler-commons/releases/tag/crawler-commons-0.8 If you are wondering what Crawl

Re: Establishment of Static Source Code Analysis

2017-06-16 Thread Julien Nioche
> > Russian compatriots Are we all Russian then? On 16 June 2017 at 04:29, lewis john mcgibbney wrote: > Hi Folks, > I don't know if anyone else noticed... some of our Russian compatriots > have set up a static auto bot to notify us of source code issues... > An example is as follows > https:

Re: Establishment of Static Source Code Analysis

2017-06-16 Thread Julien Nioche
mons <https://github.com/crawler-commons/crawler-commons/pull/127>. On 16 June 2017 at 08:55, Julien Nioche wrote: > Russian compatriots > > > Are we all Russian then? > > On 16 June 2017 at 04:29, lewis john mcgibbney wrote: > >> Hi Folks, >> I don't

Crawler-Commons 0.9 released

2017-10-31 Thread Julien Nioche
Happy Halloween! We are glad to announce the 0.9 release of Crawler-Commons. See the CHANGES.txt file included with the release for a full list of details. The main changes are the removal of DOM-based sitema

Re: [DISCUSS] Release 1.14?

2017-12-11 Thread Julien Nioche
Tika 1.17 will be released shortly; maybe it would be worth waiting a bit and integrating it first? On 8 December 2017 at 22:53, Sebastian Nagel wrote: > Hi all, > > 50+ issues fixed > https://issues.apache.org/jira/projects/NUTCH/versions/12340218 > > Of course, as always and still many open is

Re: [DISCUSS] Release 1.14?

2017-12-14 Thread Julien Nioche
ll make sure that it's included. > > Thanks, > Sebastian > > > On 12/11/2017 10:22 AM, Julien Nioche wrote: > > Tika 1.17 will be released shortly, maybe it would be worth waiting a > bit and integrate it first? > > > > On 8 December 2017 at 22

Re: [VOTE] Release Apache Nutch 1.14 RC#1

2017-12-19 Thread Julien Nioche
+1 to release, thanks Seb On 18 December 2017 at 22:12, Sebastian Nagel wrote: > Hi Folks, > > A first candidate for the Nutch 1.14 release is available at: > > https://dist.apache.org/repos/dist/dev/nutch/1.14/ > > The release candidate is a zip and tar.gz archive of the binary and > sources

Crawler-Commons 0.10 released

2018-06-07 Thread Julien Nioche
Hi We are glad to announce the 0.10 release of Crawler-Commons. See the CHANGES.txt file included with the release for a full list of details. This version contains among other things improvements to the Sit

Re: [ANNOUNCE] New Nutch committer and PMC - Tim Allison

2023-07-20 Thread Julien Nioche
What a fantastic addition to the Nutch team! Congrats to Tim On Thu, 20 Jul 2023 at 10:20, Sebastian Nagel wrote: > Dear all, > > It is my pleasure to announce that Tim Allison has joined us > as a committer and member of the Nutch PMC. > > You may already know Tim as a maintainer of and contrib

[jira] Created: (NUTCH-818) Parse-tika uses minorCodes instead of majorCodes in ParseStatus

2010-05-11 Thread Julien Nioche (JIRA)
Reporter: Julien Nioche Parse-tika uses minorCodes instead of majorCodes in ParseStatus, which results in an IAOOB exception in ParseSegment as the values are outside the range of majorCodes. This happens for instance when no parser implementation can be found for a given mimetype.
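The bug class behind NUTCH-818 is easy to reproduce in isolation: a small table is indexed by a "major" code, and passing a finer-grained "minor" code instead runs off the end of the array. The constants below are illustrative assumptions, not copied from Nutch's ParseStatus.

```java
/**
 * Hypothetical sketch of the major/minor code mix-up: only major codes are
 * valid indexes into the name table. Constant values are assumptions.
 */
final class ParseStatusSketch {
  static final int NOTPARSED = 0, SUCCESS = 1, FAILED = 2;          // assumed major codes
  static final String[] MAJOR_NAMES = {"notparsed", "success", "failed"};
  static final int FAILED_EXCEPTION = 200;                          // assumed minor code

  /** Looks up the name of a major code; any other value is out of range. */
  static String majorName(int majorCode) {
    return MAJOR_NAMES[majorCode];
  }

  public static void main(String[] args) {
    System.out.println(majorName(FAILED)); // failed
    try {
      majorName(FAILED_EXCEPTION); // the bug: a minor code where a major code is expected
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("out-of-range index, as described in NUTCH-818");
    }
  }
}
```

The fix in this model is simply to pass the major code (here FAILED) when recording the failure and keep the minor code as extra detail.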

[jira] Updated: (NUTCH-818) Parse-tika uses minorCodes instead of majorCodes in ParseStatus

2010-05-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-818: Attachment: NUTCH-818.patch Will commit shortly > Parse-tika uses minorCodes instead of majorCo

[jira] Updated: (NUTCH-818) Parse-tika uses minorCodes instead of majorCodes in ParseStatus

2010-05-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-818: Assignee: Julien Nioche Fix Version/s: 1.1 Patch Info: [Patch Available

[jira] Closed: (NUTCH-818) Parse-tika uses minorCodes instead of majorCodes in ParseStatus

2010-05-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-818. --- Resolution: Fixed Committed revision 943128 > Parse-tika uses minorCodes instead of majorCodes

[jira] Commented: (NUTCH-821) Use ivy in nutch builds

2010-05-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869052#action_12869052 ] Julien Nioche commented on NUTCH-821: - Ciao Paolo, the aim is not to publish N

[jira] Updated: (NUTCH-821) Use ivy in nutch builds

2010-05-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-821: Fix Version/s: 2.0 > Use ivy in nutch builds > --- > >

[jira] Updated: (NUTCH-824) Crawling - File Error 404 when fetching file with an hexadecimal character in the file name.

2010-05-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-824: Priority: Major (was: Blocker) Changed prio to major. The problem occurs when the files are not

[jira] Commented: (NUTCH-826) Mailing list is broken.

2010-05-24 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870528#action_12870528 ] Julien Nioche commented on NUTCH-826: - Nutch has recently become a TLP and some of

[jira] Assigned: (NUTCH-826) Mailing list is broken.

2010-05-24 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reassigned NUTCH-826: --- Assignee: Julien Nioche > Mailing list is bro

[jira] Resolved: (NUTCH-826) Mailing list is broken.

2010-05-24 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-826. - Fix Version/s: 1.1 Resolution: Fixed Committed revision 947569. The changes should be

[jira] Commented: (NUTCH-828) Fetch Filter

2010-06-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876576#action_12876576 ] Julien Nioche commented on NUTCH-828: - Shall we postpone this after the release of

[jira] Created: (NUTCH-830) ScoringFilter to restrict the crawl to the hosts/domains listed in the seeds

2010-06-23 Thread Julien Nioche (JIRA)
Issue Type: New Feature Affects Versions: 1.1 Reporter: Julien Nioche Assignee: Julien Nioche Fix For: 2.0 Attachments: NUTCH-830.patch The DomainURLFilter allows one to specify the domains to consider for a crawl. This works fine but requires editing a
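The scoping idea in NUTCH-830 (derive the allowed hosts from the seeds instead of maintaining a separate domain list) can be sketched as follows. This covers the scoping rule only and matches exact hosts; how the attached patch wires it into a ScoringFilter is not shown here, and domain-level matching would need a public-suffix-aware comparison. The class name is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch: collect the hosts of the seed URLs, then accept only
 * outlinks whose host is one of them. Not the NUTCH-830 patch.
 */
final class SeedHostRestriction {
  private final Set<String> allowedHosts = new HashSet<>();

  SeedHostRestriction(List<String> seedUrls) {
    for (String seed : seedUrls) {
      String host = hostOf(seed);
      if (host != null) {
        allowedHosts.add(host);
      }
    }
  }

  /** True if the outlink's host was one of the seed hosts. */
  boolean accept(String outlink) {
    String host = hostOf(outlink);
    return host != null && allowedHosts.contains(host);
  }

  private static String hostOf(String url) {
    try {
      String host = new URI(url).getHost();
      return host == null ? null : host.toLowerCase(Locale.ROOT);
    } catch (URISyntaxException e) {
      return null; // malformed URLs are treated as out of scope
    }
  }

  public static void main(String[] args) {
    SeedHostRestriction scope =
        new SeedHostRestriction(List.of("http://example.com/", "https://nutch.apache.org/"));
    System.out.println(scope.accept("http://example.com/about")); // true
    System.out.println(scope.accept("http://www.example.com/"));  // false: different host
  }
}
```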

[jira] Updated: (NUTCH-830) ScoringFilter to restrict the crawl to the hosts/domains listed in the seeds

2010-06-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-830: Attachment: NUTCH-830.patch > ScoringFilter to restrict the crawl to the hosts/domains listed

[jira] Updated: (NUTCH-831) Allow configuration of how fields crawled by Nutch are stored / indexed / tokenized

2010-06-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-831: Fix Version/s: 1.2 (was: 1.1) Moved to Fix Version 1.2, 1.1 having been released

[jira] Created: (NUTCH-834) Separate the Nutch web site from trunk

2010-06-24 Thread Julien Nioche (JIRA)
Reporter: Julien Nioche Assignee: Julien Nioche Fix For: 2.0 As discussed on dev@, it would be useful to move the Nutch web site sources from .../asf/nutch/trunk to .../asf/nutch/site and to use the svnpubsub mechanism for instant deployment of site

[jira] Commented: (NUTCH-834) Separate the Nutch web site from trunk

2010-06-28 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883083#action_12883083 ] Julien Nioche commented on NUTCH-834: - The mechanism has been enabled for the exis

[jira] Commented: (NUTCH-834) Separate the Nutch web site from trunk

2010-06-28 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883149#action_12883149 ] Julien Nioche commented on NUTCH-834: - I suppose you want to keep the source of

[jira] Commented: (NUTCH-834) Separate the Nutch web site from trunk

2010-06-28 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883165#action_12883165 ] Julien Nioche commented on NUTCH-834: - The javadoc would be updated only on a

[jira] Commented: (NUTCH-834) Separate the Nutch web site from trunk

2010-06-28 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883172#action_12883172 ] Julien Nioche commented on NUTCH-834: - {quote} Something like: http://svn.apache

[jira] Commented: (NUTCH-49) Flag for generate to fetch only new pages to complement the -refetchonly flag

2010-06-29 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-49?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883492#action_12883492 ] Julien Nioche commented on NUTCH-49: Can't you do the same by implementing

[jira] Commented: (NUTCH-834) Separate the Nutch web site from trunk

2010-06-29 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883510#action_12883510 ] Julien Nioche commented on NUTCH-834: - Is the javadoc copied manuall

[jira] Commented: (NUTCH-834) Separate the Nutch web site from trunk

2010-06-29 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883544#action_12883544 ] Julien Nioche commented on NUTCH-834: - Checking it in would be fine. We'd h

[jira] Issue Comment Edited: (NUTCH-834) Separate the Nutch web site from trunk

2010-06-29 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883544#action_12883544 ] Julien Nioche edited comment on NUTCH-834 at 6/29/10 9:3

[jira] Commented: (NUTCH-834) Separate the Nutch web site from trunk

2010-06-29 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883566#action_12883566 ] Julien Nioche commented on NUTCH-834: - Committed revision 958996 I have copied

[jira] Commented: (NUTCH-834) Separate the Nutch web site from trunk

2010-06-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883841#action_12883841 ] Julien Nioche commented on NUTCH-834: - The content of the site is now taken from

[jira] Closed: (NUTCH-834) Separate the Nutch web site from trunk

2010-06-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-834. --- Resolution: Fixed Committed revision 959228. Thanks Chris for your comments and help with this

[jira] Commented: (NUTCH-650) Hbase Integration

2010-06-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883880#action_12883880 ] Julien Nioche commented on NUTCH-650: - The patch has been committed with revi

[jira] Updated: (NUTCH-836) Remove deprecated parse plugins

2010-06-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-836: Attachment: NUTCH-836.patch > Remove deprecated parse plug

[jira] Created: (NUTCH-836) Remove deprecated parse plugins

2010-06-30 Thread Julien Nioche (JIRA)
: Julien Nioche Assignee: Julien Nioche Fix For: 2.0 Attachments: NUTCH-836.patch Some of the parser plugins in 1.1 are covered by the parse-tika plugin. These plugins have been kept in 1.1 but should be removed from 2.0 where we'll rely on parse-tika a

[jira] Commented: (NUTCH-836) Remove deprecated parse plugins

2010-06-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883891#action_12883891 ] Julien Nioche commented on NUTCH-836: - Actually creative-commons + languageidenti

[jira] Updated: (NUTCH-836) Remove deprecated parse plugins

2010-06-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-836: Description: Some of the parser plugins in 1.1 are covered by the parse-tika plugin. These plugins

[jira] Created: (NUTCH-837) Remove search servers and Lucene dependencies

2010-06-30 Thread Julien Nioche (JIRA)
Affects Versions: 1.1 Reporter: Julien Nioche Fix For: 2.0 One of the main aspects of 2.0 is the delegation of the indexing and search to external resources like SOLR. We can simplify the code a lot by getting rid of the : * search servers * indexing and analysis with

[jira] Updated: (NUTCH-836) Remove deprecated parse plugins

2010-06-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-836: Attachment: (was: NUTCH-836.patch) > Remove deprecated parse plug

[jira] Updated: (NUTCH-836) Remove deprecated parse plugins

2010-06-30 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-836: Attachment: NUTCH-836-2.patch New patch which fixes the issues mentioned earlier

[jira] Commented: (NUTCH-836) Remove deprecated parse plugins

2010-07-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884253#action_12884253 ] Julien Nioche commented on NUTCH-836: - {quote} * do we still need lib-neko

[jira] Commented: (NUTCH-835) document deduplication (exact duplicates) failed using MD5Signature

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884624#action_12884624 ] Julien Nioche commented on NUTCH-835: - This patch has been marked for 1.2 but has
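Exact-duplicate detection with an MD5 signature boils down to hashing each document's content and flagging any document whose digest was already seen. The sketch below is a generic illustration of that technique under simplifying assumptions (it keeps the first document seen, whereas a real deduplication job would choose which copy to keep by other criteria); it is not Nutch's MD5Signature or dedup code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of MD5-based exact-duplicate detection. */
final class ExactDuplicateSketch {

  /** Hex-encoded MD5 digest of the raw content bytes. */
  static String md5Hex(byte[] content) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5").digest(content);
      StringBuilder sb = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        sb.append(String.format("%02x", b)); // two lowercase hex digits per byte
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 must be available on every Java platform", e);
    }
  }

  /** Returns the URLs whose content digest was already seen earlier in iteration order. */
  static List<String> exactDuplicates(Map<String, byte[]> pagesByUrl) {
    Map<String, String> firstUrlByDigest = new HashMap<>();
    List<String> duplicates = new ArrayList<>();
    for (Map.Entry<String, byte[]> page : pagesByUrl.entrySet()) {
      String digest = md5Hex(page.getValue());
      if (firstUrlByDigest.putIfAbsent(digest, page.getKey()) != null) {
        duplicates.add(page.getKey());
      }
    }
    return duplicates;
  }

  public static void main(String[] args) {
    Map<String, byte[]> pages = new LinkedHashMap<>();
    pages.put("http://example.com/a", "same body".getBytes(StandardCharsets.UTF_8));
    pages.put("http://example.com/b", "same body".getBytes(StandardCharsets.UTF_8));
    System.out.println(exactDuplicates(pages)); // [http://example.com/b]
  }
}
```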

[jira] Created: (NUTCH-840) Port tests from parse-html to parse-tika

2010-07-02 Thread Julien Nioche (JIRA)
Reporter: Julien Nioche Assignee: Julien Nioche Fix For: 2.0 We don't have tests for HTML in parse-tika so I'll copy them from the old parse-html plugin

[jira] Closed: (NUTCH-836) Remove deprecated parse plugins

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-836. --- Resolution: Fixed Committed revision 959948. Thanks Andrzej for reviewing it > Remove depreca

[jira] Updated: (NUTCH-840) Port tests from parse-html to parse-tika

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-840: Attachment: NUTCH-840.patch Patch which adds the HTML tests to the Tika Parser The tests currently

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884671#action_12884671 ] Julien Nioche commented on NUTCH-837: - I think we can also get rid of : * docs/ *

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884700#action_12884700 ] Julien Nioche commented on NUTCH-837: - Hi Chris, My position on this is tha

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884715#action_12884715 ] Julien Nioche commented on NUTCH-837: - Thanks for your comments Chris {quote}

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884734#action_12884734 ] Julien Nioche commented on NUTCH-837: - :-) > Remove search servers and

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884739#action_12884739 ] Julien Nioche commented on NUTCH-837: - Comments on the latest p

[jira] Updated: (NUTCH-821) Use ivy in nutch builds

2010-07-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-821: Attachment: NUTCH-821.patch Adds Ivy support for dependencies. The lib/. dir is maintained and will

[jira] Resolved: (NUTCH-791) External links for published javadocs are partially broken

2010-07-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-791. - Fix Version/s: 1.1 Resolution: Duplicate Duplicates 790? > External links for publis

[jira] Commented: (NUTCH-821) Use ivy in nutch builds

2010-07-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885207#action_12885207 ] Julien Nioche commented on NUTCH-821: - {QUOTE} I think this patch refers to some p

[jira] Commented: (NUTCH-821) Use ivy in nutch builds

2010-07-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885244#action_12885244 ] Julien Nioche commented on NUTCH-821: - I found [http://ant.apache.org/ivy/ivyde/] w

[jira] Commented: (NUTCH-696) Timeout for Parser

2010-07-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885260#action_12885260 ] Julien Nioche commented on NUTCH-696: - +1 : this is definitely useful. Hopefully
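A parser timeout of the kind requested in NUTCH-696 is commonly implemented by running the parse in a separate thread and waiting on it with a bound. The sketch below shows that general technique with standard java.util.concurrent classes; the class and method names are hypothetical and this is not the patch discussed on the issue. Note that cancellation only interrupts the worker, so a parser that ignores interrupts may keep running in the background.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: run a parse task with an upper bound on wall-clock time. */
final class TimedParse {
  private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
    Thread t = new Thread(r, "timed-parse");
    t.setDaemon(true); // an abandoned parse must not keep the JVM alive
    return t;
  });

  /** Returns the parse result, or empty if it does not finish within the timeout. */
  static <T> Optional<T> parseWithTimeout(Callable<T> parse, long timeoutMillis) {
    Future<T> future = POOL.submit(parse);
    try {
      return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the worker thread
      return Optional.empty();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      future.cancel(true);
      return Optional.empty();
    } catch (ExecutionException e) {
      throw new IllegalStateException("parser failed", e.getCause());
    }
  }

  public static void main(String[] args) {
    System.out.println(parseWithTimeout(() -> "parsed", 1_000)); // Optional[parsed]
    System.out.println(parseWithTimeout(() -> {
      Thread.sleep(10_000); // simulates a parser stuck on a pathological document
      return "late";
    }, 100)); // Optional.empty
  }
}
```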

[jira] Issue Comment Edited: (NUTCH-696) Timeout for Parser

2010-07-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885260#action_12885260 ] Julien Nioche edited comment on NUTCH-696 at 7/5/10 11:1

[jira] Commented: (NUTCH-821) Use ivy in nutch builds

2010-07-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885463#action_12885463 ] Julien Nioche commented on NUTCH-821: - @Chris : isn't this restricted to the

[jira] Commented: (NUTCH-821) Use ivy in nutch builds

2010-07-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885896#action_12885896 ] Julien Nioche commented on NUTCH-821: - Committed revision 961306 and 961318 Slig

[jira] Closed: (NUTCH-821) Use ivy in nutch builds

2010-07-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-821. --- Resolution: Fixed > Use ivy in nutch builds > --- > >

[jira] Commented: (NUTCH-843) Separate the build and runtime environments

2010-07-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886315#action_12886315 ] Julien Nioche commented on NUTCH-843: - I really like this. What shall we do with

[jira] Commented: (NUTCH-845) Native hadoop libs not available through maven

2010-07-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886321#action_12886321 ] Julien Nioche commented on NUTCH-845: - +1 There is always a possibility to use the

[jira] Commented: (NUTCH-843) Separate the build and runtime environments

2010-07-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886323#action_12886323 ] Julien Nioche commented on NUTCH-843: - OK - for some reason I thought we could

[jira] Created: (NUTCH-846) Remove Hadoop related scripts in /bin

2010-07-09 Thread Julien Nioche (JIRA)
Remove Hadoop related scripts in /bin - Key: NUTCH-846 URL: https://issues.apache.org/jira/browse/NUTCH-846 Project: Nutch Issue Type: Task Reporter: Julien Nioche Fix For: 2.0

[jira] Closed: (NUTCH-846) Remove Hadoop related scripts in /bin

2010-07-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-846. --- Resolution: Not A Problem Actually the /bin directory has already been removed as part of : http

[jira] Created: (NUTCH-847) Wrong version of SOLR in Ivy.xml

2010-07-09 Thread Julien Nioche (JIRA)
Wrong version of SOLR in Ivy.xml Key: NUTCH-847 URL: https://issues.apache.org/jira/browse/NUTCH-847 Project: Nutch Issue Type: Bug Components: indexer Reporter: Julien Nioche

[jira] Closed: (NUTCH-847) Wrong version of SOLR in Ivy.xml

2010-07-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-847. --- Resolution: Fixed Committed revision 962497. > Wrong version of SOLR in Ivy.

[jira] Commented: (NUTCH-847) Wrong version of SOLR in Ivy.xml

2010-07-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886734#action_12886734 ] Julien Nioche commented on NUTCH-847: - Thanks. I'll commit the change > W

[jira] Commented: (NUTCH-847) Wrong version of SOLR in Ivy.xml

2010-07-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886736#action_12886736 ] Julien Nioche commented on NUTCH-847: - Nutch 1.1 came with the version 0.9.4 for
