[jira] [Created] (NUTCH-1999) Add http://nutch.apache.org/robots.txt

2015-04-23 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1999: Summary: Add http://nutch.apache.org/robots.txt Key: NUTCH-1999 URL: https://issues.apache.org/jira/browse/NUTCH-1999 Project: Nutch Issue Type: Improvement

[jira] [Assigned] (NUTCH-1999) Add http://nutch.apache.org/robots.txt

2015-04-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reassigned NUTCH-1999: Assignee: Julien Nioche > Add http://nutch.apache.org/robots.txt >

[jira] [Created] (NUTCH-2000) Link inversion fails with .locked already exists.

2015-04-23 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-2000: Summary: Link inversion fails with .locked already exists. Key: NUTCH-2000 URL: https://issues.apache.org/jira/browse/NUTCH-2000 Project: Nutch Issue Type: B

[jira] [Commented] (NUTCH-1985) Adding a main() method to the MimeTypeIndexingFilter

2015-04-23 Thread Jorge Luis Betancourt Gonzalez (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509167#comment-14509167 ] Jorge Luis Betancourt Gonzalez commented on NUTCH-1985: --- Should we c

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509393#comment-14509393 ] Lewis John McGibbney commented on NUTCH-1994: - Anyone to review? I can roll a

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509492#comment-14509492 ] Tyler Palsulich commented on NUTCH-1994: Applied and tested both patches, both loo

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509501#comment-14509501 ] Lewis John McGibbney commented on NUTCH-1994: - Would like to commit by EoB tod

[PROPOSE] Kick off Apache Nutch 1.8 by EoB Friday 04232015

2015-04-23 Thread Lewis John Mcgibbney
Hi Folks, Does anyone have an issue with the above proposal? Thanks Lewis -- *Lewis*

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509514#comment-14509514 ] Tyler Palsulich commented on NUTCH-1994: Happy to help, [~lewismc]! > Upgrade to

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509522#comment-14509522 ] Lewis John McGibbney commented on NUTCH-1994: - Dynamite [~tpalsulich] I'll get

Re: [PROPOSE] Kick off Apache Nutch 1.8 by EoB Friday 04232015

2015-04-23 Thread Mattmann, Chris A (3980)
s/1.8/1.10/ right? If so +1! ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.

Unsubscribe

2015-04-23 Thread Mengxian Li
Hi, I want to unsubscribe the email list. Best, Mengxian

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509678#comment-14509678 ] Sebastian Nagel commented on NUTCH-1994: +1 > Upgrade to Apache Tika 1.8 > --

Unsubscribe

2015-04-23 Thread Zhaohui Zhang
Hi, I want to unsubscribe the email list. Best, Zhaohui -- Zhaohui Zhang Dept. of Chemical Engineering, University of Southern California Addr: 2611 Portland Street, Los Angeles, CA, USA 90007 Mobile:(+1)213-880-8321 Email: zhaoh...@usc.edu; happy...@gmail.com; zhaohuizh

[jira] [Resolved] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1994. - Resolution: Fixed Committed revision 1675723 in trunk Committed revision 1675724 i

[jira] [Commented] (NUTCH-1985) Adding a main() method to the MimeTypeIndexingFilter

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509873#comment-14509873 ] Lewis John McGibbney commented on NUTCH-1985: - [~jorgelbg] +1 please commit ag

[jira] [Updated] (NUTCH-2000) Link inversion fails with .locked already exists.

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2000: Fix Version/s: (was: 1.10) 1.11 > Link inversion fails with .

[jira] [Updated] (NUTCH-1947) Overhaul o.a.n.parse.OutlinkExtractor.java

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1947: Fix Version/s: (was: 1.10) 1.11 > Overhaul o.a.n.parse.Outlin

[jira] [Commented] (NUTCH-1963) CommonsCrawlDataDumper is too long ( > 100 bytes) when -gzip option invoked

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509876#comment-14509876 ] Lewis John McGibbney commented on NUTCH-1963: - [~gostep] is this issue address

[jira] [Commented] (NUTCH-1969) URL Normalizer properly handling slashes

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509880#comment-14509880 ] Lewis John McGibbney commented on NUTCH-1969: - +1 for commit [~markus.jel...@o

[jira] [Updated] (NUTCH-1958) Remove scoring-opic from nutch-default.xml

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1958: Fix Version/s: (was: 1.10) 1.11 > Remove scoring-opic from nu

Unsubscribe

2015-04-23 Thread Zhaohui Zhang
Hi, I want to unsubscribe the email list. Best, Zhaohui -- Zhaohui Zhang PhD Student at University of Southern California Mobile: (213)-880-8321 Email: zhaoh...@usc.edu

[jira] [Updated] (NUTCH-2000) Link inversion fails with .locked already exists.

2015-04-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-2000: - Priority: Blocker (was: Major) > Link inversion fails with .locked already exists. >

[jira] [Updated] (NUTCH-2000) Link inversion fails with .locked already exists.

2015-04-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-2000: - Fix Version/s: (was: 1.11) 1.10 > Link inversion fails with .locked already

[jira] [Commented] (NUTCH-2000) Link inversion fails with .locked already exists.

2015-04-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509898#comment-14509898 ] Julien Nioche commented on NUTCH-2000: -- [~lewismc] reverted to 1.10 as this is a bloc

Build failed in Jenkins: Nutch-trunk #3083

2015-04-23 Thread Apache Jenkins Server
See Changes: [lewismc] NUTCH-1994 Upgrade to Apache Tika 1.8 -- [...truncated 5538 lines...] [echo] Testing plugin: urlfilter-validator [junit] WARNING: multiple versions of ant detected in

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509899#comment-14509899 ] Hudson commented on NUTCH-1994: --- FAILURE: Integrated in Nutch-trunk #3083 (See [https://bui

[jira] [Commented] (NUTCH-2000) Link inversion fails with .locked already exists.

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509916#comment-14509916 ] Lewis John McGibbney commented on NUTCH-2000: - ACK > Link inversion fails wit

[jira] [Commented] (NUTCH-2000) Link inversion fails with .locked already exists.

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509920#comment-14509920 ] Lewis John McGibbney commented on NUTCH-2000: - Julien... I wonder if the 2nd U

[jira] [Created] (NUTCH-2001) SubCollection Field Name incorrect in nutch-default.xml

2015-04-23 Thread Jeff Cocking (JIRA)
Jeff Cocking created NUTCH-2001: --- Summary: SubCollection Field Name incorrect in nutch-default.xml Key: NUTCH-2001 URL: https://issues.apache.org/jira/browse/NUTCH-2001 Project: Nutch Issue Typ

Re: Unsubscribe

2015-04-23 Thread Michael Joyce
Email dev-unsubscr...@nutch.apache.org You unsub the same way you subbed. It's just a different email. -- Jimmy On Thu, Apr 23, 2015 at 1:23 PM, Zhaohui Zhang wrote: > Hi, > > I want to unsubscribe the email list. > > Best, > Zhaohui > > > -- > Zhaohui Zhang > Dept. of Chemical Engineering,

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509995#comment-14509995 ] Hudson commented on NUTCH-1994: --- SUCCESS: Integrated in Nutch-nutchgora #1412 (See [https:/

[jira] [Updated] (NUTCH-2001) SubCollection Field Name incorrect in nutch-default.xml

2015-04-23 Thread Jeff Cocking (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Cocking updated NUTCH-2001: Attachment: NUTCH-2001-1.x.patch > SubCollection Field Name incorrect in nutch-default.xml > ---

[jira] [Commented] (NUTCH-2001) SubCollection Field Name incorrect in nutch-default.xml

2015-04-23 Thread Jeff Cocking (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509997#comment-14509997 ] Jeff Cocking commented on NUTCH-2001: - Attached is a patch I created from a clean down

[jira] [Commented] (NUTCH-1963) CommonsCrawlDataDumper is too long ( > 100 bytes) when -gzip option invoked

2015-04-23 Thread Giuseppe Totaro (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510062#comment-14510062 ] Giuseppe Totaro commented on NUTCH-1963: Hi [~lewismc]. Yes, [NUTCH-1959|https://

[jira] [Resolved] (NUTCH-1963) CommonsCrawlDataDumper is too long ( > 100 bytes) when -gzip option invoked

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1963. - Resolution: Fixed Assignee: Giuseppe Totaro Addressed within NUTCH-1959 Than

[jira] [Commented] (NUTCH-1973) Job Administration end point for the REST service

2015-04-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510097#comment-14510097 ] Lewis John McGibbney commented on NUTCH-1973: - This commit accidentally removed

[jira] [Commented] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510167#comment-14510167 ] Hudson commented on NUTCH-1927: --- FAILURE: Integrated in Nutch-trunk #3084 (See [https://bui

Build failed in Jenkins: Nutch-trunk #3084

2015-04-23 Thread Apache Jenkins Server
See Changes: [lewismc] Add back in NUTCH-1927 property to nutch-default as removed during commit @1675022 -- [...truncated 5373 lines...] [junit] WARNING: multiple versions of ant detected in pa

Build failed in Jenkins: Nutch-trunk #3085

2015-04-23 Thread Apache Jenkins Server
See -- [...truncated 5536 lines...] [echo] Testing plugin: urlfilter-validator [junit] WARNING: multiple versions of ant detected in path for junit [junit] jar:file:/home/jenkins/tools

[jira] [Commented] (NUTCH-1985) Adding a main() method to the MimeTypeIndexingFilter

2015-04-23 Thread Jorge Luis Betancourt Gonzalez (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510341#comment-14510341 ] Jorge Luis Betancourt Gonzalez commented on NUTCH-1985: --- Committed r

Re: [MASSMAIL]Re: [PROPOSE] Kick off Apache Nutch 1.8 by EoB Friday 04232015

2015-04-23 Thread Jorge Luis Betancourt González
+1 - Original Message - From: "Chris A Mattmann (3980)" To: dev@nutch.apache.org Sent: Thursday, April 23, 2015 2:16:09 PM Subject: [MASSMAIL]Re: [PROPOSE] Kick off Apache Nutch 1.8 by EoB Friday 04232015 s/1.8/1.10/ right? If so +1! +++

[jira] [Resolved] (NUTCH-1985) Adding a main() method to the MimeTypeIndexingFilter

2015-04-23 Thread Jorge Luis Betancourt Gonzalez (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Luis Betancourt Gonzalez resolved NUTCH-1985. --- Resolution: Fixed > Adding a main() method to the MimeTypeInde

[jira] [Commented] (NUTCH-1997) Add CBOR "magic header" to CommonCrawlDataDumper output

2015-04-23 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510380#comment-14510380 ] Luke sh commented on NUTCH-1997: Notes: The attached cbor file contains both magic bytes f

[jira] [Issue Comment Deleted] (NUTCH-1997) Add CBOR "magic header" to CommonCrawlDataDumper output

2015-04-23 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated NUTCH-1997: --- Comment: was deleted (was: Notes: The attached cbor file contains both magic bytes for type xhtml and type cbo

Build failed in Jenkins: Nutch-trunk #3086

2015-04-23 Thread Apache Jenkins Server
See Changes: [jorgelbg] NUTCH-1985 Adding a main() method to the MimeTypeIndexingFilter -- [...truncated 5373 lines...] copy-generated-lib: test: [echo] Testing plugin: urlfilter-validator

[jira] [Commented] (NUTCH-1985) Adding a main() method to the MimeTypeIndexingFilter

2015-04-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510388#comment-14510388 ] Hudson commented on NUTCH-1985: --- FAILURE: Integrated in Nutch-trunk #3086 (See [https://bui

Build failed in Jenkins: Nutch-trunk #3087

2015-04-23 Thread Apache Jenkins Server
See -- [...truncated 5611 lines...] test: [echo] Testing plugin: urlfilter-validator [junit] WARNING: multiple versions of ant detected in path for junit [junit] jar:file:/home/jenkins