[jira] [Commented] (NUTCH-2027) seed list REST endpoint for Nutch 1.10

2015-06-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571377#comment-14571377 ] Chris A. Mattmann commented on NUTCH-2027: -- Thanks Asitang, I will look at this!

[jira] [Assigned] (NUTCH-2027) seed list REST endpoint for Nutch 1.10

2015-06-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2027: Assignee: Chris A. Mattmann > seed list REST endpoint for Nutch 1.10 >

[jira] [Updated] (NUTCH-2027) seed list REST endpoint for Nutch 1.10

2015-06-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2027: - Labels: memex rest_api (was: rest_api) > seed list REST endpoint for Nutch 1.10 > ---

[jira] [Updated] (NUTCH-2027) seed list REST endpoint for Nutch 1.10

2015-06-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2027: - Fix Version/s: 1.11 > seed list REST endpoint for Nutch 1.10 > ---

[jira] [Resolved] (NUTCH-2031) Create Admin End point for Nutch 1.x REST service

2015-06-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2031. -- Resolution: Fixed OK PR merged! Thanks Sujen! {noformat} [chipotle:~/tmp/nutch-trunk] m

[jira] [Updated] (NUTCH-2031) Create Admin End point for Nutch 1.x REST service

2015-06-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2031: - Labels: memex (was: ) > Create Admin End point for Nutch 1.x REST service > -

[jira] [Commented] (NUTCH-2031) Create Admin End point for Nutch 1.x REST service

2015-06-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570143#comment-14570143 ] Chris A. Mattmann commented on NUTCH-2031: -- Thanks Sujen. I'll give this a spin.

[jira] [Work started] (NUTCH-2031) Create Admin End point for Nutch 1.x REST service

2015-06-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2031 started by Chris A. Mattmann. > Create Admin End point for Nutch 1.x REST service > ---

[jira] [Assigned] (NUTCH-2031) Create Admin End point for Nutch 1.x REST service

2015-06-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2031: Assignee: Chris A. Mattmann > Create Admin End point for Nutch 1.x REST service > -

[jira] [Resolved] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-06-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2015. -- Resolution: Fixed Committed, and formatted. {noformat} commit -m "- fix for NUTCH-2015

[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-06-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568497#comment-14568497 ] Chris A. Mattmann commented on NUTCH-2015: -- Thanks [~wastl-nagel] committing now!

[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565606#comment-14565606 ] Chris A. Mattmann commented on NUTCH-2015: -- Hi [~asitang] checking in - where are

[jira] [Commented] (NUTCH-2021) Use protocol-selenium to Capture Screenshots of the Page as it is Fetched

2015-05-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563028#comment-14563028 ] Chris A. Mattmann commented on NUTCH-2021: -- +1 for the patch as-is. Suggested im

[jira] [Commented] (NUTCH-2021) Use protocol-selenium to Capture Screenshots of the Page as it is Fetched

2015-05-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562999#comment-14562999 ] Chris A. Mattmann commented on NUTCH-2021: -- +1 for the patch as-is. Suggested im

[jira] [Commented] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist

2015-05-27 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560876#comment-14560876 ] Chris A. Mattmann commented on NUTCH-1995: -- thanks Seb. Giuseppe, can you update?

[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-27 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560874#comment-14560874 ] Chris A. Mattmann commented on NUTCH-2015: -- pinging again here [~asitang] and [~s

[jira] [Commented] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist

2015-05-26 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560372#comment-14560372 ] Chris A. Mattmann commented on NUTCH-1995: -- ahh gotcha. OK so the errors don't ha

[jira] [Commented] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist

2015-05-26 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559404#comment-14559404 ] Chris A. Mattmann commented on NUTCH-1995: -- [~wastl-nagel] - are you +1 to go on

[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-23 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557467#comment-14557467 ] Chris A. Mattmann commented on NUTCH-2015: -- Ping Sujen, Asitang, did you guys get

[jira] [Commented] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist

2015-05-21 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555304#comment-14555304 ] Chris A. Mattmann commented on NUTCH-1995: -- I am +1 on this patch. Will wait for

[jira] [Commented] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist

2015-05-20 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553264#comment-14553264 ] Chris A. Mattmann commented on NUTCH-1995: -- Hey Seb, yeah I don't think we should

[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551632#comment-14551632 ] Chris A. Mattmann commented on NUTCH-2015: -- Sujen can you also please update the

[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551447#comment-14551447 ] Chris A. Mattmann commented on NUTCH-2015: -- [~sujenshah] did you make the changes

[jira] [Assigned] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2015: Assignee: Chris A. Mattmann > Make FetchNodeDb optional (off by default) if NutchSe

[jira] [Work started] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2015 started by Chris A. Mattmann. > Make FetchNodeDb optional (off by default) if NutchServer is not used >

[jira] [Commented] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547395#comment-14547395 ] Chris A. Mattmann commented on NUTCH-2011: -- Done! {noformat} [chipotle:~/tmp/nut

[jira] [Commented] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547394#comment-14547394 ] Chris A. Mattmann commented on NUTCH-2011: -- Just tested re-applying the patch, al

[jira] [Commented] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547376#comment-14547376 ] Chris A. Mattmann commented on NUTCH-2011: -- Yep changes seem to have been rolled

[jira] [Commented] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist

2015-05-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547340#comment-14547340 ] Chris A. Mattmann commented on NUTCH-1995: -- Seb, I realize that this isn't should

[jira] [Commented] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546324#comment-14546324 ] Chris A. Mattmann commented on NUTCH-2011: -- All great points [~wastl-nagel]. [~su

[jira] [Resolved] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2011. -- Resolution: Fixed Committed! Thank you [~sujenshah] {noformat} [mattmann-0420740:~/tmp

[jira] [Commented] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545890#comment-14545890 ] Chris A. Mattmann commented on NUTCH-2011: -- all tests pass! {noformat} copy-gener

[jira] [Commented] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545771#comment-14545771 ] Chris A. Mattmann commented on NUTCH-2011: -- put some comments on the PR, they hav

[jira] [Updated] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2011: - Fix Version/s: 1.11 > Endpoint to support realtime JSON output from the fetcher >

[jira] [Assigned] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2011: Assignee: Chris A. Mattmann > Endpoint to support realtime JSON output from the fet

[jira] [Updated] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2011: - Component/s: REST_api fetcher > Endpoint to support realtime JSON output

[jira] [Updated] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2011: - Labels: memex (was: ) > Endpoint to support realtime JSON output from the fetcher > -

[jira] [Work started] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2011 started by Chris A. Mattmann. > Endpoint to support realtime JSON output from the fetcher > ---

[jira] [Commented] (NUTCH-1998) Add support for user-defined file extension to CommonCrawlDataDumper

2015-05-11 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538568#comment-14538568 ] Chris A. Mattmann commented on NUTCH-1998: -- yay [~wastl-nagel] to save the day th

[jira] [Work started] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist

2015-05-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1995 started by Chris A. Mattmann. > Add support for wildcard to http.robot.rules.whitelist > --

[jira] [Updated] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist

2015-05-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1995: - Fix Version/s: 1.11 > Add support for wildcard to http.robot.rules.whitelist > ---

[jira] [Updated] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist

2015-05-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1995: - Labels: memex (was: ) > Add support for wildcard to http.robot.rules.whitelist >

[jira] [Assigned] (NUTCH-1995) Add support for wildcard to http.robot.rules.whitelist

2015-05-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-1995: Assignee: Chris A. Mattmann > Add support for wildcard to http.robot.rules.whitelis

[jira] [Resolved] (NUTCH-1998) Add support for user-defined file extension to CommonCrawlDataDumper

2015-05-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-1998. -- Resolution: Fixed Committed thanks [~gostep]! {noformat} [chipotle:~/tmp/nutch-trunk] m

[jira] [Assigned] (NUTCH-1998) Add support for user-defined file extension to CommonCrawlDataDumper

2015-05-06 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-1998: Assignee: Chris A. Mattmann > Add support for user-defined file extension to Common

[jira] [Updated] (NUTCH-1998) Add support for user-defined file extension to CommonCrawlDataDumper

2015-05-06 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1998: - Fix Version/s: 1.11 > Add support for user-defined file extension to CommonCrawlDataDumper

[jira] [Work started] (NUTCH-1998) Add support for user-defined file extension to CommonCrawlDataDumper

2015-05-06 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1998 started by Chris A. Mattmann. > Add support for user-defined file extension to CommonCrawlDataDumper >

[jira] [Commented] (NUTCH-1969) URL Normalizer properly handling slashes

2015-04-27 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514187#comment-14514187 ] Chris A. Mattmann commented on NUTCH-1969: -- Thanks Markus, seen. Someone marked i

[jira] [Resolved] (NUTCH-1969) URL Normalizer properly handling slashes

2015-04-26 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-1969. -- Resolution: Fixed - thanks [~markus.jel...@openindex.io] I committed this! {noformat} [

[jira] [Work started] (NUTCH-1969) URL Normalizer properly handling slashes

2015-04-26 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1969 started by Chris A. Mattmann. > URL Normalizer properly handling slashes >

[jira] [Assigned] (NUTCH-1969) URL Normalizer properly handling slashes

2015-04-26 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-1969: Assignee: Chris A. Mattmann (was: Markus Jelsma) > URL Normalizer properly handlin

[jira] [Work started] (NUTCH-2001) SubCollection Field Name incorrect in nutch-default.xml

2015-04-26 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2001 started by Chris A. Mattmann. > SubCollection Field Name incorrect in nutch-default.xml > -

[jira] [Resolved] (NUTCH-2001) SubCollection Field Name incorrect in nutch-default.xml

2015-04-26 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2001. -- Resolution: Fixed Thanks [~jcocking] I've applied your patch! {noformat} [chipotle:~/tm

[jira] [Assigned] (NUTCH-2001) SubCollection Field Name incorrect in nutch-default.xml

2015-04-26 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2001: Assignee: Chris A. Mattmann > SubCollection Field Name incorrect in nutch-default.x

[jira] [Updated] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1994: - Attachment: NUTCH-1994-Mattmann.042515.patch.txt > Upgrade to Apache Tika 1.8 > --

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512910#comment-14512910 ] Chris A. Mattmann commented on NUTCH-1994: -- OK, so here's some more info. I print

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512858#comment-14512858 ] Chris A. Mattmann commented on NUTCH-1994: -- Hey [~jorgelbg] I thought it was NUTC

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512846#comment-14512846 ] Chris A. Mattmann commented on NUTCH-1994: -- https://builds.apache.org/job/Nutch-t

[jira] [Commented] (NUTCH-1994) Upgrade to Apache Tika 1.8

2015-04-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512845#comment-14512845 ] Chris A. Mattmann commented on NUTCH-1994: -- So, for whatever reason, this is brea

[jira] [Commented] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection

2015-04-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512844#comment-14512844 ] Chris A. Mattmann commented on NUTCH-1991: -- This was a red herring and not the ca

[jira] [Commented] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection

2015-04-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512691#comment-14512691 ] Chris A. Mattmann commented on NUTCH-1991: -- So, the problem here is that tika.det

[jira] [Commented] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection

2015-04-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512627#comment-14512627 ] Chris A. Mattmann commented on NUTCH-1991: -- Darn, so this seems to have broken the

[jira] [Resolved] (NUTCH-1997) Add CBOR "magic header" to CommonCrawlDataDumper output

2015-04-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-1997. -- Resolution: Fixed thanks [~gostep] and [~Lukeliush]! {noformat} [chipotle:~/tmp/nutch-1

[jira] [Updated] (NUTCH-1997) Add CBOR "magic header" to CommonCrawlDataDumper output

2015-04-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1997: - Fix Version/s: 1.10 > Add CBOR "magic header" to CommonCrawlDataDumper output > --

[jira] [Assigned] (NUTCH-1997) Add CBOR "magic header" to CommonCrawlDataDumper output

2015-04-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-1997: Assignee: Chris A. Mattmann > Add CBOR "magic header" to CommonCrawlDataDumper outp

[jira] [Work started] (NUTCH-1997) Add CBOR "magic header" to CommonCrawlDataDumper output

2015-04-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1997 started by Chris A. Mattmann. > Add CBOR "magic header" to CommonCrawlDataDumper output > -

[jira] [Resolved] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection

2015-04-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-1991. -- Resolution: Fixed Fix Version/s: 1.10 Committed! {noformat} [chipotle:~/tmp/nutc

[jira] [Commented] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection

2015-04-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507903#comment-14507903 ] Chris A. Mattmann commented on NUTCH-1991: -- +1 will commit this shortly, thank yo

[jira] [Work started] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection

2015-04-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1991 started by Chris A. Mattmann. > Tika mime detection not using Nutch supplied tika-mimetypes.xml for con

[jira] [Commented] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection

2015-04-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507590#comment-14507590 ] Chris A. Mattmann commented on NUTCH-1991: -- will try this out today. Thanks Iain!

[jira] [Assigned] (NUTCH-1991) Tika mime detection not using Nutch supplied tika-mimetypes.xml for content based detection

2015-04-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-1991: Assignee: Chris A. Mattmann > Tika mime detection not using Nutch supplied tika-mim

[jira] [Commented] (NUTCH-1973) Job Administration end point for the REST service

2015-04-21 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506362#comment-14506362 ] Chris A. Mattmann commented on NUTCH-1973: -- More finishing touches, compiles and

[jira] [Commented] (NUTCH-1973) Job Administration end point for the REST service

2015-04-21 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506352#comment-14506352 ] Chris A. Mattmann commented on NUTCH-1973: -- [~sujenshah] > Job Administration en

[jira] [Commented] (NUTCH-1973) Job Administration end point for the REST service

2015-04-21 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506351#comment-14506351 ] Chris A. Mattmann commented on NUTCH-1973: -- Sujen it seems like there are two Job

[jira] [Commented] (NUTCH-1973) Job Administration end point for the REST service

2015-04-21 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506347#comment-14506347 ] Chris A. Mattmann commented on NUTCH-1973: -- So I pulled the missing files out of

[jira] [Commented] (NUTCH-1973) Job Administration end point for the REST service

2015-04-21 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506339#comment-14506339 ] Chris A. Mattmann commented on NUTCH-1973: -- Sujen you missed adding JobConfig and

[jira] [Resolved] (NUTCH-1973) Job Administration end point for the REST service

2015-04-21 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-1973. -- Resolution: Fixed Fix Version/s: 1.10 latest patch works! Thanks [~sujenshah]!

[jira] [Work started] (NUTCH-1993) Nutch does not use backup parsers

2015-04-21 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1993 started by Chris A. Mattmann. > Nutch does not use backup parsers > - >

[jira] [Assigned] (NUTCH-1993) Nutch does not use backup parsers

2015-04-21 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-1993: Assignee: Chris A. Mattmann > Nutch does not use backup parsers > -

[jira] [Commented] (NUTCH-1987) Make bin/crawl indexer agnostic

2015-04-20 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504190#comment-14504190 ] Chris A. Mattmann commented on NUTCH-1987: -- Thanks Mike, this looks good to me. I

[jira] [Resolved] (NUTCH-1987) Make bin/crawl indexer agnostic

2015-04-20 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-1987. -- Resolution: Fixed Thanks [~jo...@apache.org] Appreciate it! Thanks Seb for the review!

[jira] [Work started] (NUTCH-1987) Make bin/crawl indexer agnostic

2015-04-20 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1987 started by Chris A. Mattmann. > Make bin/crawl indexer agnostic > --- > >

[jira] [Assigned] (NUTCH-1987) Make bin/crawl indexer agnostic

2015-04-20 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-1987: Assignee: Chris A. Mattmann > Make bin/crawl indexer agnostic > ---

[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-20 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503882#comment-14503882 ] Chris A. Mattmann commented on NUTCH-1934: -- well my point is on this - you can ke

[jira] [Commented] (NUTCH-1934) Refactor Fetcher in trunk

2015-04-20 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503795#comment-14503795 ] Chris A. Mattmann commented on NUTCH-1934: -- +1 to commit if it applies cleanly an

[jira] [Commented] (NUTCH-1697) SegmentMerger to implement Tool

2015-04-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502188#comment-14502188 ] Chris A. Mattmann commented on NUTCH-1697: -- +1 > SegmentMerger to implement Tool

[jira] [Commented] (NUTCH-1992) Port whitelist from NUTCH-1927 to 2.x

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501484#comment-14501484 ] Chris A. Mattmann commented on NUTCH-1992: -- If there are any differences in confi

[jira] [Commented] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501482#comment-14501482 ] Chris A. Mattmann commented on NUTCH-1927: -- Updated the documentation page for th

[jira] [Updated] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1854: - Fix Version/s: (was: 1.11) 1.10 > ./bin/crawl fails with a parsing

[jira] [Updated] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1854: - Assignee: Sebastian Nagel (was: Lewis John McGibbney) > ./bin/crawl fails with a parsing

[jira] [Updated] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1854: - Labels: memex (was: ) > ./bin/crawl fails with a parsing fetcher > --

[jira] [Commented] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501463#comment-14501463 ] Chris A. Mattmann commented on NUTCH-1854: -- awesome work [~asitang] - [~wastl-nag

[jira] [Commented] (NUTCH-1987) Make bin/crawl indexer agnostic

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501455#comment-14501455 ] Chris A. Mattmann commented on NUTCH-1987: -- hey Mike can you update per Seb's com

[jira] [Created] (NUTCH-1992) Port whitelist from NUTCH-1927 to 2.x

2015-04-18 Thread Chris A. Mattmann (JIRA)
Chris A. Mattmann created NUTCH-1992: Summary: Port whitelist from NUTCH-1927 to 2.x Key: NUTCH-1992 URL: https://issues.apache.org/jira/browse/NUTCH-1992 Project: Nutch Issue Type: New

[jira] [Resolved] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-1927. -- Resolution: Fixed opened up NUTCH-1992 for 2.x, can close this out now. Thanks Seb! > C

[jira] [Updated] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1989: - Affects Version/s: (was: 1.10) > Handling invalid URLs in CommonCrawlDataDumper >

[jira] [Resolved] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-1989. -- Resolution: Fixed Committed thanks [~totaro]! {noformat} [chipotle:~/tmp/nutch-1.10-tru

[jira] [Updated] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1989: - Fix Version/s: 1.10 > Handling invalid URLs in CommonCrawlDataDumper > ---

[jira] [Work started] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1989 started by Chris A. Mattmann. > Handling invalid URLs in CommonCrawlDataDumper > --

[jira] [Assigned] (NUTCH-1989) Handling invalid URLs in CommonCrawlDataDumper

2015-04-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-1989: Assignee: Chris A. Mattmann > Handling invalid URLs in CommonCrawlDataDumper >
