[jira] [Commented] (NUTCH-2357) Index metadata throw Exception because writable object cannot be cast to Text

2017-03-14 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925236#comment-15925236 ] Chris A. Mattmann commented on NUTCH-2357: -- Thanks [~eyeris] and [~wastl-nagel]! > Index

[jira] [Resolved] (NUTCH-2357) Index metadata throw Exception because writable object cannot be cast to Text

2017-03-14 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2357. -- Resolution: Fixed Solved by [~wastl-nagel] in

[jira] [Assigned] (NUTCH-2357) Index metadata throw Exception because writable object cannot be cast to Text

2017-03-14 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2357: Assignee: Chris A. Mattmann > Index metadata throw Exception because writable

[jira] [Work started] (NUTCH-2357) Index metadata throw Exception because writable object cannot be cast to Text

2017-03-14 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2357 started by Chris A. Mattmann. > Index metadata throw Exception because writable object cannot be cast

[jira] [Commented] (NUTCH-2364) http.agent.rotate: IllegalArgumentException / last element of agent names ignored

2017-03-06 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898161#comment-15898161 ] Chris A. Mattmann commented on NUTCH-2364: -- thanks Seb appreciate it > http.agent.rotate:

[jira] [Resolved] (NUTCH-2171) Upgrade Nutch Trunk to Java 1.8

2017-02-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2171. -- Resolution: Fixed Fix Version/s: 1.13 thanks @kamaci merged into master in

[jira] [Assigned] (NUTCH-2171) Upgrade Nutch Trunk to Java 1.8

2017-02-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2171: Assignee: Chris A. Mattmann > Upgrade Nutch Trunk to Java 1.8 >

[jira] [Commented] (NUTCH-2333) Indexer for RabbitMQ

2016-11-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635112#comment-15635112 ] Chris A. Mattmann commented on NUTCH-2333: -- Even more so I would recommend that [~roannel] and

[jira] [Commented] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2016-08-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404363#comment-15404363 ] Chris A. Mattmann commented on NUTCH-2132: -- Sujen what comes back - is success or an exception

[jira] [Commented] (NUTCH-1371) Replace Ivy with Maven Ant tasks

2016-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359700#comment-15359700 ] Chris A. Mattmann commented on NUTCH-1371: -- Sounds fantastic! CC [~ndouba] > Replace Ivy with

[jira] [Resolved] (NUTCH-2248) CSS parser plugin

2016-05-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2248. -- Resolution: Fixed Thanks [~naegelejd] and [~lewismc] for the work! {noformat}

[jira] [Updated] (NUTCH-2248) CSS parser plugin

2016-05-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2248: - Fix Version/s: 1.12 > CSS parser plugin > - > > Key:

[jira] [Updated] (NUTCH-2248) CSS parser plugin

2016-05-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2248: - Affects Version/s: (was: 1.12) > CSS parser plugin > - > >

[jira] [Work started] (NUTCH-2248) CSS parser plugin

2016-05-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2248 started by Chris A. Mattmann. > CSS parser plugin > - > > Key:

[jira] [Assigned] (NUTCH-2248) CSS parser plugin

2016-05-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2248: Assignee: Chris A. Mattmann > CSS parser plugin > - > >

[jira] [Work started] (NUTCH-2252) Allow phantomjs as a browser for selenium options

2016-05-07 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2252 started by Chris A. Mattmann. > Allow phantomjs as a browser for selenium options >

[jira] [Assigned] (NUTCH-2252) Allow phantomjs as a browser for selenium options

2016-05-07 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2252: Assignee: Chris A. Mattmann (was: Lewis John McGibbney) > Allow phantomjs as a

[jira] [Updated] (NUTCH-2250) CommonCrawlDumper : Invalid format + skipped parts

2016-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2250: - Affects Version/s: (was: 1.12) 1.10 > CommonCrawlDumper :

[jira] [Resolved] (NUTCH-2250) CommonCrawlDumper : Invalid format + skipped parts

2016-04-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2250. -- Resolution: Fixed Fix Version/s: (was: 1.10) 1.12 -

[jira] [Work started] (NUTCH-2250) CommonCrawlDumper : Invalid format + skipped parts

2016-04-14 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2250 started by Chris A. Mattmann. > CommonCrawlDumper : Invalid format + skipped parts >

[jira] [Updated] (NUTCH-2191) Add protocol-htmlunit

2016-04-08 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2191: - Labels: memex (was: ) > Add protocol-htmlunit > - > >

[jira] [Commented] (NUTCH-2191) Add protocol-htmlunit

2016-03-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212425#comment-15212425 ] Chris A. Mattmann commented on NUTCH-2191: -- approved, please update the PR [~karanjeets] and I

[jira] [Resolved] (NUTCH-2241) Unstable Selenium plugin in Nutch. Fixed bugs and enhanced configuration

2016-03-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2241. -- Resolution: Fixed Merged, thanks [~karanjeets]! {noformat} [chipotle:~/tmp/nutch1.12]

[jira] [Work started] (NUTCH-2241) Unstable Selenium plugin in Nutch. Fixed bugs and enhanced configuration

2016-03-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2241 started by Chris A. Mattmann. > Unstable Selenium plugin in Nutch. Fixed bugs and enhanced

[jira] [Updated] (NUTCH-2241) Unstable Selenium plugin in Nutch. Fixed bugs and enhanced configuration

2016-03-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2241: - Labels: firefox interactiveselenium lib-selenium memex nutch nutch-default.xml plugin

[jira] [Assigned] (NUTCH-2241) Unstable Selenium plugin in Nutch. Fixed bugs and enhanced configuration

2016-03-19 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2241: Assignee: Chris A. Mattmann > Unstable Selenium plugin in Nutch. Fixed bugs and

[jira] [Commented] (NUTCH-2191) Add protocol-htmlunit

2016-03-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196088#comment-15196088 ] Chris A. Mattmann commented on NUTCH-2191: -- thanks [~karanjeets] and [~markus17] > Add

[jira] [Work started] (NUTCH-2191) Add protocol-htmlunit

2016-03-14 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2191 started by Chris A. Mattmann. > Add protocol-htmlunit > - > > Key:

[jira] [Commented] (NUTCH-2191) Add protocol-htmlunit

2016-03-14 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192834#comment-15192834 ] Chris A. Mattmann commented on NUTCH-2191: -- thanks [~karanjeets] I'll take a look tomorrow. I

[jira] [Assigned] (NUTCH-2191) Add protocol-htmlunit

2016-03-14 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2191: Assignee: Chris A. Mattmann (was: Markus Jelsma) > Add protocol-htmlunit >

[jira] [Work started] (NUTCH-2239) Selenium Handlers for Ajax Patterns from Student submissions

2016-03-14 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2239 started by Chris A. Mattmann. > Selenium Handlers for Ajax Patterns from Student submissions >

[jira] [Updated] (NUTCH-2239) Selenium Handlers for Ajax Patterns from Student submissions

2016-03-14 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2239: - Labels: memex (was: ) > Selenium Handlers for Ajax Patterns from Student submissions >

[jira] [Updated] (NUTCH-2239) Selenium Handlers for Ajax Patterns from Student submissions

2016-03-14 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2239: - Component/s: protocol fetcher > Selenium Handlers for Ajax Patterns from

[jira] [Assigned] (NUTCH-2239) Selenium Handlers for Ajax Patterns from Student submissions

2016-03-14 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2239: Assignee: Chris A. Mattmann > Selenium Handlers for Ajax Patterns from Student

[jira] [Updated] (NUTCH-2239) Selenium Handlers for Ajax Patterns from Student submissions

2016-03-14 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2239: - Fix Version/s: 1.12 > Selenium Handlers for Ajax Patterns from Student submissions >

[jira] [Commented] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2016-03-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192488#comment-15192488 ] Chris A. Mattmann commented on NUTCH-2132: -- Example of this in action in MEMEX-Explorer:

[jira] [Commented] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2016-03-08 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186086#comment-15186086 ] Chris A. Mattmann commented on NUTCH-2132: -- agreed - I will try and generalize it and then update

[jira] [Commented] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2016-03-07 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184364#comment-15184364 ] Chris A. Mattmann commented on NUTCH-2132: -- [~sujenshah] can we get this committed? This is a

[jira] [Work started] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2016-03-07 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2132 started by Chris A. Mattmann. > Publisher/Subscriber model for Nutch to emit events >

[jira] [Assigned] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2016-03-07 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2132: Assignee: Chris A. Mattmann > Publisher/Subscriber model for Nutch to emit events

[jira] [Resolved] (NUTCH-2213) CommonCrawlDataDumper saves gzipped body in extracted form

2016-02-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2213. -- Resolution: Fixed Fix Version/s: 1.12 > CommonCrawlDataDumper saves gzipped body

[jira] [Commented] (NUTCH-2213) CommonCrawlDataDumper saves gzipped body in extracted form

2016-02-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173188#comment-15173188 ] Chris A. Mattmann commented on NUTCH-2213: -- Fixed thanks [~jnioche]! {noformat}

[jira] [Work started] (NUTCH-2213) CommonCrawlDataDumper saves gzipped body in extracted form

2016-02-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2213 started by Chris A. Mattmann. > CommonCrawlDataDumper saves gzipped body in extracted form >

[jira] [Resolved] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2144. -- Resolution: Fixed OK all fixed thanks [~thammegowda]! {noformat}

[jira] [Commented] (NUTCH-2191) Add protocol-htmlunit

2016-02-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152420#comment-15152420 ] Chris A. Mattmann commented on NUTCH-2191: -- Markus, we don't need to fix the plugin dependency

[jira] [Commented] (NUTCH-2046) The crawl script should be able to skip an initial injection.

2016-02-11 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142858#comment-15142858 ] Chris A. Mattmann commented on NUTCH-2046: -- +1 from me > The crawl script should be able to skip

[jira] [Commented] (NUTCH-2213) CommonCrawlDataDumper saves gzipped body in extracted form

2016-02-10 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140950#comment-15140950 ] Chris A. Mattmann commented on NUTCH-2213: -- Hi [~jrsr] thanks for the issue request. You may also

[jira] [Assigned] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-10 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2144: Assignee: Chris A. Mattmann > Plugin to override db.ignore.external to exempt

[jira] [Work started] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-10 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2144 started by Chris A. Mattmann. > Plugin to override db.ignore.external to exempt interesting external

[jira] [Commented] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-10 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141172#comment-15141172 ] Chris A. Mattmann commented on NUTCH-2144: -- I am +1 for this patch, and enabled only by the user

[jira] [Commented] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-10 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141293#comment-15141293 ] Chris A. Mattmann commented on NUTCH-2144: -- Agreed and agreed. Thamme can you submit a new

[jira] [Commented] (NUTCH-1314) Impose a limit on the length of outlink target urls

2016-02-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128368#comment-15128368 ] Chris A. Mattmann commented on NUTCH-1314: -- Otis, your patches are always welcome! :) > Impose a

[jira] [Commented] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-26 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118689#comment-15118689 ] Chris A. Mattmann commented on NUTCH-2206: -- +1 please commit > Provide example

[jira] [Commented] (NUTCH-2191) Add protocol-htmlunit

2016-01-08 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089545#comment-15089545 ] Chris A. Mattmann commented on NUTCH-2191: -- Markus thanks! Check out:

[jira] [Commented] (NUTCH-2191) Add protocol-htmlunit

2016-01-05 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083214#comment-15083214 ] Chris A. Mattmann commented on NUTCH-2191: -- Very nice, Markus! Beat me to implementing this one.

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059016#comment-15059016 ] Chris A. Mattmann commented on NUTCH-2184: -- +1 > Enable IndexingJob to function with no crawldb

[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055417#comment-15055417 ] Chris A. Mattmann commented on NUTCH-2184: -- Nice, bruh > Enable IndexingJob to function with no

[jira] [Commented] (NUTCH-2172) Parsing whitespace not just tabs in contenttype-mapping.txt

2015-12-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035902#comment-15035902 ] Chris A. Mattmann commented on NUTCH-2172: -- bq. This could be an improvement if we assume that

[jira] [Commented] (NUTCH-2162) Nutch Webapp Crawl fails as it tries to index

2015-11-06 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994029#comment-14994029 ] Chris A. Mattmann commented on NUTCH-2162: -- so I tried this out. It actually works fine as long

[jira] [Created] (NUTCH-2158) Upgrade to Tika 1.11

2015-11-03 Thread Chris A. Mattmann (JIRA)
Chris A. Mattmann created NUTCH-2158: Summary: Upgrade to Tika 1.11 Key: NUTCH-2158 URL: https://issues.apache.org/jira/browse/NUTCH-2158 Project: Nutch Issue Type: Task

[jira] [Commented] (NUTCH-2155) Create a "crawl completeness" utility

2015-11-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984432#comment-14984432 ] Chris A. Mattmann commented on NUTCH-2155: -- Seb, shall we update it not to require current and

[jira] [Commented] (NUTCH-2150) Add ProtocolStatus Utility

2015-11-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984433#comment-14984433 ] Chris A. Mattmann commented on NUTCH-2150: -- Again - the solution here is to remove the need for

[jira] [Commented] (NUTCH-2154) Nutch REST API (DB) suffering NullPointerException

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983442#comment-14983442 ] Chris A. Mattmann commented on NUTCH-2154: -- [~sujenshah] I'd like to spin 1.11 RC #2 today. Can

[jira] [Resolved] (NUTCH-2150) Add ProtocolStatus Utility

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2150. -- Resolution: Fixed thanks Mike! {noformat} [chipotle:~/tmp/nutch1.11] mattmann% svn

[jira] [Updated] (NUTCH-2146) hashCode on the Outlink class

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2146: - Fix Version/s: 1.11 > hashCode on the Outlink class > - > >

[jira] [Resolved] (NUTCH-2155) Create a "crawl completeness" utility

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2155. -- Resolution: Fixed Thanks Mike! {noformat} [chipotle:~/tmp/nutch1.11] mattmann% svn

[jira] [Assigned] (NUTCH-2146) hashCode on the Outlink class

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2146: Assignee: Chris A. Mattmann > hashCode on the Outlink class >

[jira] [Work started] (NUTCH-2146) hashCode on the Outlink class

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2146 started by Chris A. Mattmann. > hashCode on the Outlink class > - > >

[jira] [Commented] (NUTCH-2146) hashCode on the Outlink class

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983402#comment-14983402 ] Chris A. Mattmann commented on NUTCH-2146: -- Going to commit this shortly. > hashCode on the

[jira] [Work started] (NUTCH-2150) Add ProtocolStatus Utility

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2150 started by Chris A. Mattmann. > Add ProtocolStatus Utility > -- > >

[jira] [Assigned] (NUTCH-2150) Add ProtocolStatus Utility

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2150: Assignee: Chris A. Mattmann > Add ProtocolStatus Utility >

[jira] [Updated] (NUTCH-2150) Add ProtocolStatus Utility

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2150: - Fix Version/s: (was: 1.12) 1.11 > Add ProtocolStatus Utility >

[jira] [Work started] (NUTCH-2155) Create a "crawl completeness" utility

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2155 started by Chris A. Mattmann. > Create a "crawl completeness" utility >

[jira] [Updated] (NUTCH-2155) Create a "crawl completeness" utility

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2155: - Fix Version/s: (was: 1.12) 1.11 > Create a "crawl completeness"

[jira] [Assigned] (NUTCH-2155) Create a "crawl completeness" utility

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2155: Assignee: Chris A. Mattmann > Create a "crawl completeness" utility >

[jira] [Updated] (NUTCH-2155) Create a "crawl completeness" utility

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2155: - Labels: memex (was: ) > Create a "crawl completeness" utility >

[jira] [Resolved] (NUTCH-2154) Nutch REST API (DB) suffering NullPointerException

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2154. -- Resolution: Fixed Thanks Sujen! Thanks Aron! {noformat} [chipotle:~/tmp/nutch1.11]

[jira] [Resolved] (NUTCH-1800) Documentation for Nutch 1.X and 2.X REST APIs

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-1800. -- Resolution: Fixed this is done, thanks Lewis > Documentation for Nutch 1.X and 2.X

[jira] [Resolved] (NUTCH-2146) hashCode on the Outlink class

2015-10-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2146. -- Resolution: Fixed Thanks [~jorgelbg]! {noformat} [chipotle:~/tmp/nutch1.11] mattmann%

[jira] [Commented] (NUTCH-2153) Nutch REST API (DB) uses POST instead of GET to request

2015-10-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978748#comment-14978748 ] Chris A. Mattmann commented on NUTCH-2153: -- can you be more specific here, [~ahmadia]? > Nutch

[jira] [Commented] (NUTCH-2153) Nutch REST API (DB) uses POST instead of GET to request

2015-10-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978854#comment-14978854 ] Chris A. Mattmann commented on NUTCH-2153: -- Yeah I think we may want to do something async here

[jira] [Updated] (NUTCH-2154) Nutch REST API (DB) suffering NullPointerException

2015-10-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2154: - Fix Version/s: 1.11 > Nutch REST API (DB) suffering NullPointerException >

[jira] [Assigned] (NUTCH-2154) Nutch REST API (DB) suffering NullPointerException

2015-10-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2154: Assignee: Chris A. Mattmann > Nutch REST API (DB) suffering NullPointerException >

[jira] [Commented] (NUTCH-2153) Nutch REST API (DB) uses POST instead of GET to request

2015-10-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978769#comment-14978769 ] Chris A. Mattmann commented on NUTCH-2153: -- Gotcha, thanks [~ahmadia] > Nutch REST API (DB) uses

[jira] [Commented] (NUTCH-2154) Nutch REST API (DB) suffering NullPointerException

2015-10-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978811#comment-14978811 ] Chris A. Mattmann commented on NUTCH-2154: -- I have to respin 1.11 anyways, so I'll take a look at

[jira] [Updated] (NUTCH-2147) LanguagePreferenceScoringFilter for Nutch

2015-10-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2147: - Fix Version/s: (was: 1.11) 1.12 > LanguagePreferenceScoringFilter

[jira] [Commented] (NUTCH-2149) REST endpoint to read Nutch sequence files

2015-10-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973368#comment-14973368 ] Chris A. Mattmann commented on NUTCH-2149: -- in your commit msg for the future [~sujenshah]

[jira] [Updated] (NUTCH-2133) Transfer Selenium Documentation to WIki

2015-10-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2133: - Fix Version/s: (was: 1.11) (was: 2.4) 1.12

[jira] [Updated] (NUTCH-2030) ParseZip plugin is not able to extract language from zip document,this could solve that problem.

2015-10-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2030: - Fix Version/s: (was: 1.11) 1.12 > ParseZip plugin is not able to

[jira] [Updated] (NUTCH-2086) Nutch 1.X Webui

2015-10-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2086: - Fix Version/s: (was: 1.11) 1.12 > Nutch 1.X Webui >

[jira] [Updated] (NUTCH-2135) Ant Eclipse build does not include protocol-interactiveselenium

2015-10-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2135: - Fix Version/s: (was: 1.11) 1.12 > Ant Eclipse build does not

[jira] [Updated] (NUTCH-2132) Publisher/Subscriber model for Nutch to emit events

2015-10-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2132: - Fix Version/s: (was: 1.11) 1.12 > Publisher/Subscriber model for

[jira] [Updated] (NUTCH-2140) Atomic update and optimistic concurrency update using Solr

2015-10-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2140: - Fix Version/s: (was: 1.11) 1.12 > Atomic update and optimistic

[jira] [Updated] (NUTCH-2120) Remove MapWritable from trunk codebase

2015-10-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2120: - Fix Version/s: (was: 1.11) 1.12 > Remove MapWritable from trunk

[jira] [Work started] (NUTCH-2141) Change the InteractiveSelenium plugin handler Interface to return page content

2015-10-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2141 started by Chris A. Mattmann. > Change the InteractiveSelenium plugin handler Interface to return page

[jira] [Updated] (NUTCH-2128) Refactor configuration end point

2015-10-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2128: - Fix Version/s: (was: 1.11) 1.12 > Refactor configuration end point

[jira] [Updated] (NUTCH-1943) Form authentication should not be global and ignore

2015-10-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1943: - Fix Version/s: (was: 1.11) 1.12 > Form authentication should not

[jira] [Updated] (NUTCH-2064) URLNormalizer basic to properly encode non-ASCII characters

2015-10-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2064: - Fix Version/s: (was: 1.11) 1.12 > URLNormalizer basic to properly

[jira] [Resolved] (NUTCH-2141) Change the InteractiveSelenium plugin handler Interface to return page content

2015-10-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2141. -- Resolution: Fixed Fix Version/s: 1.11 Thanks [~BalaJira] [~jo...@apache.org]

[jira] [Assigned] (NUTCH-2142) Nutch File Dump - FileNotFoundException (Invalid Argument) Error

2015-10-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2142: Assignee: Chris A. Mattmann > Nutch File Dump - FileNotFoundException (Invalid
