[jira] [Assigned] (NUTCH-2066) Allow user to specify crawldb and segment db in the Generate Job REST endpoint

2015-08-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2066: Assignee: Chris A. Mattmann > Allow user to specify crawldb and segment db in the G

[jira] [Comment Edited] (NUTCH-2072) Deflate encoding support is broken when http.content.limit is set to -1

2015-08-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651279#comment-14651279 ] Chris A. Mattmann edited comment on NUTCH-2072 at 8/2/15 11:39 PM: -

[jira] [Resolved] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-08-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2062. -- Resolution: Fixed Committed, thanks Mike! > Add Plugin for interacting with Selenium We

[jira] [Resolved] (NUTCH-2072) Deflate encoding support is broken when http.content.limit is set to -1

2015-08-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2072. -- Resolution: Fixed Fixed, thanks [~ltanguy] {noformat} [chipotle:~/tmp/nutch-trunk] matt

[jira] [Commented] (NUTCH-2072) Deflate encoding support is broken when http.content.limit is set to -1

2015-08-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651277#comment-14651277 ] Chris A. Mattmann commented on NUTCH-2072: -- Tests pass: {noformat} copy-generat

[jira] [Work started] (NUTCH-2072) Deflate encoding support is broken when http.content.limit is set to -1

2015-08-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2072 started by Chris A. Mattmann. > Deflate encoding support is broken when http.content.limit is set to -1

[jira] [Updated] (NUTCH-2072) Deflate encoding support is broken when http.content.limit is set to -1

2015-08-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2072: - Fix Version/s: 1.11 > Deflate encoding support is broken when http.content.limit is set to

[jira] [Commented] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-08-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651266#comment-14651266 ] Chris A. Mattmann commented on NUTCH-2062: -- Thanks [~mjoyce]! All committed: {no

[jira] [Assigned] (NUTCH-2072) Deflate encoding support is broken when http.content.limit is set to -1

2015-08-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2072: Assignee: Chris A. Mattmann > Deflate encoding support is broken when http.content.

[jira] [Commented] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-08-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651264#comment-14651264 ] Chris A. Mattmann commented on NUTCH-2062: -- {noformat} test: [echo] Testing

[jira] [Commented] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-08-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651127#comment-14651127 ] Chris A. Mattmann commented on NUTCH-2059: -- ping thoughts here? Doesn't seem to b

[jira] [Commented] (NUTCH-1785) Ability to index raw content

2015-07-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648294#comment-14648294 ] Chris A. Mattmann commented on NUTCH-1785: -- +1 to commit from me. > Ability to i

[jira] [Commented] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-07-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646447#comment-14646447 ] Chris A. Mattmann commented on NUTCH-2062: -- Mike see: https://github.com/apache/n

[jira] [Work started] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-07-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2062 started by Chris A. Mattmann. > Add Plugin for interacting with Selenium WebDriver > --

[jira] [Assigned] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-07-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2062: Assignee: Chris A. Mattmann (was: Michael Joyce) > Add Plugin for interacting with

[jira] [Commented] (NUTCH-2021) Use protocol-selenium to Capture Screenshots of the Page as it is Fetched

2015-07-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637586#comment-14637586 ] Chris A. Mattmann commented on NUTCH-2021: -- +1 great work Lewis. > Use protocol-

[jira] [Commented] (NUTCH-2062) Add Plugin for interacting with Selenium WebDriver

2015-07-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637584#comment-14637584 ] Chris A. Mattmann commented on NUTCH-2062: -- +1 from me. Commit! > Add Plugin for

[jira] [Commented] (NUTCH-2058) Indexer plugin that allows RegEx replacements on the NutchDocument field values

2015-07-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629693#comment-14629693 ] Chris A. Mattmann commented on NUTCH-2058: -- Try now: https://wiki.apache.org/nutc

[jira] [Commented] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-07-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629691#comment-14629691 ] Chris A. Mattmann commented on NUTCH-2059: -- +1, sounds good. > protocol-httpclie

[jira] [Commented] (NUTCH-2058) Indexer plugin that allows RegEx replacements on the NutchDocument field values

2015-07-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629120#comment-14629120 ] Chris A. Mattmann commented on NUTCH-2058: -- what's your wiki username? > Indexer

[jira] [Resolved] (NUTCH-2058) Indexer plugin that allows RegEx replacements on the NutchDocument field values

2015-07-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2058. -- Committed! Thanks Peter! {noformat} [mattmann-0420740:~/tmp/nutch-trunk] mattmann% svn comm

[jira] [Commented] (NUTCH-2058) Indexer plugin that allows RegEx replacements on the NutchDocument field values

2015-07-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629021#comment-14629021 ] Chris A. Mattmann commented on NUTCH-2058: -- OK all tests pass locally this looks

[jira] [Updated] (NUTCH-2058) Indexer plugin that allows RegEx replacements on the NutchDocument field values

2015-07-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2058: - Component/s: parser > Indexer plugin that allows RegEx replacements on the NutchDocument f

[jira] [Updated] (NUTCH-2058) Indexer plugin that allows RegEx replacements on the NutchDocument field values

2015-07-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2058: - Fix Version/s: 1.11 > Indexer plugin that allows RegEx replacements on the NutchDocument f

[jira] [Assigned] (NUTCH-2058) Indexer plugin that allows RegEx replacements on the NutchDocument field values

2015-07-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2058: Assignee: Chris A. Mattmann > Indexer plugin that allows RegEx replacements on the

[jira] [Work started] (NUTCH-2058) Indexer plugin that allows RegEx replacements on the NutchDocument field values

2015-07-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2058 started by Chris A. Mattmann. > Indexer plugin that allows RegEx replacements on the NutchDocument fiel

[jira] [Commented] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-07-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628837#comment-14628837 ] Chris A. Mattmann commented on NUTCH-2059: -- great idea, maybe we can make this mo

[jira] [Commented] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-07-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628429#comment-14628429 ] Chris A. Mattmann commented on NUTCH-2059: -- Peter last few tests aren't showing e

[jira] [Commented] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-07-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625904#comment-14625904 ] Chris A. Mattmann commented on NUTCH-2059: -- OK awesome all tests (core and plugin

[jira] [Commented] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-07-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625879#comment-14625879 ] Chris A. Mattmann commented on NUTCH-2059: -- I refactored the junit output formatt

[jira] [Commented] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-07-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625819#comment-14625819 ] Chris A. Mattmann commented on NUTCH-2059: -- Check it out: OK I enabled the Jenkin

[jira] [Commented] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-07-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625332#comment-14625332 ] Chris A. Mattmann commented on NUTCH-2059: -- hmm I added this can you confirm? >

[jira] [Commented] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-07-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625274#comment-14625274 ] Chris A. Mattmann commented on NUTCH-2059: -- OK I added that parameter to the Jenk

[jira] [Commented] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-07-08 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618785#comment-14618785 ] Chris A. Mattmann commented on NUTCH-2059: -- Hi Peter are you subscribed to the bu

[jira] [Commented] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-07-05 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614277#comment-14614277 ] Chris A. Mattmann commented on NUTCH-2059: -- yeah it would be possible - can you c

[jira] [Resolved] (NUTCH-2052) Enhance index-static to allow configurable delimiters

2015-07-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2052. -- Resolution: Fixed {noformat} SUCCESS: Integrated in Nutch-trunk #3191 (See [https://bui

[jira] [Commented] (NUTCH-2052) Enhance index-static to allow configurable delimiters

2015-07-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614119#comment-14614119 ] Chris A. Mattmann commented on NUTCH-2052: -- Builds are passing locally for me: {

[jira] [Commented] (NUTCH-2052) Enhance index-static to allow configurable delimiters

2015-07-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614118#comment-14614118 ] Chris A. Mattmann commented on NUTCH-2052: -- Committed! Running through tests righ

[jira] [Commented] (NUTCH-2052) Enhance index-static to allow configurable delimiters

2015-07-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614117#comment-14614117 ] Chris A. Mattmann commented on NUTCH-2052: -- no problem, I can add them in. > Enh

[jira] [Updated] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-07-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2059: - Fix Version/s: 1.11 > protocol-httpclient, protocol-http unit test errors on Jenkins > ---

[jira] [Resolved] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-07-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2059. -- Resolution: Fixed Thanks! Committed. {noformat} [chipotle:~/tmp/nutch-trunk] mattmann%

[jira] [Updated] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-07-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2059: - Affects Version/s: (was: 1.11) > protocol-httpclient, protocol-http unit test errors o

[jira] [Work started] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-07-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2059 started by Chris A. Mattmann. > protocol-httpclient, protocol-http unit test errors on Jenkins > --

[jira] [Assigned] (NUTCH-2059) protocol-httpclient, protocol-http unit test errors on Jenkins

2015-07-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2059: Assignee: Chris A. Mattmann > protocol-httpclient, protocol-http unit test errors o

[jira] [Work started] (NUTCH-2052) Enhance index-static to allow configurable delimiters

2015-07-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2052 started by Chris A. Mattmann. > Enhance index-static to allow configurable delimiters > ---

[jira] [Reopened] (NUTCH-2052) Enhance index-static to allow configurable delimiters

2015-07-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reopened NUTCH-2052: -- > Enhance index-static to allow configurable delimiters > --

[jira] [Commented] (NUTCH-2052) Enhance index-static to allow configurable delimiters

2015-07-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613838#comment-14613838 ] Chris A. Mattmann commented on NUTCH-2052: -- [~PeterCiuffetti] oddly, there is a r

[jira] [Resolved] (NUTCH-2052) Enhance index-static to allow configurable delimiters

2015-07-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2052. -- Resolution: Fixed Committed, thanks! {noformat} [mattmann-0420740:~/tmp/nutch-trunk] ma

[jira] [Commented] (NUTCH-2052) Enhance index-static to allow configurable delimiters

2015-07-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613284#comment-14613284 ] Chris A. Mattmann commented on NUTCH-2052: -- Yep, verified, build successful! {no

[jira] [Commented] (NUTCH-2052) Enhance index-static to allow configurable delimiters

2015-07-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612579#comment-14612579 ] Chris A. Mattmann commented on NUTCH-2052: -- Thanks Peter! I think someone on the

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610201#comment-14610201 ] Chris A. Mattmann commented on NUTCH-2038: -- Hey [~markus.jel...@openindex.io] yea

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-07-01 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610198#comment-14610198 ] Chris A. Mattmann commented on NUTCH-2038: -- hey [~markus.jel...@openindex.io] yea

[jira] [Resolved] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2038. -- Resolution: Fixed Passing now in r1688555. > Naive Bayes classifier based html Parse fi

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609618#comment-14609618 ] Chris A. Mattmann commented on NUTCH-2038: -- it's b/c I didn't include the updates

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609575#comment-14609575 ] Chris A. Mattmann commented on NUTCH-2038: -- so this passed locally for me - wonde

[jira] [Reopened] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reopened NUTCH-2038: -- - sigh, per https://builds.apache.org/job/Nutch-trunk/3184/ there seems to still be issues

[jira] [Work started] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2038 started by Chris A. Mattmann. > Naive Bayes classifier based html Parse filter (for filtering outlinks)

[jira] [Commented] (NUTCH-2052) Enhance index-static to allow configurable delimiters

2015-06-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609546#comment-14609546 ] Chris A. Mattmann commented on NUTCH-2052: -- I see the following failures with thi

[jira] [Assigned] (NUTCH-2052) Enhance index-static to allow configurable delimiters

2015-06-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2052: Assignee: Chris A. Mattmann > Enhance index-static to allow configurable delimiters

[jira] [Reopened] (NUTCH-2052) Enhance index-static to allow configurable delimiters

2015-06-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reopened NUTCH-2052: -- > Enhance index-static to allow configurable delimiters > --

[jira] [Work started] (NUTCH-2052) Enhance index-static to allow configurable delimiters

2015-06-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2052 started by Chris A. Mattmann. > Enhance index-static to allow configurable delimiters > ---

[jira] [Resolved] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2038. -- Resolution: Fixed - fixed in r1688549. Thanks! > Naive Bayes classifier based html Pars

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609538#comment-14609538 ] Chris A. Mattmann commented on NUTCH-2038: -- OK I found a few more issues: 1. Add

[jira] [Commented] (NUTCH-2053) Unnecessary dependencies included in ivy.xml (post NUTCH-2038)

2015-06-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609526#comment-14609526 ] Chris A. Mattmann commented on NUTCH-2053: -- I'm working on this Lewis, should hav

[jira] [Work started] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2038 started by Chris A. Mattmann. > Naive Bayes classifier based html Parse filter (for filtering outlinks)

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605643#comment-14605643 ] Chris A. Mattmann commented on NUTCH-2038: -- Ugh, On #2, I guess I missed setting

[jira] [Resolved] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2038. -- Resolution: Fixed alright [~asitang] all committed! Fixed the ParserFactoryTest error. T

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605120#comment-14605120 ] Chris A. Mattmann commented on NUTCH-2038: -- Tests fail in TestParserFactory: {no

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601499#comment-14601499 ] Chris A. Mattmann commented on NUTCH-2038: -- OK so it looks like the latest patch

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based url filter

2015-06-24 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599748#comment-14599748 ] Chris A. Mattmann commented on NUTCH-2038: -- yeah you got it Seb, we can do accept

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based url filter

2015-06-23 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598724#comment-14598724 ] Chris A. Mattmann commented on NUTCH-2038: -- Yeah so here's the deal. I think I ca

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based url filter

2015-06-21 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595388#comment-14595388 ] Chris A. Mattmann commented on NUTCH-2038: -- That's what we were working on. My 57

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based url filter

2015-06-21 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595385#comment-14595385 ] Chris A. Mattmann commented on NUTCH-2038: -- Hey Seb: Well, the native URLFilter

[jira] [Resolved] (NUTCH-2039) Relevance based scoring filter

2015-06-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2039. -- Resolution: Fixed Committed, thanks! {noformat} [chipotle:~/tmp/nutch-trunk] mattmann%

[jira] [Commented] (NUTCH-2039) Relevance based scoring filter

2015-06-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592937#comment-14592937 ] Chris A. Mattmann commented on NUTCH-2039: -- OK i fixed both: {noformat} ... job:

[jira] [Commented] (NUTCH-2039) Relevance based scoring filter

2015-06-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592936#comment-14592936 ] Chris A. Mattmann commented on NUTCH-2039: -- Two immediate things: 1. Patch inclu

[jira] [Commented] (NUTCH-2039) Relevance based scoring filter

2015-06-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592933#comment-14592933 ] Chris A. Mattmann commented on NUTCH-2039: -- Lewis and I talked about this offline

[jira] [Assigned] (NUTCH-2039) Relevance based scoring filter

2015-06-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2039: Assignee: Chris A. Mattmann (was: Lewis John McGibbney) > Relevance based scoring

[jira] [Work started] (NUTCH-2039) Relevance based scoring filter

2015-06-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2039 started by Chris A. Mattmann. > Relevance based scoring filter > -- > >

[jira] [Commented] (NUTCH-2039) Relevance based scoring filter

2015-06-18 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591370#comment-14591370 ] Chris A. Mattmann commented on NUTCH-2039: -- Hey Lewis, have you tried: curl -O h

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based url filter

2015-06-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590901#comment-14590901 ] Chris A. Mattmann commented on NUTCH-2038: -- Asitang, what is the referenced pull

[jira] [Updated] (NUTCH-2039) Relevance based scoring filter

2015-06-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2039: - Assignee: Lewis John McGibbney (was: Sujen Shah) > Relevance based scoring filter > -

[jira] [Commented] (NUTCH-2039) Relevance based scoring filter

2015-06-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588553#comment-14588553 ] Chris A. Mattmann commented on NUTCH-2039: -- +1, sounds great. I am also +1 Lewis.

[jira] [Commented] (NUTCH-2038) url filter that uses a model (from a classifier)

2015-06-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588544#comment-14588544 ] Chris A. Mattmann commented on NUTCH-2038: -- Hi Lewis, there is no reason this can

[jira] [Commented] (NUTCH-2038) url filter that uses a model (from a classifier)

2015-06-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584670#comment-14584670 ] Chris A. Mattmann commented on NUTCH-2038: -- [~asitang] where are we on this? Can

[jira] [Assigned] (NUTCH-2038) url filter that uses a model (from a classifier)

2015-06-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2038: Assignee: Chris A. Mattmann > url filter that uses a model (from a classifier) > -

[jira] [Updated] (NUTCH-2038) url filter that uses a model (from a classifier)

2015-06-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2038: - Component/s: parser injector fetcher > url filter that u

[jira] [Work started] (NUTCH-2038) url filter that uses a model (from a classifier)

2015-06-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2038 started by Chris A. Mattmann. > url filter that uses a model (from a classifier) > ---

[jira] [Updated] (NUTCH-2038) url filter that uses a model (from a classifier)

2015-06-10 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2038: - Fix Version/s: (was: 1.10) 1.11 > url filter that uses a model (fro

[jira] [Updated] (NUTCH-2038) url filter that uses a model (from a classifier)

2015-06-10 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2038: - Affects Version/s: (was: 1.10) > url filter that uses a model (from a classifier) > -

[jira] [Resolved] (NUTCH-2037) Job endpoint to support Indexing from the REST API

2015-06-08 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2037. -- Resolution: Fixed - committed, thanks Sujen! {noformat} [chipotle:~/tmp/nutch-trunk] ma

[jira] [Work started] (NUTCH-2037) Job endpoint to support Indexing from the REST API

2015-06-07 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2037 started by Chris A. Mattmann. > Job endpoint to support Indexing from the REST API > --

[jira] [Assigned] (NUTCH-2037) Job endpoint to support Indexing from the REST API

2015-06-07 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-2037: Assignee: Chris A. Mattmann > Job endpoint to support Indexing from the REST API >

[jira] [Resolved] (NUTCH-2027) seed list REST endpoint for Nutch 1.10

2015-06-07 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-2027. -- Resolution: Fixed > seed list REST endpoint for Nutch 1.10 > ---

[jira] [Commented] (NUTCH-2027) seed list REST endpoint for Nutch 1.10

2015-06-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573984#comment-14573984 ] Chris A. Mattmann commented on NUTCH-2027: -- Committed, thanks Asitang! {noformat

[jira] [Commented] (NUTCH-2027) seed list REST endpoint for Nutch 1.10

2015-06-04 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573976#comment-14573976 ] Chris A. Mattmann commented on NUTCH-2027: -- Awesome thanks, going to commit now.

[jira] [Commented] (NUTCH-2027) seed list REST endpoint for Nutch 1.10

2015-06-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572121#comment-14572121 ] Chris A. Mattmann commented on NUTCH-2027: -- Thanks, I'll check this out now. >

[jira] [Commented] (NUTCH-2027) seed list REST endpoint for Nutch 1.10

2015-06-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571899#comment-14571899 ] Chris A. Mattmann commented on NUTCH-2027: -- [~asitang] you forgot to svn or git a

[jira] [Updated] (NUTCH-2027) seed list REST endpoint for Nutch 1.10

2015-06-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-2027: - Affects Version/s: (was: 1.10) > seed list REST endpoint for Nutch 1.10 >

[jira] [Work started] (NUTCH-2027) seed list REST endpoint for Nutch 1.10

2015-06-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2027 started by Chris A. Mattmann. > seed list REST endpoint for Nutch 1.10 > --
