Re: I want to volunteer some time

2012-01-17 Thread Lewis John Mcgibbney
Hi Eddie, I've added you to the AdminGroup for our wiki, so you will be able to edit whichever areas you are interested in, or which you think can/should be improved. Your introduction sounds really interesting and, as Markus & Julien have said, there are a lot of issues which merit some input, it's great

[Nutch Wiki] Trivial Update of "AdminGroup" by LewisJohnMcgibbney

2012-01-17 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "AdminGroup" page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/AdminGroup?action=diff&rev1=6&rev2=7 * JulienNioche * MarkusJelsma * ElisabethAdler +

[jira] [Commented] (NUTCH-1251) Deletion of duplicates fails with org.apache.solr.client.solrj.SolrServerException

2012-01-17 Thread Arkadi Kosmynin (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188115#comment-13188115 ] Arkadi Kosmynin commented on NUTCH-1251: It is one line change. File org.apache.n

[jira] [Updated] (NUTCH-1251) Deletion of duplicates fails with org.apache.solr.client.solrj.SolrServerException

2012-01-17 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1251: Fix Version/s: 1.5 > Deletion of duplicates fails with > org.apache.solr.client.solrj.SolrSe

[jira] [Commented] (NUTCH-1251) Deletion of duplicates fails with org.apache.solr.client.solrj.SolrServerException

2012-01-17 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188095#comment-13188095 ] Markus Jelsma commented on NUTCH-1251: Can you provide a patch for trunk?

[jira] [Updated] (NUTCH-1251) Deletion of duplicates fails with org.apache.solr.client.solrj.SolrServerException

2012-01-17 Thread Arkadi Kosmynin (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arkadi Kosmynin updated NUTCH-1251: Description: Deletion of duplicates fails. This happens because the "get all" query used to

[jira] [Created] (NUTCH-1251) Deletion of duplicates fails with org.apache.solr.client.solrj.SolrServerException

2012-01-17 Thread Arkadi Kosmynin (Created) (JIRA)
Deletion of duplicates fails with org.apache.solr.client.solrj.SolrServerException -- Key: NUTCH-1251 URL: https://issues.apache.org/jira/browse/NUTCH-1251 Project: Nutch

Re: I want to volunteer some time

2012-01-17 Thread Eddie Drapkin
Alrighty! I checked out the JIRA and sort of attacked an issue I think I can contribute to... I'll look and try to find more as well. I can certainly write documentation if that's a need (when isn't it?), just someone point me at the areas that need better documentation and I'll do what I ca

[jira] [Commented] (NUTCH-1201) Allow for different FetcherThread impls

2012-01-17 Thread Edward Drapkin (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187950#comment-13187950 ] Edward Drapkin commented on NUTCH-1201: You bring up a good point, and I was making

[jira] [Commented] (NUTCH-1201) Allow for different FetcherThread impls

2012-01-17 Thread Andrzej Bialecki (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187927#comment-13187927 ] Andrzej Bialecki commented on NUTCH-1201: I agree that there are situations whe

Re: I want to volunteer some time

2012-01-17 Thread Julien Nioche
Hi Eddie, Great to hear that! Just to add to what Markus said there are also quite a few tasks to do on the NutchGora branch if that's something you'd be interested in. Or outside the tasks on JIRA, there is always a fair bit to do on the Wiki e.g. how to run in distributed mode etc... Just out o

Re: I want to volunteer some time

2012-01-17 Thread Markus Jelsma
Hi! Excellent! You may want to check the list of issues for 1.5. There are several issues being worked on from time to time, a number of open issues, and even a few hairy problems. Contributions as patches or comments on any issue are always appreciated. You can also create issues to solve problem

[jira] [Commented] (NUTCH-1201) Allow for different FetcherThread impls

2012-01-17 Thread Edward Drapkin (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187904#comment-13187904 ] Edward Drapkin commented on NUTCH-1201: I was thinking more of an approach of break

I want to volunteer some time

2012-01-17 Thread Eddie Drapkin
Hello all, I've got a bunch of spare time coming up in the next several weeks/months and would like to volunteer to help the project out. I'm already extremely familiar with the internals of Nutch, as I've been hacking at it for our internal use here (at Wolfram Research) for the last ~1.5 y

[jira] [Commented] (NUTCH-1201) Allow for different FetcherThread impls

2012-01-17 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187884#comment-13187884 ] Markus Jelsma commented on NUTCH-1201: Hi Edward, I've already modified Fetcher to

[jira] [Commented] (NUTCH-1201) Allow for different FetcherThread impls

2012-01-17 Thread Edward Drapkin (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187874#comment-13187874 ] Edward Drapkin commented on NUTCH-1201: Does this still need to be done? It seems

[jira] [Updated] (NUTCH-1242) Allow disabling of URL Filters in ParseSegment

2012-01-17 Thread Edward Drapkin (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Drapkin updated NUTCH-1242: Attachment: trunk.patch Okay, here's a patch against trunk that does: 1) modifies ParseOutputF

[jira] [Commented] (NUTCH-1242) Allow disabling of URL Filters in ParseSegment

2012-01-17 Thread Edward Drapkin (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187856#comment-13187856 ] Edward Drapkin commented on NUTCH-1242: Yeah, I realized I should have added the sa

[jira] [Commented] (NUTCH-1242) Allow disabling of URL Filters in ParseSegment

2012-01-17 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187846#comment-13187846 ] Markus Jelsma commented on NUTCH-1242: Thanks Edward. Is it possible for you to prov

[jira] [Assigned] (NUTCH-1242) Allow disabling of URL Filters in ParseSegment

2012-01-17 Thread Markus Jelsma (Assigned) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-1242: Assignee: Markus Jelsma > Allow disabling of URL Filters in ParseSegment >

[jira] [Updated] (NUTCH-1242) Allow disabling of URL Filters in ParseSegment

2012-01-17 Thread Edward Drapkin (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Drapkin updated NUTCH-1242: Attachment: ParseSegment.patch Updated patch to add a message to the usage description.

[jira] [Updated] (NUTCH-1242) Allow disabling of URL Filters in ParseSegment

2012-01-17 Thread Edward Drapkin (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Drapkin updated NUTCH-1242: Attachment: (was: ParseSegment.patch) > Allow disabling of URL Filters in ParseSegment

[jira] [Updated] (NUTCH-1242) Allow disabling of URL Filters in ParseSegment

2012-01-17 Thread Edward Drapkin (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Drapkin updated NUTCH-1242: Attachment: ParseSegment.patch Added a patch to allow a -nofilter parameter passed to ParseSegm

[jira] [Created] (NUTCH-1250) parse-html does not parse links with empty anchor

2012-01-17 Thread Andreas Janning (Created) (JIRA)
parse-html does not parse links with empty anchor - Key: NUTCH-1250 URL: https://issues.apache.org/jira/browse/NUTCH-1250 Project: Nutch Issue Type: Bug Components: parser Affects

[jira] [Commented] (NUTCH-1247) CrawlDatum.retries should be int

2012-01-17 Thread Sebastian Nagel (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187761#comment-13187761 ] Sebastian Nagel commented on NUTCH-1247: A FETCH_RETRY is already set to DB_GONE i

[jira] [Commented] (NUTCH-1247) CrawlDatum.retries should be int

2012-01-17 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187730#comment-13187730 ] Markus Jelsma commented on NUTCH-1247: Sebastian, most of these records throw an Unk

[jira] [Updated] (NUTCH-1247) CrawlDatum.retries should be int

2012-01-17 Thread Sebastian Nagel (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1247: --- Attachment: NUTCH-1247.patch_B NUTCH-1247.patch_A > CrawlDatum.retries sh

[jira] [Commented] (NUTCH-1247) CrawlDatum.retries should be int

2012-01-17 Thread Sebastian Nagel (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187716#comment-13187716 ] Sebastian Nagel commented on NUTCH-1247: Interestingly, I also found a couple of U