Hi Eddie,
I've added you to the AdminGroup for our wiki, so you will be able to edit
whichever areas you are interested in, or which you think can/should be
improved.
Your introduction sounds really interesting and, as Markus & Julien have said,
there are a lot of issues which merit some input, it's great
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change
notification.
The "AdminGroup" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/AdminGroup?action=diff&rev1=6&rev2=7
* JulienNioche
* MarkusJelsma
* ElisabethAdler
+
[
https://issues.apache.org/jira/browse/NUTCH-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188115#comment-13188115
]
Arkadi Kosmynin commented on NUTCH-1251:
It is a one-line change. File
org.apache.n
[
https://issues.apache.org/jira/browse/NUTCH-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1251:
-
Fix Version/s: 1.5
> Deletion of duplicates fails with
> org.apache.solr.client.solrj.SolrSe
[
https://issues.apache.org/jira/browse/NUTCH-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188095#comment-13188095
]
Markus Jelsma commented on NUTCH-1251:
--
Can you provide a patch for trunk?
[
https://issues.apache.org/jira/browse/NUTCH-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arkadi Kosmynin updated NUTCH-1251:
---
Description:
Deletion of duplicates fails. This happens because the "get all" query used to
Deletion of duplicates fails with
org.apache.solr.client.solrj.SolrServerException
--
Key: NUTCH-1251
URL: https://issues.apache.org/jira/browse/NUTCH-1251
Project: Nutch
Alrighty!
I checked out the JIRA and sort of attacked an issue I think I can
contribute to... I'll look and try to find more as well.
I can certainly write documentation if that's a need (when isn't it?),
just someone point me at the areas that need better documentation and
I'll do what I ca
[
https://issues.apache.org/jira/browse/NUTCH-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187950#comment-13187950
]
Edward Drapkin commented on NUTCH-1201:
---
You bring up a good point, and I was making
[
https://issues.apache.org/jira/browse/NUTCH-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187927#comment-13187927
]
Andrzej Bialecki commented on NUTCH-1201:
--
I agree that there are situations whe
Hi Eddie,
Great to hear that! Just to add to what Markus said there are also quite a
few tasks to do on the NutchGora branch if that's something you'd be
interested in. Or outside the tasks on JIRA, there is always a fair bit to
do on the Wiki e.g. how to run in distributed mode etc...
Just out o
Hi!
Excellent! You may want to check the list of issues for 1.5. There are several
issues being worked on from time to time and a number of open issues and even
a few hairy problems. Contributions as patches or comments on any issue are always
appreciated. You can also create issues to solve problem
[
https://issues.apache.org/jira/browse/NUTCH-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187904#comment-13187904
]
Edward Drapkin commented on NUTCH-1201:
---
I was thinking more of an approach of break
Hello all,
I've got a bunch of spare time coming up in the next several
weeks/months and would like to volunteer to help the project out. I'm
already extremely familiar with the internals of Nutch, as I've been
hacking at it for our internal use here (at Wolfram Research) for the
last ~1.5 y
[
https://issues.apache.org/jira/browse/NUTCH-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187884#comment-13187884
]
Markus Jelsma commented on NUTCH-1201:
--
Hi Edward,
I've already modified Fetcher to
[
https://issues.apache.org/jira/browse/NUTCH-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187874#comment-13187874
]
Edward Drapkin commented on NUTCH-1201:
---
Does this still need to be done? It seems
[
https://issues.apache.org/jira/browse/NUTCH-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Edward Drapkin updated NUTCH-1242:
--
Attachment: trunk.patch
Okay, here's a patch against trunk that does:
1) modifies ParseOutputF
[
https://issues.apache.org/jira/browse/NUTCH-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187856#comment-13187856
]
Edward Drapkin commented on NUTCH-1242:
---
Yeah, I realized I should have added the sa
[
https://issues.apache.org/jira/browse/NUTCH-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187846#comment-13187846
]
Markus Jelsma commented on NUTCH-1242:
--
Thanks Edward. Is it possible for you to prov
[
https://issues.apache.org/jira/browse/NUTCH-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reassigned NUTCH-1242:
Assignee: Markus Jelsma
> Allow disabling of URL Filters in ParseSegment
>
[
https://issues.apache.org/jira/browse/NUTCH-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Edward Drapkin updated NUTCH-1242:
--
Attachment: ParseSegment.patch
Updated patch to add a message to the usage description.
[
https://issues.apache.org/jira/browse/NUTCH-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Edward Drapkin updated NUTCH-1242:
--
Attachment: (was: ParseSegment.patch)
> Allow disabling of URL Filters in ParseSegment
[
https://issues.apache.org/jira/browse/NUTCH-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Edward Drapkin updated NUTCH-1242:
--
Attachment: ParseSegment.patch
Added a patch to allow a -nofilter parameter passed to ParseSegm
parse-html does not parse links with empty anchor
-
Key: NUTCH-1250
URL: https://issues.apache.org/jira/browse/NUTCH-1250
Project: Nutch
Issue Type: Bug
Components: parser
Affects
[
https://issues.apache.org/jira/browse/NUTCH-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187761#comment-13187761
]
Sebastian Nagel commented on NUTCH-1247:
A FETCH_RETRY is already set to DB_GONE i
[
https://issues.apache.org/jira/browse/NUTCH-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187730#comment-13187730
]
Markus Jelsma commented on NUTCH-1247:
--
Sebastian, most of these records throw an Unk
[
https://issues.apache.org/jira/browse/NUTCH-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1247:
---
Attachment: NUTCH-1247.patch_B
NUTCH-1247.patch_A
> CrawlDatum.retries sh
[
https://issues.apache.org/jira/browse/NUTCH-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187716#comment-13187716
]
Sebastian Nagel commented on NUTCH-1247:
Interestingly, I also found a couple of U