[
https://issues.apache.org/jira/browse/NUTCH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539162
]
Dawid Weiss commented on NUTCH-567:
---
I agree. What we used to do in Carrot2 was to include the patch (against the
o
[
https://issues.apache.org/jira/browse/NUTCH-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539146
]
Doug Cook commented on NUTCH-566:
-
Hi Doğacan.
Thanks for following up. The issue has gotten a little more complicat
Thanks for the information. I'll have to run a fresh fetch to get
some correct stats so I'll submit it in a day or two.
On 10/31/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On 10/31/07, Ned Rockson <[EMAIL PROTECTED]> wrote:
> > I submitted a JIRA ticket regarding URL ordering in Gener
Hi,
On 10/31/07, Ned Rockson <[EMAIL PROTECTED]> wrote:
> I submitted a JIRA ticket regarding URL ordering in Generator.java as
> well as a patch (NUTCH-570) and I'm wondering what else I need to do to
> get this committed. Obviously it's low priority so I may be getting too
> antsy.
>
Since NUT
[
https://issues.apache.org/jira/browse/NUTCH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539135
]
Andrzej Bialecki commented on NUTCH-567:
-
I'm slightly worried about losing track of what has been patched in
I submitted a JIRA ticket regarding URL ordering in Generator.java as
well as a patch (NUTCH-570) and I'm wondering what else I need to do to
get this committed. Obviously it's low priority so I may be getting too
antsy.
[
https://issues.apache.org/jira/browse/NUTCH-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539133
]
Doğacan Güney commented on NUTCH-548:
-
I think this is ready for commit, but I would like to get an approval from
[
https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney reassigned NUTCH-559:
---
Assignee: Doğacan Güney
> NTLM, Basic and Digest Authentication schemes for web/proxy server
>
[
https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539131
]
Doğacan Güney commented on NUTCH-559:
-
Hi Susam,
Your last patch looks great!
I have one minor nit: I think it w
[
https://issues.apache.org/jira/browse/NUTCH-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539127
]
Doğacan Güney commented on NUTCH-566:
-
I am going to commit this one, but I am not sure what needs to be updated
[
https://issues.apache.org/jira/browse/NUTCH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539123
]
Doğacan Güney commented on NUTCH-567:
-
Hi Dawid,
If tagsoup is not going to release a new version soon, then IMHO
Hi,
On 31/10/2007, zhao xiuwen <[EMAIL PROTECTED]> wrote:
>
> Should I implement HtmlParseFilter?
Yes
> If so, how do I invoke my method in
> filter() of HtmlParseFilter?
Load your plugin in the Nutch config and filter() will be called for every
HTML file that you crawl.
Best,
Adam
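
For reference, a minimal sketch of the URL-set check described in the question, written as plain Java so that it runs without Nutch on the classpath. The class name TargetUrlMatcher and its method are hypothetical and not part of Nutch; the sketch assumes an exact string match against the URL set. In a real plugin this check would sit inside filter() of an HtmlParseFilter implementation, whose exact signature depends on the Nutch version in use.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper (not a Nutch class): decides whether a page URL
 * belongs to a predetermined set of URLs that should be processed.
 */
class TargetUrlMatcher {
    private final Set<String> targetUrls;

    TargetUrlMatcher(Set<String> targetUrls) {
        // Defensive copy so later changes to the caller's set have no effect.
        this.targetUrls = new HashSet<>(targetUrls);
    }

    /** Returns true only if pageUrl is non-null and exactly matches an entry. */
    boolean matches(String pageUrl) {
        return pageUrl != null && targetUrls.contains(pageUrl);
    }
}
```

Inside the filter, one would call matches() with the URL of the page being parsed and run the site-specific extraction only when it returns true, returning the parse unchanged otherwise.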
[
https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-552:
---
Attachment: NUTCH-552-3.patch
New patch. Fixes problems with path handling changes in hadoop affectin
Should I implement HtmlParseFilter? If so, how do I invoke my method in
filter() of HtmlParseFilter?
Thanks.
2007/10/31, zhao xiuwen <[EMAIL PROTECTED]>:
>
> Hi,
> I have read http://wiki.apache.org/nutch/WritingPluginExample, but
> I don't understand it clearly.
> I need extract spec
Hi,
I have read http://wiki.apache.org/nutch/WritingPluginExample, but I
don't understand it clearly.
I need to extract specific information from specific web sites in Nutch.
First, I determine a set of URLs.
Second, I check whether the current page's URL is contained in that set.
Lastl
[
https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539028
]
Andrzej Bialecki commented on NUTCH-552:
-
We definitely need to do this, things would crash & burn otherwise.
[
https://issues.apache.org/jira/browse/NUTCH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539025
]
Dawid Weiss commented on NUTCH-567:
---
Hi Doğacan. I have sent an e-mail to Tagsoup's mailing list, but it seems like