[jira] Commented: (NUTCH-567) Proper (?) handling of URIs in TagSoup.

2007-10-31 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539162 ] Dawid Weiss commented on NUTCH-567: --- I agree. What we used to do in Carrot2 was to include the patch (against the o

[jira] Commented: (NUTCH-566) Sun's URL class has bug in creation of relative query URLs

2007-10-31 Thread Doug Cook (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539146 ] Doug Cook commented on NUTCH-566: - Hi Doğacan. Thanks for following up. The issue has gotten a little more complicat
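The issue title refers to how java.net.URL resolves a query-only relative reference. RFC 3986 (section 5.4.1) resolves "?y" against the base "http://a/b/c/d;p?q" to "http://a/b/c/d;p?y". The older RFC 2396 example table gives "http://a/b/c/?y" instead, which drops the last path segment. The following is a minimal sketch of one possible workaround. It assumes it is enough to special-case references that start with "?" and defer everything else to java.net.URL. RelativeQueryUrl and resolve() are hypothetical names, and this is not necessarily what the NUTCH-566 patch does:

```java
import java.net.MalformedURLException;
import java.net.URL;

/** Hypothetical helper (not a Nutch API): RFC 3986-style handling of "?query" references. */
public class RelativeQueryUrl {

    public static URL resolve(URL base, String target) throws MalformedURLException {
        if (!target.startsWith("?")) {
            // Any other kind of reference: defer to java.net.URL's normal resolution.
            return new URL(base, target);
        }
        // Query-only reference: keep scheme, authority and the FULL base path,
        // drop the base query/fragment, and append the new query.
        String s = base.toExternalForm();
        int cut = s.length();
        int q = s.indexOf('?');
        if (q >= 0) cut = q;
        int h = s.indexOf('#');
        if (h >= 0 && h < cut) cut = h;
        return new URL(s.substring(0, cut) + target);
    }

    public static void main(String[] args) throws MalformedURLException {
        URL base = new URL("http://a/b/c/d;p?q");
        System.out.println("java.net.URL: " + new URL(base, "?y"));
        System.out.println("workaround:   " + resolve(base, "?y")); // http://a/b/c/d;p?y
    }
}
```

Building the result from the base's external form leaves the path untouched. RFC 3986 section 5.2.2 prescribes exactly this when the reference has an empty path and a query.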

Re: Next move with JIRA ticket

2007-10-31 Thread Ned Rockson
Thanks for the information. I'll have to run a fresh fetch to get some correct stats so I'll submit it in a day or two. On 10/31/07, Doğacan Güney <[EMAIL PROTECTED]> wrote: > Hi, > > On 10/31/07, Ned Rockson <[EMAIL PROTECTED]> wrote: > > I submitted a JIRA ticket regarding URL ordering in Gener

Re: Next move with JIRA ticket

2007-10-31 Thread Doğacan Güney
Hi, On 10/31/07, Ned Rockson <[EMAIL PROTECTED]> wrote: > I submitted a JIRA ticket regarding URL ordering in Generator.java as > well as a patch (NUTCH-570) and I'm wondering what else I need to do to > get this committed. Obviously it's low priority so I may be getting too > antsy. > Since NUT

[jira] Commented: (NUTCH-567) Proper (?) handling of URIs in TagSoup.

2007-10-31 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539135 ] Andrzej Bialecki commented on NUTCH-567: - I'm slightly worried about losing track of what has been patched in

Next move with JIRA ticket

2007-10-31 Thread Ned Rockson
I submitted a JIRA ticket regarding URL ordering in Generator.java as well as a patch (NUTCH-570) and I'm wondering what else I need to do to get this committed. Obviously it's low priority so I may be getting too antsy.

[jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat

2007-10-31 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539133 ] Doğacan Güney commented on NUTCH-548: - I think this is ready for commit, but I would like to get an approval from

[jira] Assigned: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server

2007-10-31 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney reassigned NUTCH-559: --- Assignee: Doğacan Güney > NTLM, Basic and Digest Authentication schemes for web/proxy server >

[jira] Commented: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server

2007-10-31 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539131 ] Doğacan Güney commented on NUTCH-559: - Hi Susam, Your last patch looks great! I have one minor nit: I think it w

[jira] Commented: (NUTCH-566) Sun's URL class has bug in creation of relative query URLs

2007-10-31 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539127 ] Doğacan Güney commented on NUTCH-566: - I am going to commit this one, but I am not sure what needs to be updated

[jira] Commented: (NUTCH-567) Proper (?) handling of URIs in TagSoup.

2007-10-31 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539123 ] Doğacan Güney commented on NUTCH-567: - Hi Dawid, If tagsoup is not going to release a new version soon, then IMHO

Re: How to extract specified information from html?

2007-10-31 Thread Adam Lofts
Hi, On 31/10/2007, zhao xiuwen <[EMAIL PROTECTED]> wrote: > > Should I implement HtmlParseFilter? Yes. > If so, how do I invoke my method in > filter() of HtmlParseFilter? Load your plugin in the Nutch config and filter() will be called for every HTML file that you crawl. Best, Adam
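Adam's answer describes a callback contract. Once the plugin is enabled in the Nutch configuration, the framework calls filter() on every parsed HTML page, and the plugin pulls what it needs out of the DOM. The sketch that follows mimics that pattern in plain Java so it runs without Nutch on the classpath. PageFilter, HeadingExtractor and parse() are hypothetical names. The real HtmlParseFilter.filter() takes Nutch types (including the fetched Content and a DOM DocumentFragment), and its exact signature differs between releases, so check it against your Nutch version:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class ParseFilterSketch {

    /** Hypothetical stand-in for a parse-filter extension point (NOT Nutch's interface). */
    interface PageFilter {
        void filter(String url, Document doc);
    }

    /** Example filter: collects the text of every <h1> element on each page it sees. */
    static class HeadingExtractor implements PageFilter {
        final List<String> headings = new ArrayList<>();

        @Override
        public void filter(String url, Document doc) {
            NodeList nodes = doc.getElementsByTagName("h1");
            for (int i = 0; i < nodes.getLength(); i++) {
                headings.add(url + " -> " + nodes.item(i).getTextContent().trim());
            }
        }
    }

    /** Parses well-formed XHTML only; real-world HTML needs a lenient parser (e.g. TagSoup or NekoHTML). */
    static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        HeadingExtractor extractor = new HeadingExtractor();
        List<PageFilter> filters = List.of(extractor);
        String[][] pages = {
            {"http://example.com/a", "<html><body><h1>Title A</h1></body></html>"},
            {"http://example.com/b", "<html><body><p>no heading</p></body></html>"},
        };
        // The "framework" side: every registered filter is called for every parsed page.
        for (String[] page : pages) {
            Document doc = parse(page[1]);
            for (PageFilter f : filters) {
                f.filter(page[0], doc);
            }
        }
        extractor.headings.forEach(System.out::println); // http://example.com/a -> Title A
    }
}
```

The point of the design is that the plugin never drives the crawl. It only reacts to each page the framework hands it.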

[jira] Updated: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

2007-10-31 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-552: --- Attachment: NUTCH-552-3.patch New patch. Fixes problems with path handling changes in hadoop affectin

Re: How to extract specified information from html?

2007-10-31 Thread zhao xiuwen
Should I implement HtmlParseFilter? If so, how do I invoke my method in filter() of HtmlParseFilter? Thanks. 2007/10/31, zhao xiuwen <[EMAIL PROTECTED]>: > > Hi, > I have seen http://wiki.apache.org/nutch/WritingPluginExample, but > I don't understand it clearly. > I need to extract spec

How to extract specified information from html?

2007-10-31 Thread zhao xiuwen
Hi, I have seen http://wiki.apache.org/nutch/WritingPluginExample, but I don't understand it clearly. I need to extract specified information from a specified web site in Nutch. Firstly, I determine a URL set. Secondly, I determine whether the current page URL is contained in the URL set. Lastl
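The second step (checking whether the current page URL belongs to a chosen URL set) can be sketched on its own. This sketch assumes that "URL set" means a list of URL prefixes for the target site, and that a light normalization is enough: lower-casing scheme and host and dropping the fragment. UrlSetMatcher and its methods are hypothetical, and Nutch's own URL normalizers are not used here:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: decides whether a page URL falls under one of a set of URL prefixes. */
public class UrlSetMatcher {
    private final List<String> prefixes = new ArrayList<>();

    public UrlSetMatcher(Collection<String> rawPrefixes) throws URISyntaxException {
        for (String p : rawPrefixes) {
            prefixes.add(normalize(p));
        }
    }

    /** Lower-cases scheme and host and drops the fragment, so "HTTP://" or "#top" variants still match. */
    static String normalize(String url) throws URISyntaxException {
        URI u = new URI(url);
        String scheme = u.getScheme() == null ? "" : u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost() == null ? "" : u.getHost().toLowerCase(Locale.ROOT);
        String port = u.getPort() == -1 ? "" : ":" + u.getPort();
        String path = (u.getRawPath() == null || u.getRawPath().isEmpty()) ? "/" : u.getRawPath();
        String query = u.getRawQuery() == null ? "" : "?" + u.getRawQuery();
        return scheme + "://" + host + port + path + query;
    }

    /** True if the page URL starts with any configured prefix; unparseable URLs never match. */
    public boolean matches(String pageUrl) {
        try {
            String n = normalize(pageUrl);
            for (String p : prefixes) {
                if (n.startsWith(p)) {
                    return true;
                }
            }
        } catch (URISyntaxException e) {
            // Treat malformed URLs as outside the set.
        }
        return false;
    }

    public static void main(String[] args) throws URISyntaxException {
        UrlSetMatcher m = new UrlSetMatcher(List.of("http://Example.com/news/"));
        System.out.println(m.matches("HTTP://EXAMPLE.COM/news/2007/a.html#top")); // true
        System.out.println(m.matches("http://example.com/newsletter"));           // false
    }
}
```

Ending each prefix with "/" keeps a prefix such as ".../news/" from accidentally matching ".../newsletter". Only pages that pass this check would then go on to the extraction step.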

[jira] Commented: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

2007-10-31 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539028 ] Andrzej Bialecki commented on NUTCH-552: - We definitely need to do this, things would crash & burn otherwise.

[jira] Commented: (NUTCH-567) Proper (?) handling of URIs in TagSoup.

2007-10-31 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539025 ] Dawid Weiss commented on NUTCH-567: --- Hi Doğacan. I have sent an e-mail to Tagsoup's mailing list, but it seems like