Re: Following <form> tags

2006-05-18 Thread Andrzej Bialecki
Chris Schneider wrote: Gang, I had a webmaster complain that our crawler was following his links. Although he admits that his use of the GET method is a bit unorthodox, he feels strongly that form submissions with input fields shouldn't be followed by crawlers. Would it make sense to modify

Fetcher.java reporting incorrect kb/s?

2006-05-18 Thread Greg Kim
Hi, I was just looking at the Fetcher.java code on trunk (r 407599), snippet below. The total # of bytes is getting multiplied by 8 and the division by 8.0 is missing; private void reportStatus() throws IOException { String status; synchronized (this) { long elapsed = (System.current

Nutch 'Help Wanted' page on wiki

2006-05-18 Thread Gordon Mohr
To complement the existing 'Support' (experts available) page and at Doug's suggestion, I've added a 'Help Wanted' page to the Nutch wiki: http://wiki.apache.org/nutch/Help_Wanted There's also a first listing to get things started. :) - Gordon @ IA

[jira] Created: (NUTCH-270) Apply just the applicable portions of the patch to protocol.httpclient.Http.java

2006-05-18 Thread Jeremy Calvert (JIRA)
Apply just the applicable portions of the patch to protocol.httpclient.Http.java Key: NUTCH-270 URL: http://issues.apache.org/jira/browse/NUTCH-270 Project: Nutch Type: Sub-task Compo

Re: Fetcher.java reporting incorrect kb/s?

2006-05-18 Thread Ken Krugler
kb/s is kilobits/second, not kilobytes/second. See . I agree that using the more explicit kbits/s would be better. Related micro-nit (at least according to http://en.wikipedia.org/wiki/Kilobit_per_second): it should be /1000, not /1024. -- Ken I was just
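
To make the unit question in this thread concrete, here is a minimal sketch of the two rates being confused. It is not the Fetcher.java code; the class and method names (ThroughputSketch, kilobitsPerSecond, kilobytesPerSecond) are hypothetical, and it assumes the decimal convention (/1000) mentioned above.

```java
// Hypothetical illustration only; not taken from Nutch's Fetcher.java.
public class ThroughputSketch {

    // Kilobits per second: bytes are multiplied by 8 to get bits,
    // then divided by 1000 (decimal kilo, an assumption per the thread).
    static double kilobitsPerSecond(long bytes, long elapsedMillis) {
        requirePositive(elapsedMillis);
        double seconds = elapsedMillis / 1000.0;
        return (bytes * 8.0 / 1000.0) / seconds;
    }

    // Kilobytes per second: no multiplication by 8.
    static double kilobytesPerSecond(long bytes, long elapsedMillis) {
        requirePositive(elapsedMillis);
        double seconds = elapsedMillis / 1000.0;
        return (bytes / 1000.0) / seconds;
    }

    // Guard against division by zero when no time has elapsed yet.
    private static void requirePositive(long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
    }

    public static void main(String[] args) {
        long bytes = 125_000L;      // sample byte count
        long elapsedMillis = 1_000L; // one second
        System.out.println(kilobitsPerSecond(bytes, elapsedMillis) + " kbit/s");
        System.out.println(kilobytesPerSecond(bytes, elapsedMillis) + " kB/s");
    }
}
```

The two figures differ by a factor of 8, which is why a label such as kbits/s is less ambiguous than kb/s.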

Re: Fetcher.java reporting incorrect kb/s?

2006-05-18 Thread Andrzej Bialecki
Greg Kim wrote: Hi, I was just looking at the Fetcher.java code on trunk (r 407599), snippet below. The total # of bytes is getting multiplied by 8 and the division by 8.0 is missing; private void reportStatus() throws IOException { String status; synchronized (this) { long elapse

[jira] Created: (NUTCH-271) Meta-data per URL/site/section

2006-05-18 Thread Stefan Neufeind (JIRA)
Meta-data per URL/site/section -- Key: NUTCH-271 URL: http://issues.apache.org/jira/browse/NUTCH-271 Project: Nutch Type: New Feature Versions: 0.7.2 Reporter: Stefan Neufeind We have the need to index sites and attach addit

[jira] Commented: (NUTCH-271) Meta-data per URL/site/section

2006-05-18 Thread Gal Nitzan (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-271?page=comments#action_12412435 ] Gal Nitzan commented on NUTCH-271: -- This functionality is already available in Nutch-0.8 > Meta-data per URL/site/section > -- > > Key: NU

[jira] Commented: (NUTCH-271) Meta-data per URL/site/section

2006-05-18 Thread Gal Nitzan (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-271?page=comments#action_12412436 ] Gal Nitzan commented on NUTCH-271: -- Sorry for the short comment. Actually the meta tags functionality is already available in the 0.8 version along with a CrawlDatum object.