Re: Fetching inefficiency

2008-04-21 Thread ogjunk-nutch
Adding some comments to the email below, but here on nutch-dev. Basically, it is my feeling that whenever fetchlists (and their parts) are not "well balanced", this inefficiency will be seen. Concretely, whichever task is "stuck fetching from the slow server with a lot of its pages in the fetchlis

Re: Fetching inefficiency

2008-04-21 Thread Ken Krugler
Adding some comments to the email below, but here on nutch-dev. Basically, it is my feeling that whenever fetchlists (and their parts) are not "well balanced", this inefficiency will be seen. Concretely, whichever task is "stuck fetching from the slow server with a lot of its pages in the fetchli

Re: [jira] Commented: (NUTCH-628) Host database to keep track of host-level information

2008-04-21 Thread ogjunk-nutch
Thanks Andrzej. So the disconnect was only about measuring (download speed, in my mind) per-URL vs. per-host. In that case, I think we are talking about a small change (to Fetcher2) that might look like this:
+ // time the request
+ long fetchStart = System.currentTimeMillis(