My understanding is that Nutch is designed to use Hadoop to run in a distributed
fashion across many machines. To scale across those machines, Nutch needs to
accept its input through shared storage that every node in the cluster can read
(in practice this usually means files on the Hadoop file system, HDFS).
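For context, such a seed list is just a plain text file of URLs, one per line, placed in the directory you pass to the crawl command. A minimal sketch of producing one (the file and directory names here are illustrative, not anything Nutch mandates beyond "a folder of URL files"):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class WriteSeedList {
    public static void main(String[] args) throws IOException {
        // One URL per line -- the plain-text format the crawl's url directory holds.
        List<String> seeds = List.of(
                "http://lucene.apache.org/nutch/",
                "http://hadoop.apache.org/");
        Path urlDir = Paths.get("urls");   // the directory later passed to bin/nutch crawl
        Files.createDirectories(urlDir);
        Files.write(urlDir.resolve("seed.txt"), seeds);
        // On a real cluster you would then push it where every node can read it, e.g.:
        //   bin/hadoop fs -put urls urls
        System.out.println("wrote " + urlDir.resolve("seed.txt"));
    }
}
```

On a single machine the local `urls` folder is enough; the `hadoop fs -put` step only matters once the crawl runs distributed.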
Well, if you want to add URLs through the Nutch API, then you should trace the
program until you find the point where the directory containing the list of
URLs is used to load them.
On Mon, Jun 23, 2008 at 5:27 AM, yogesh somvanshi <[EMAIL PROTECTED]>
wrote:
Hello all,
I'm working on Nutch.
When you use the standard crawl command, like bin/nutch crawl urls -dir crawl
-depth 3 -topN 50, crawling works well, but I want to remove the need for that
urls folder. I want to replace the urls folder with some Array or Map, but when
I try to make some changes to the
code then
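One way to get the effect asked for above without editing Nutch internals is to keep the URLs in an in-memory array and materialize them into a temporary urls directory just before starting the crawl. A sketch under that assumption (the helper name `materialize` and the temp-directory layout are my own, not part of the Nutch API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class SeedsFromArray {
    // Write an in-memory URL array into the layout that
    // "bin/nutch crawl <dir> ..." expects: a folder containing
    // text files with one URL per line. Returns the folder,
    // which can then be passed on the command line.
    static Path materialize(String[] urls) throws IOException {
        Path dir = Files.createTempDirectory("nutch-urls");
        Files.write(dir.resolve("seeds.txt"), Arrays.asList(urls));
        return dir;
    }

    public static void main(String[] args) throws IOException {
        String[] urls = { "http://example.com/", "http://example.org/" };
        Path dir = materialize(urls);
        // From here, hand the directory to the normal crawl entry point, e.g.
        // by building the command "bin/nutch crawl " + dir + " -dir crawl -depth 3".
        System.out.println("seed dir: " + dir);
    }
}
```

This keeps the crawl command unchanged; the array or Map lives in your own code, and Nutch still sees the file-based input that every node in a cluster can read.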