Re: Hosts File & Nutch 1.0+

Mark Achee Tue, 19 Apr 2011 17:02:30 -0700

With nslookup already showing the correct IP address, it doesn't seem like a
hostname/DNS issue.  But I assume this is what the developer is talking
about:


At the end of your /etc/hosts file add

127.0.0.1  www.example.org

but replace www.example.org with your domain.  If you know the server's
other IP address(es), you could also try those instead of 127.0.0.1.
If that doesn't fix it, it's probably not really a hostname/DNS issue.
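
To confirm the entry took effect, here is a rough Python sketch (not part of
Nutch or dotCMS; the function names are made up for illustration). It reads
the hosts file and also asks the system resolver, which normally consults
/etc/hosts before DNS. nslookup queries DNS directly, so it can show a
different answer than the one the crawler actually gets.

```python
import socket


def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its IP address.

    The first entry for a name is kept, which is how the resolver
    typically treats duplicate names.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name.lower(), ip)
    return mapping


def resolve(hostname):
    """Return the sorted, de-duplicated addresses the system resolver gives."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def report(hostname, hosts_path="/etc/hosts"):
    """Print what the hosts file says and what the resolver returns."""
    with open(hosts_path) as f:
        entry = parse_hosts(f.read()).get(hostname.lower())
    print("hosts file entry:", entry)
    print("resolver returns:", resolve(hostname))
```

Run report("www.example.org") on the server itself, with your own domain in
place of the placeholder. If the hosts file entry shows 127.0.0.1 and the
crawl still stops at depth 1, the problem is probably somewhere else.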



-Mark


On Tue, Apr 19, 2011 at 6:47 PM, Alex <[email protected]> wrote:

> I edited that so that it does not disclose the location of my
> rootUrlDir.  The path is accurate.
>
> I am going to find out what command is given to nutch but basically
> the application developer has confirmed that the issue is the hosts
> file or something on the server that can not search itself.
>
> Alex
> On Apr 19, 2011, at 5:22 PM, Mark Achee wrote:
>
> > From your logs:
> >
> > INFO sitesearch.CrawlerUtil: rootUrlDir = /path/to/directory/
> >
> >
> > Looks like you didn't set the seed URLs directory.  If that's not
> > enough info for you to fix it, send the full command you're running.
> >
> > -Mark
> >
> >
> >
> > On Thu, Apr 14, 2011 at 10:57 PM, Alex <[email protected]>
> > wrote:
> >
> >> Hi,
> >>
> >> I am new to Nutch.  I have an application that uses Nutch to search.
> >> I have configured the application so that Nutch can run.  However,
> >> after a lot of troubleshooting I have been pointed to the fact that
> >> there is something wrong with my hosts file.  My hostname is
> >> different than my domain name and that "seems" to make Nutch stop
> >> in depth 1.
> >> Does anyone have any idea of what is the correct configuration of the
> >> hosts file so that nutch runs properly?
> >>
> >> My domain name resolves fine.  Please help me!
> >>
> >> Here are the logs of the indexing:
> >>
> >> Stopping at depth=1 - no more URLs to fetch.
> >>
> >> INFO sitesearch.CrawlerUtil: indexHost : Starting an Site Search index on host www.mydomain.com
> >> INFO sitesearch.CrawlerUtil: site search crawl started in: /opt/dotcms/dotCMS/assets/search_index/www.mydomain.com/1-XXX_temp/crawl-index
> >> INFO sitesearch.CrawlerUtil: rootUrlDir = /path/to/directory/search_index/www.mydomain.com/url_folder
> >> INFO sitesearch.CrawlerUtil: threads = 10
> >> INFO sitesearch.CrawlerUtil: depth = 20
> >> INFO sitesearch.CrawlerUtil: indexer=lucene
> >>
> >> INFO sitesearch.CrawlerUtil: Stopping at depth=1 - no more URLs to fetch.
> >> INFO sitesearch.CrawlerUtil: site search crawl finished: /directorypath/search_index/www.mydomain.com/1xxx/crawl-index
> >> INFO sitesearch.CrawlerUtil: indexHost : Finished Site Search index on host www.mydomain.com
> >>
>
>
