[ https://issues.apache.org/jira/browse/NUTCH-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved NUTCH-659.
------------------------------------

    Resolution: Invalid

Please ask questions on the mailing list.

> Help! No urls fetched for internal repository website
> -----------------------------------------------------
>
>                 Key: NUTCH-659
>                 URL: https://issues.apache.org/jira/browse/NUTCH-659
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.9.0
>         Environment: nutch 0.9, TOMCAT6.0.18, JAVA 1.6.0_10, CentOS 5.2
>            Reporter: Bryan
>            Priority: Critical
>
> I am new to Nutch and have set it up to search our internal company
> websites. The version is nutch-2008-11-02_04-01-26.tar.
>
> Our internal sites include several HTTP websites. Another is an SVN
> repository served over HTTPS, whose listings are XML using <dir> and
> <file> tags.
>
> Search on the HTTP websites works well, and HTTPS works too: some pages on
> those HTTP websites link to Word files on the SVN website, and those files
> are indexed.
>
> However, Nutch does not fetch pages from the SVN website itself. If I crawl
> only the SVN website, the result is always "0 urls fetched".
>  
> My nutch-site.xml is as follows:
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|msexcel|msword|mspowerpoint|pdf|zip|swf|rss)|index-(basic|anchor)|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> </property>
>
> My URL filter rules are:
>
> # skip file:, ftp:, & mailto: urls
> -^(ftp|mailto):
>  
> # accept hosts in MY.DOMAIN.NAME
> +^http://([a-z0-9]*\.)*smartlabs.com.au/
>  
> Any help would be much appreciated. Thanks in advance.
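The URL filter rules quoted above can be checked by hand with a small emulation. This is a minimal sketch, not Nutch code: it assumes the semantics of Nutch's regex URL filter, where rules are tried top to bottom, the first pattern found in the URL decides ('+' accepts, '-' rejects), and a URL matching no rule is dropped. The function name filter_url and the hostnames wiki.smartlabs.com.au and svn.smartlabs.com.au are hypothetical, chosen only for illustration.

```python
import re

# Rules copied from the report, in file order: a sign ('+' accept,
# '-' reject) followed by a regular expression.
RULES = [
    "-^(ftp|mailto):",
    r"+^http://([a-z0-9]*\.)*smartlabs.com.au/",
]


def filter_url(url, rules=RULES):
    """Return url if accepted, None if rejected.

    Assumed semantics (modelled on Nutch's regex URL filter, not taken
    from its source): the first rule whose pattern is found in the URL
    decides, and a URL matching no rule is rejected.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return url if sign == "+" else None
    return None  # no rule matched, so the URL is dropped


if __name__ == "__main__":
    # Hypothetical URLs for illustration only.
    for u in [
        "http://wiki.smartlabs.com.au/index.html",
        "https://svn.smartlabs.com.au/repo/",
        "mailto:someone@smartlabs.com.au",
    ]:
        print(u, "->", "accepted" if filter_url(u) else "rejected")
```

Under these assumptions the HTTPS URL is rejected, because the accept rule requires the literal prefix `http://` and no other rule matches. That would be consistent with "0 urls fetched" when the SVN website is the only seed, though this is an inference from the pasted rules rather than a confirmed diagnosis.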

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
