[ https://issues.apache.org/jira/browse/NUTCH-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Otis Gospodnetic resolved NUTCH-659.
------------------------------------

    Resolution: Invalid

Please ask questions on the mailing list.

> Help! No urls fetched for internal repository website
> -----------------------------------------------------
>
>                 Key: NUTCH-659
>                 URL: https://issues.apache.org/jira/browse/NUTCH-659
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.9.0
>         Environment: nutch 0.9, TOMCAT6.0.18, JAVA 1.6.0_10, CentOS 5.2
>            Reporter: Bryan
>            Priority: Critical
>
> I am new to Nutch, and have set up Nutch to search my internal company
> websites. The version is nutch-2008-11-02_04-01-26.tar.
>
> My internal company websites include several HTTP websites.
> Another is an SVN repository website served over HTTPS in an XML structure,
> using <dir> and <file> tags.
>
> The search in the HTTP websites is good.
> HTTPS itself is OK: some links in those HTTP websites point to
> Word files under the SVN website, and they can be indexed.
>
> But Nutch does not search my SVN website. If I search only the SVN
> website, the result is always: 0 urls fetched.
>
> My nutch-site.xml is as follows:
> <property>
> <name>plugin.includes</name>
>
> <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|msexcel|msword|mspowerpoint|pdf|zip|swf|rss)|index-(basic|anchor)|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>
> # skip file:, ftp:, & mailto: urls
> -^(ftp|mailto):
>
> # accept hosts in MY.DOMAIN.NAME
> +^http://([a-z0-9]*\.)*smartlabs.com.au/
>
> Any help would be much appreciated. Thanks in advance.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
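
The two URL filter rules quoted in the report can be checked in isolation. The sketch below is only an illustration in Python, not Nutch code: it assumes the usual regex URL filter behaviour (the first matching rule decides, and a URL that matches no rule is ignored) and assumes the two quoted rules are the whole filter file, which the report does not confirm. The host names `www.smartlabs.com.au` and `svn.smartlabs.com.au`, the helper `accepts`, and the widened rule are hypothetical and exist only for this example.

```python
import re

# Rules copied from the filter excerpt in the report: (sign, pattern).
RULES = [
    ("-", r"^(ftp|mailto):"),
    ("+", r"^http://([a-z0-9]*\.)*smartlabs.com.au/"),
]


def accepts(url, rules=RULES):
    """Return True if the URL passes the filter.

    Assumed semantics: rules are tried in order, the first pattern
    found in the URL decides, and a URL matching no rule is rejected.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False


# Hypothetical URLs, used only for illustration.
print(accepts("http://www.smartlabs.com.au/index.html"))
print(accepts("https://svn.smartlabs.com.au/repo/"))

# A possible widened accept rule (an assumption, not taken from the report)
# that allows both http and https and escapes the dots in the domain.
WIDENED = [
    RULES[0],
    ("+", r"^https?://([a-z0-9]*\.)*smartlabs\.com\.au/"),
]
print(accepts("https://svn.smartlabs.com.au/repo/", WIDENED))
```

Under these assumptions the accept rule anchors on `^http://`, so an `https://` URL matches neither rule and is dropped before fetching. Whether this explains the "0 urls fetched" result depends on the rest of the filter file, which is not shown in the report.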