Crawling JSPs

2007-01-25 Thread Deepa Devanathan
Hi guys, just had a quick question - can Nutch 0.7.1 also crawl text content in JSPs? If so, is it built in, or is there a plugin available for this? If my site content is entirely in JSPs, does that mean it can't be crawled and hence searched? Please let me know your thoughts on this .. based on w

nutch crawl on a site that needs authentication

2006-07-30 Thread Deepa Devanathan
Hi guys, I have a site I need to crawl, but the very first page asks for a username and password. Is there a way I can supply these through some config file in Nutch so I can crawl the underlying content? If anyone has ideas, please let me know .. Alex and Sudhi - thanks for your responses to my prev

Nutch with nsf files

2006-07-26 Thread Deepa Devanathan
Hi guys, can Nutch parse through Lotus Notes databases (.nsf files) yet? My site uses NSFs extensively and I need to crawl the content, which includes HTML, JSPs, PDFs, etc. Will the normal crawl work? If anybody has any ideas, please let me know. Any help is greatly appreciated! Thanks, Dee

Nutch with Domino web server

2006-07-21 Thread Deepa Devanathan
Hi guys, I tried crawling my site, which works with a Domino web server talking to a Tomcat, using the crawl command (with all the config for URLs, file types, etc.), but the crawl log doesn't show any URLs being fetched. Is there something different I need to do to run a crawl for a site run

Running nutch on a non-port 80 site

2006-04-28 Thread Deepa Devanathan
Hi, I have a setup where a non-Apache server serves content on a port other than 80, along with a Tomcat for JSP content. I have installed Nutch and run the crawl program. The indexes are not getting created properly - I was unable to see the URLs of the pages being indexed in the log

filtering search results based on language

2005-10-21 Thread Deepa Devanathan
Hi, I am a newbie to Nutch and need some help with my search results. I have a common index for some English as well as French HTML pages. I read on the mail archives that 1. by activating the language identifier plugin in nutch-default.xml and 2. adding the advisory attribute lang:fr to the que