Thanks for your reply. I was kind of afraid someone was going to say that :-( I have invested so much time in developing plugins for Nutch that I am deathly afraid of moving on to something else.
To answer your questions:

1) What kind of documents/repositories are you trying to provide search for?

I am crawling several internal websites (most of which are web front ends for database info), as well as a local shared file system. The document types run the gamut: html, pdf, word, excel, powerpoint, txt, images, etc. (and any other crap the users throw on the file system).

2) Are security and user access/permissions important for you?

Somewhat, but not as much as you would think. I actually had more problems accessing sites that required SSL certificates. I fixed that by modifying protocol-httpclient to use a Java keystore and pass a client cert while fetching the page.

3) What is the typical size of the document universe you wish your software to handle (in number of documents + avg size and/or total GB)?

The documents are all under 200 MB or so. Most of them are html or pdf files of a normal size. The total size of the documents to be crawled is fairly large, about 500 GB. The other stuff is maybe about 100 GB total.

--
View this message in context: http://lucene.472066.n3.nabble.com/Going-Beyond-the-Prototype-tp2923289p2923807.html
Sent from the Nutch - User mailing list archive at Nabble.com.

