Problems when crawl a .nsf site

2011-07-03 Thread 丛云牙之主
Hello, I am using nutch-1.2 and have encountered a problem. The site is written with Lotus Domino. When I open it in a browser and click on links, the site URL does not change, unlike some sites whose URLs carry a lot of suffixes. The web site is buptoa.bupt.edu.cn /

Re: Problems when crawl a .nsf site

2011-07-03 Thread Alexander Aristov
Hi, if it is a text file, then you can simply associate the extension with the text parser. But if I understand you right and it's a Lotus DB file, then I suspect you have no other choice than to implement your own parser. I haven't heard of Lotus file support in Nutch. Best Regards, Alexander Aristov
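
To illustrate the idea of associating an extension with a parser, here is a minimal, self-contained Java sketch. It does not use Nutch's API: Nutch makes this association in its configuration (conf/parse-plugins.xml maps MIME types to parse plugins), and a real parser has to implement Nutch's own parser extension point. The class name ParserRegistry, its methods, and the example.com URLs are all hypothetical and exist only for this illustration.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical illustration only: none of these names come from Nutch.
class ParserRegistry {
    // Maps a lower-case file extension (without the dot) to a function
    // that turns raw fetched bytes into plain text.
    private final Map<String, Function<byte[], String>> parsers = new HashMap<>();

    // Associates an extension such as "txt" or "nsf" with a parser function.
    public void register(String extension, Function<byte[], String> parser) {
        parsers.put(extension.toLowerCase(Locale.ROOT), parser);
    }

    // Returns the extension of the last path segment, ignoring query and
    // fragment; returns "" when the last segment has no extension.
    public static String extensionOf(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return "";
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot <= slash) {
            return ""; // no dot at all, or the dot belongs to a directory name
        }
        return path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Looks up the parser for the URL's extension and applies it, if any.
    public Optional<String> parse(String url, byte[] content) {
        Function<byte[], String> parser = parsers.get(extensionOf(url));
        return parser == null ? Optional.empty() : Optional.of(parser.apply(content));
    }

    public static void main(String[] args) {
        ParserRegistry registry = new ParserRegistry();
        Function<byte[], String> textParser =
                bytes -> new String(bytes, StandardCharsets.UTF_8);
        registry.register("txt", textParser);
        // Assumption: only sensible if the .nsf response really is plain text.
        registry.register("nsf", textParser);

        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        // prints "hello"
        System.out.println(registry.parse("http://example.com/db.nsf", body).orElse("(no parser)"));
        // prints "(no parser)"
        System.out.println(registry.parse("http://example.com/file.bin", body).orElse("(no parser)"));
    }
}
```

If the content behind the .nsf URL is a binary Lotus database rather than text, reusing a text parser this way would only produce garbage, which is why a dedicated parser would be needed in that case.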

Re: Problems when crawl a .nsf site

2011-07-03 Thread lewis john mcgibbney
Absolutely... There is a short (old) thread on this topic here [1]. From what I can see, this issue has not been addressed, so it looks like implementing your own parser plugin is what's required. [1] http://www.lucidimagination.com/search/document/a8d53fac1caa578c/nutch_with_nsf_files