Re: how to search pdf and word

2008-07-07 Thread 宫照
Thank you Kevinchen for your tips; I can already parse PDF and Word files now. But in the search results, when I click "cached", the page gives a result like this: The cached content has mime type "application/pdf", click this link<./servlet/cached?idx=0&id=55>to download it directly. I want the res

Re: how to search pdf and word

2008-07-07 Thread kevin chen
You need to turn on two plugins, parse-pdf and parse-msword. Look at your ${NUTCH_HOME}/conf/nutch-site.xml and change the property "plugin.includes", for example: plugin.includes protocol-(httpclient|file)|urlfilter-(regex)|parse-(text|html|js|pdf|msword)|index-(basic)|query-(basic|
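
A minimal sketch for checking which plugin ids a "plugin.includes" value would turn on. It assumes Nutch treats the value as a regular expression that must match a plugin id in full; that is an assumption here, not something confirmed in this thread. PLUGIN_INCLUDES is an illustrative value that keeps only the visible part of the example quoted above (the original is cut off after "query-"), and enabled_plugins is a hypothetical helper, not part of Nutch.

```python
import re

# Illustrative value only: the visible part of the plugin.includes example,
# with no stray spaces. The original message is truncated after "query-",
# so that part is deliberately left out here.
PLUGIN_INCLUDES = (
    "protocol-(httpclient|file)|urlfilter-(regex)"
    "|parse-(text|html|js|pdf|msword)|index-(basic)"
)


def enabled_plugins(includes, plugin_ids):
    """Return the plugin ids that the regex matches in full (hypothetical helper)."""
    pattern = re.compile(includes)
    return [pid for pid in plugin_ids if pattern.fullmatch(pid)]


if __name__ == "__main__":
    candidates = ["parse-pdf", "parse-msword", "parse-html", "parse-swf"]
    print(enabled_plugins(PLUGIN_INCLUDES, candidates))
```

Under that assumption, a stray space inside the value (for example "parse-(text| html|...)") would keep the affected id from matching, so it may be worth checking the value for whitespace after pasting it from an email.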

Re: Indexing static html files

2008-07-07 Thread 宫照
Hi everybody, I set up nutch-0.9, and I can search txt and html files on the local system. Now I want to search PDF and MS Word files; can you tell me how to do that? BR, mingkong

how to search pdf and word

2008-07-07 Thread 宫照
Hi everybody, I set up nutch-0.9, and I can search txt and html files on the local system. Now I want to search PDF and MS Word files; can you tell me how to do that? BR, mingkong

Re: Indexing static html files

2008-07-07 Thread Winton Davies
I meant that you could just do a http://external_url.com/y/z/ crawl. But yes, if you have pages from someone else's server locally, you will need to rewrite the BASE component of the URL in the search results. For that you could probably just hack search.jsp (but don't tell anyone I told you

Help to get the entire link in the anchor field instead of the anchor to a fetched page.

2008-07-07 Thread Ismael
Hello. I need to get the links followed by Nutch to reach a page; something like the anchors, but getting all the information inside the link instead of the text of the link. I don't know if this can be done by building a plugin, or if I must modify the Nutch code to get this information. I went thro