date:20080707

Re: how to search pdf and word

2008-07-07 Thread 宫照

Thank you, Kevinchen, for your tips. I can already parse PDF and Word files now, but in the search results, when I click "cached", the page shows a message like this: The cached content has mime type "application/pdf", click this link <./servlet/cached?idx=0&id=55> to download it directly. I want the res

Re: how to search pdf and word

2008-07-07 Thread kevin chen

You need to turn on two plugins, parse-pdf and parse-msword. Look at your ${NUTCH_HOME}/conf/nutch-site.xml and change the property "plugin.includes", for example: plugin.includes protocol-(httpclient|file)|urlfilter-(regex)|parse-(text|html|js|pdf|msword)|index-(basic)|query-(basic|
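The plugin.includes change described above amounts to adding "pdf" and "msword" to the parse-(...) group of the existing value. A minimal sketch of that edit on the string itself, assuming the value follows the pattern quoted in the message; the function name enable_parse_plugins and the starting value are hypothetical, and the full value should come from your own nutch-site.xml:

```python
import re


def enable_parse_plugins(plugin_includes, extra=("pdf", "msword")):
    """Return plugin_includes with the extra names added to its parse-(...) group.

    Hypothetical helper for illustration only; it is not part of Nutch.
    """
    match = re.search(r"parse-\(([^)]*)\)", plugin_includes)
    if match is None:
        # No parse-(...) group found: append a new one instead of guessing.
        prefix = plugin_includes + "|" if plugin_includes else ""
        return prefix + "parse-(" + "|".join(extra) + ")"
    # Split the existing alternatives, dropping stray whitespace from line wraps.
    names = [n.strip() for n in match.group(1).split("|") if n.strip()]
    for name in extra:
        if name not in names:
            names.append(name)
    new_group = "parse-(" + "|".join(names) + ")"
    return plugin_includes[:match.start()] + new_group + plugin_includes[match.end():]


# Assumed starting value (not taken from any real configuration).
current = "protocol-(httpclient|file)|urlfilter-(regex)|parse-(text|html|js)|index-(basic)"
updated = enable_parse_plugins(current)
print(updated)
```

The result still has to be pasted into the plugin.includes property of nutch-site.xml by hand. Running the function on an already updated value leaves it unchanged, so it is safe to apply twice.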

Re: Indexing static html files

2008-07-07 Thread 宫照

Hi everybody, I set up nutch-0.9, and I can search txt and html files on my local system. Now I want to search pdf and msword files. Can you tell me how to do that? BR, mingkong

how to search pdf and word

2008-07-07 Thread 宫照

Hi everybody, I set up nutch-0.9, and I can search txt and html files on my local system. Now I want to search pdf and msword files. Can you tell me how to do that? BR, mingkong

Re: Indexing static html files

2008-07-07 Thread Winton Davies

I meant that you could just do a http://external_url.com/y/z/ crawl. But yes, if you have pages from someone else's server locally, you will need to rewrite the BASE component of the URL in the search results. For that you could probably just hack search.jsp (but don't tell anyone I told you

Help to get the entire link in the anchor field instead of the anchor to a fetched page.

2008-07-07 Thread Ismael

Hello. I need to get the links followed by Nutch to reach a page; something like the anchors, but getting all the information inside the link instead of the text of the link. I don't know if this can be done by building a plugin, or if I must modify the Nutch code to get this information. I went thro

6 matches
