Nutch 1.x crawl Zip file URLs

A Laxmi Thu, 05 May 2016 19:00:25 -0700

Hi,

(a) Is it possible to crawl the URL of a Zip file using Nutch and index it in
Solr? (Please see the example below.)


(b) Also, if the Zip file at that URL contains PDF files, is it possible to use
Nutch to crawl the Zip file URL and also the PDF files inside the Zip file?


E.g.
https://www.abc123.xxx/sites/docs/testing.zip
When I unzip the file at the above URL, I get the following:


def.pdf
lmn.pdf
reg.pdf


Please advise.

Thanks!

AL
