Hello,

I am currently working on web crawling and indexing. I am using Nutch
2.2.1, Elasticsearch for indexing, and Cassandra as the datastore.

I am successfully crawling and indexing web pages, but images and some
other file formats are not being crawled or indexed.

I need to index images separately in Elasticsearch.

The parser only extracts the web page's text content, title, etc.

I have modified suffix-urlfilter.txt and regex-urlfilter.txt to
allow images, but the image content is still not parsed.
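The URL filters only decide which URLs are accepted for fetching. Whether a fetched image produces any useful parse output is a separate question. In regex-urlfilter.txt each rule is `+regex` (accept) or `-regex` (reject). The first rule whose pattern is found in the URL decides, and a URL matching no rule is rejected. So an image-exclusion line placed before a catch-all `+.` still drops images.

The sketch below simulates that first-match behaviour. The rule list is a hypothetical example in the style of a default filter file, not copied from Nutch 2.2.1, and the class is not Nutch's own code. It needs Java 9+ for `List.of`.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical simulation of first-match URL filtering in the style of
 * regex-urlfilter.txt. This is NOT Nutch's implementation, only an illustration.
 */
public class UrlFilterSketch {

    // Example rules (assumed, not taken from a real Nutch config):
    // reject some schemes, reject common image/static suffixes, accept everything else.
    static final List<String> RULES = List.of(
        "-^(file|ftp|mailto):",
        "-\\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|jpeg|JPEG|bmp|BMP|js|JS)$",
        "+."
    );

    /** True if the first rule whose regex is found in the URL is a '+' rule; false otherwise. */
    static boolean accept(String url, List<String> rules) {
        for (String rule : rules) {
            char sign = rule.charAt(0);
            Pattern pattern = Pattern.compile(rule.substring(1));
            if (pattern.matcher(url).find()) {
                return sign == '+';
            }
        }
        return false; // no rule matched: the URL is rejected
    }

    public static void main(String[] args) {
        System.out.println(accept("http://example.com/index.html", RULES)); // true
        System.out.println(accept("http://example.com/logo.png", RULES));   // false
    }
}
```

Removing the image suffixes from the `-` rule lets such URLs through to the fetcher. How they are then parsed depends on the parse plugins that are enabled.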

My requirement is to store crawled images in a separate field of the parse table.
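One way to think about a separate image field is to split a page's outlinks into page links and image links before indexing. The sketch below is a hypothetical, plain-Java illustration of that split. The class name, the extension set and the `"pages"`/`"images"` keys are all assumptions. It does not use any Nutch plugin API. Where such logic would live (for example, a custom parse or indexing filter plugin) is also an assumption, not something confirmed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: separates image URLs from other outlinks by file extension. */
public class ImageOutlinkSplitter {

    // Assumed set of extensions treated as images.
    static final Set<String> IMAGE_EXTENSIONS = Set.of("gif", "jpg", "jpeg", "png", "bmp", "ico");

    /** True if the URL's path (query string and fragment ignored) ends in an image extension. */
    static boolean isImage(String url) {
        String path = url.split("[?#]", 2)[0];
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        if (dot < 0 || dot < slash) {
            return false; // no extension in the last path segment
        }
        return IMAGE_EXTENSIONS.contains(path.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Returns the outlinks grouped under the hypothetical keys "pages" and "images". */
    static Map<String, List<String>> split(List<String> outlinks) {
        List<String> pages = new ArrayList<>();
        List<String> images = new ArrayList<>();
        for (String link : outlinks) {
            (isImage(link) ? images : pages).add(link);
        }
        return Map.of("pages", pages, "images", images);
    }

    public static void main(String[] args) {
        Map<String, List<String>> result = split(List.of(
            "http://example.com/",
            "http://example.com/img/Logo.PNG?v=2",
            "http://example.com/about.html"));
        System.out.println(result.get("images")); // [http://example.com/img/Logo.PNG?v=2]
        System.out.println(result.get("pages"));  // [http://example.com/, http://example.com/about.html]
    }
}
```

Matching on the extension is only a heuristic. Checking the fetched Content-Type would be more reliable, but that requires the fetched data rather than just the URL.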

Appreciate any help.

Thank you.


Regards,
Jaydip Lakhatariya

Sat, 18 Jan 2014 14:14:52 +0530

Aspire Software Solutions 10/A Dalal, New Vikasgruh Road, Paldi, Ahmedabad, 
India.
