OK, nice. So it's possible. Do you think this is a better method than scraping with an alternative tool? It seems to me that it is, since it fits better with my end goal of Solr faceted search and lets me remove layers of complexity.

On Sep 12, 2011 8:03 PM, "Markus Jelsma-2 [via Lucene]" < [email protected]> wrote:
>
> Yes you can. As Ken replied in your Solr thread, you must create custom parse
> and indexing filters. The parse filter is needed to extract the information
> and store it in the document, and the index filter is used to pass that new
> information to the Solr index.
>
> On Monday 12 September 2011 12:55:49 dpt9876 wrote:
>> Hi, the friendly guys at the Solr user group pointed me here.
>>
>> I am wondering if Nutch/Solr will do the following for a project I am
>> working on.
>> I want to create a search engine with facets for potentially hundreds of
>> websites.
>> Similar to, say, crawling amazon + buy.com + ebay so that someone can search
>> these 3 sites from my 1 website.
>> (I realise there are better ways of doing the above example, it's for
>> illustrative purposes.)
>> Eventually I would build that search crawl to index, say, 200 or 1000
>> merchants.
>> Someone would come to my site and search for "digital camera".
>>
>> They would get results from all 3 indexes and hopefully dynamic facets, e.g.
>> Price $100-200
>> Price $200-300
>> Resolution 1mp-2mp
>>
>> etc. etc.
>>
>> Can this be done on the fly?
>>
>> I ask this because I am currently developing web scrapers to crawl these
>> websites and dump that data into a db, and was then thinking of tacking on a
>> Solr server to crawl my db.
>>
>> The problem with that approach is that crawling the world's ecommerce sites
>> will take forever, when it seems Solr might do that for me? (I have read
>> about multiple indexes etc.)
>>
>> Many thanks
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Will-Solr-Nutch-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3329346p3329346.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350

--
View this message in context: http://lucene.472066.n3.nabble.com/Will-Solr-Nutch-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3329346p3329454.html
Sent from the Nutch - User mailing list archive at Nabble.com.
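
To make the two-stage flow described in the quoted reply a bit more concrete, here is a rough sketch in plain Python. It is not the Nutch plugin API (real Nutch filters are Java plugins), and every name in it is an assumption made for illustration: the functions `parse_filter`, `index_filter` and `facet_query`, the `price` field, and the "$123.45" price pattern are all hypothetical. The only Solr-specific part is the set of range-faceting request parameters (`facet.range` and the per-field `start`/`end`/`gap` settings), which is one possible way to get price buckets like the ones in the original question.

```python
import re
from urllib.parse import urlencode

# Hypothetical pattern: assumes prices appear in the page as "$123.45".
PRICE_RE = re.compile(r"\$\s*(\d+(?:\.\d{1,2})?)")


def parse_filter(html, metadata):
    """Parse stage (conceptual): extract a value from the page and
    store it alongside the parsed document."""
    match = PRICE_RE.search(html)
    if match:
        metadata["price"] = match.group(1)
    return metadata


def index_filter(url, metadata):
    """Index stage (conceptual): copy the extracted value into a field
    of the document that will be sent to Solr."""
    doc = {"id": url}
    if "price" in metadata:
        doc["price"] = float(metadata["price"])
    return doc


def facet_query(term, start=0, end=1000, gap=100):
    """Build a Solr query string asking for range facets on the
    (assumed) numeric 'price' field."""
    params = [
        ("q", term),
        ("facet", "true"),
        ("facet.range", "price"),
        ("f.price.facet.range.start", start),
        ("f.price.facet.range.end", end),
        ("f.price.facet.range.gap", gap),
    ]
    return urlencode(params)
```

With something like this, a page containing "$149.99" would end up as a document with a numeric `price` field, and `facet_query("digital camera")` would produce the query string to append to a Solr select URL. Each site would likely need its own extraction rule, so the regex here stands in for whatever per-merchant logic the real parse filter would carry.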

