Hi, thanks for the reply. How does Nutch/Solr handle the scenario where one website calls a field "price" and another website calls it "cost"? It is the same thing under a different name, and I would want a single facet to cover both rather than a separate facet for each name.
Is this combo of Nutch and Solr that intelligent and/or intuitive?

Thanks for the fast response.

On Sep 12, 2011 9:06 AM, "Erick Erickson [via Lucene]" <ml-node+s472066n3328340...@n3.nabble.com> wrote:
> Nope, there's nothing in Solr that crawls anything, you have to feed
> documents in yourself from the websites.
>
> Or, look at the Nutch project, see: http://nutch.apache.org/about.html
> which is designed for this kind of problem.
>
> Best
> Erick
>
> On Sun, Sep 11, 2011 at 8:53 PM, dpt9876 <daninthetrop...@gmail.com> wrote:
>> Hi all,
>> I am wondering if Solr will do the following for a project I am working on.
>> I want to create a search engine with facets for potentially hundreds of
>> websites.
>> Similar to say crawling amazon + buy.com + ebay and someone can search these
>> 3 sites from my 1 website.
>> (I realise there are better ways of doing the above example, its for
>> illustrative purposes).
>> Eventually I would build that search crawl to index say 200 or 1000
>> merchants.
>> Someone would come to my site and search for "digital camera".
>>
>> They would get results from all 3 indexes and hopefully dynamic facets eg
>> Price $100-200
>> Price 200-300
>> Resolution 1mp-2mp
>>
>> etc etc
>>
>> Can this be done on the fly?
>>
>> I ask this because I am currently developing webscrapers to crawl these
>> websites, dump that data into a db, then was thinking of tacking on a solr
>> server to crawl my db.
>>
>> Problem with that approach is that crawling the worlds ecommerce sites will
>> take forever, when it seems solr might do that for me? (I have read about
>> multiple indexes etc).
>>
>> Many thanks
>>
>> --
>> View this message in context: http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328314.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context: http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328449.html
Sent from the Solr - User mailing list archive at Nabble.com.