Thank you for the clarification and help, guys. I will try them.

On Sep 12, 2011 10:29 AM, "kkrugler [via Lucene]" <ml-node+s472066n332847...@n3.nabble.com> wrote:
>
> On Sep 11, 2011, at 7:04pm, dpt9876 wrote:
>
>> Hi, thanks for the reply.
>>
>> How does Nutch/Solr handle the scenario where one website calls price "price"
>> and another website calls it "cost"? Same thing, different name, yet I would
>> want the facet to handle that and not create a different facet.
>>
>> Is this combo of Nutch and Solr that intelligent and/or intuitive?
>
> What you're describing here is web mining, not web crawling.
>
> You want to extract price data from web pages, and put that into a specific field in Solr.
>
> To do that using Nutch, you'd need to write custom plug-ins that know how to extract the price from a page, and add that as a custom field to the crawl results.
>
> The above is a topic for the Nutch mailing list, since Solr is just a downstream consumer of whatever Nutch provides.
>
> -- Ken
>
>> On Sep 12, 2011 9:06 AM, "Erick Erickson [via Lucene]" <ml-node+s472066n3328340...@n3.nabble.com> wrote:
>>>
>>> Nope, there's nothing in Solr that crawls anything; you have to feed
>>> documents in yourself from the websites.
>>>
>>> Or, look at the Nutch project, see: http://nutch.apache.org/about.html
>>>
>>> which is designed for this kind of problem.
>>>
>>> Best
>>> Erick
>>>
>>> On Sun, Sep 11, 2011 at 8:53 PM, dpt9876 <daninthetrop...@gmail.com> wrote:
>>>> Hi all,
>>>> I am wondering if Solr will do the following for a project I am working on.
>>>> I want to create a search engine with facets for potentially hundreds of
>>>> websites.
>>>> Similar to, say, crawling amazon + buy.com + ebay so that someone can search these
>>>> 3 sites from my 1 website.
>>>> (I realise there are better ways of doing the above example; it's for
>>>> illustrative purposes.)
>>>> Eventually I would build that search crawl to index say 200 or 1000
>>>> merchants.
>>>> Someone would come to my site and search for "digital camera".
>>>>
>>>> They would get results from all 3 indexes and hopefully dynamic facets, e.g.
>>>> Price $100-200
>>>> Price 200-300
>>>> Resolution 1mp-2mp
>>>>
>>>> etc. etc.
>>>>
>>>> Can this be done on the fly?
>>>>
>>>> I ask this because I am currently developing web scrapers to crawl these
>>>> websites, dump that data into a db, then was thinking of tacking on a Solr
>>>> server to crawl my db.
>>>>
>>>> Problem with that approach is that crawling the world's ecommerce sites will
>>>> take forever, when it seems Solr might do that for me? (I have read about
>>>> multiple indexes etc.)
>>>>
>>>> Many thanks
>>>>
>>>> --
>>>> View this message in context: http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328314.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> If you reply to this email, your message will be added to the discussion below:
>>> http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328340.html
>>>
>>> To unsubscribe from Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?, visit http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=3328314&code=ZGFuaW50aGV0cm9waWNzQGdtYWlsLmNvbXwzMzI4MzE0fC04MDk0NTc1ODg=
>>
>> --
>> View this message in context: http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328449.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr
>
> _______________________________________________
> If you reply to this email, your message will be added to the discussion below:
> http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328470.html
--
View this message in context: http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328937.html
Sent from the Solr - User mailing list archive at Nabble.com.
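
For anyone finding this thread later: the "price" vs. "cost" problem discussed above comes down to mapping each site's label onto one canonical field before the documents are indexed, so that a single facet covers all merchants. Below is a rough sketch of that idea in plain Python. It is not a Nutch plug-in and it does not use any Solr API. The alias table, the field names, and the function names (`FIELD_ALIASES`, `parse_price`, `normalize`, `price_bucket`) are all made up for illustration, and the bucket helper only mimics the kind of "Price $100-200" ranges mentioned in the original question.

```python
import re

# Hypothetical alias table: site-specific labels -> one canonical field name.
# A real table would be built up per merchant as scrapers are written.
FIELD_ALIASES = {
    "price": "price",
    "cost": "price",
    "our price": "price",
    "resolution": "resolution",
    "megapixels": "resolution",
}


def parse_price(text):
    """Return the first number found in a string such as '$149.99', or None."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group(0).replace(",", ""))


def normalize(record):
    """Rename known aliases to canonical fields; keep unknown labels unchanged."""
    doc = {}
    for label, value in record.items():
        field = FIELD_ALIASES.get(label.strip().lower(), label)
        if field == "price":
            # Store prices as numbers so they can be grouped into ranges.
            value = parse_price(str(value))
        doc[field] = value
    return doc


def price_bucket(price, width=100):
    """Label the fixed-width range a price falls into, e.g. '$100-200'."""
    low = int(price // width) * width
    return f"${low}-{low + width}"


if __name__ == "__main__":
    # Two scraped records that use different names for the same thing.
    site_a = {"price": "$149.99", "title": "Camera A"}
    site_b = {"Cost": "USD 1,299", "title": "Camera B"}
    for raw in (site_a, site_b):
        doc = normalize(raw)
        print(doc, price_bucket(doc["price"]))
```

Both records end up with the same `price` field, so whatever indexes them downstream sees one field to facet on. Whether the mapping lives in a scraper, in a custom Nutch plug-in as Ken suggests, or in a step between the db and Solr is a design choice this sketch does not settle.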