Hi, thanks for the reply.

How do Nutch and Solr handle the case where one website calls a field "price"
and another website calls it "cost"? It is the same thing under a different
name, and I would want the facet to treat them as one rather than create a
separate facet.

Is this combination of Nutch and Solr that intelligent and/or intuitive?
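One way to get a single facet, sketched here as an assumption rather than something Nutch or Solr is known to do for you automatically, is to map each site's field names onto one canonical field in your own ingestion code before the documents reach Solr. The alias table, the field names and the `normalize_record` helper below are all hypothetical. (Solr's schema also offers `copyField`, which copies one field's content into another at index time.)

```python
# Hypothetical mapping from site-specific field names to one canonical
# field name; the names are illustrative, not taken from any real site.
FIELD_ALIASES = {
    "price": "price",
    "cost": "price",
    "our_price": "price",
    "resolution": "resolution",
    "megapixels": "resolution",
}


def normalize_record(record: dict) -> dict:
    """Rename scraped fields to canonical names before indexing.

    Lookup is case-insensitive and ignores surrounding whitespace.
    Keys not in FIELD_ALIASES are kept exactly as they were.
    If a record carries two aliases of the same field, the later one wins.
    """
    normalized = {}
    for key, value in record.items():
        canonical = FIELD_ALIASES.get(key.strip().lower(), key)
        normalized[canonical] = value
    return normalized


site_a = {"title": "Camera X", "Price": 149.0}
site_b = {"title": "Camera Y", "cost": 189.0}
print(normalize_record(site_a))  # {'title': 'Camera X', 'price': 149.0}
print(normalize_record(site_b))  # {'title': 'Camera Y', 'price': 189.0}
```

Both records end up with a `price` field, so a facet on `price` would cover both sites. The trade-off is that someone has to maintain the alias table as new merchants are added.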

Thanks for the fast response.
On Sep 12, 2011 9:06 AM, "Erick Erickson [via Lucene]" <
ml-node+s472066n3328340...@n3.nabble.com> wrote:
>
>
> Nope, there's nothing in Solr that crawls anything, you have to feed
> documents in yourself from the websites.
>
> Or, look at the Nutch project, see: http://nutch.apache.org/about.html
>
> which is designed for this kind of problem.
>
> Best
> Erick
>
> On Sun, Sep 11, 2011 at 8:53 PM, dpt9876 <daninthetrop...@gmail.com> wrote:
>> Hi all,
>> I am wondering if Solr will do the following for a project I am working on.
>> I want to create a search engine with facets for potentially hundreds of
>> websites.
>> Similar to, say, crawling Amazon + buy.com + eBay so that someone can search
>> these 3 sites from my 1 website.
>> (I realise there are better ways of doing the above example; it's for
>> illustrative purposes.)
>> Eventually I would build that search crawl to index, say, 200 or 1000
>> merchants.
>> Someone would come to my site and search for "digital camera".
>>
>> They would get results from all 3 indexes and hopefully dynamic facets, e.g.
>> Price $100-200
>> Price $200-300
>> Resolution 1mp-2mp
>>
>> etc.
>>
>> Can this be done on the fly?
>>
>> I ask this because I am currently developing web scrapers to crawl these
>> websites and dump the data into a DB, and was then thinking of tacking on a
>> Solr server to crawl my DB.
>>
>> The problem with that approach is that crawling the world's e-commerce sites
>> will take forever, when it seems Solr might do that for me? (I have read
>> about multiple indexes etc.)
>>
>> Many thanks
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328314.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>

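The dynamic price facets asked about in the quoted original message (Price $100-200, Price $200-300) correspond to Solr's range faceting (`facet.range`, added in Solr 3.1), whose counts are computed per query, so they follow whatever the user searched for. The sketch below only builds the request URL; it assumes a numeric `price` field in the schema, and the endpoint URL and the `range_facet_query` helper are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical Solr select endpoint; adjust host, port and core to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def range_facet_query(q: str, field: str, start: int, end: int, gap: int) -> str:
    """Build a Solr select URL requesting range facets on one numeric field.

    Solr is asked to count matching documents in gap-sized buckets from
    `start` up to `end` (e.g. 0-100, 100-200, ... for gap=100).
    """
    params = [
        ("q", q),
        ("facet", "true"),
        ("facet.range", field),
        ("facet.range.start", str(start)),
        ("facet.range.end", str(end)),
        ("facet.range.gap", str(gap)),
    ]
    # urlencode percent-encodes values and turns spaces into '+'.
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(range_facet_query("digital camera", "price", 0, 1000, 100))
```

Fixed-width buckets are the simplest choice; uneven ranges (for example $0-50, $50-200, $200+) could instead be requested with one `facet.query` per range.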
