Thank you for the clarification and help, guys. I will try them.
On Sep 12, 2011 10:29 AM, "kkrugler [via Lucene]" <
ml-node+s472066n332847...@n3.nabble.com> wrote:
>
>
>
> On Sep 11, 2011, at 7:04pm, dpt9876 wrote:
>
>> Hi thanks for the reply.
>>
>> How does Nutch/Solr handle the scenario where one website calls the price
>> "price" and another website calls it "cost"? Same thing, different name, yet
>> I would want the facet to handle that and not create a separate facet.
>>
>> Is this combo of Nutch and Solr that intelligent and/or intuitive?
>
> What you're describing here is web mining, not web crawling.
>
> You want to extract price data from web pages, and put that into a specific
> field in Solr.
>
> To do that using Nutch, you'd need to write custom plug-ins that know how to
> extract the price from a page, and add that as a custom field to the crawl
> results.
>
> The above is a topic for the Nutch mailing list, since Solr is just a
> downstream consumer of whatever Nutch provides.
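
The "price" versus "cost" mapping discussed above can be sketched outside of Nutch as a plain normalization step that runs before documents are sent to Solr. This is not a Nutch plug-in and not anything Solr does for you; it only illustrates the idea. The alias table, the function names and the sample records are all hypothetical.

```python
# Hypothetical alias table: site-specific labels mapped to one canonical field.
# Every entry here is an assumption; a real table would be built per merchant.
FIELD_ALIASES = {
    "price": "price",
    "cost": "price",
    "our price": "price",
}


def normalize_record(raw):
    """Return a copy of `raw` with known aliases renamed to canonical fields.

    Keys are lower-cased and stripped; unknown keys pass through unchanged
    apart from that.
    """
    doc = {}
    for key, value in raw.items():
        label = key.strip().lower()
        doc[FIELD_ALIASES.get(label, label)] = value
    return doc


def parse_price(text):
    """Convert a scraped price string such as "$1,299.00" to a float."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned)


# Two made-up scraped records that name the same thing differently.
site_a = normalize_record({"Price": "$149.99", "title": "Camera A"})
site_b = normalize_record({"Cost": "1,299.00", "title": "Camera B"})

# Both records now carry a single numeric "price" field.
site_a["price"] = parse_price(site_a["price"])
site_b["price"] = parse_price(site_b["price"])
```

With both records using one numeric field, a single facet on that field covers both sites. The hard part in practice is extracting the value from each page, which is what the custom plug-ins mentioned above would have to do.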
>
> -- Ken
>
>> On Sep 12, 2011 9:06 AM, "Erick Erickson [via Lucene]" <
>> ml-node+s472066n3328340...@n3.nabble.com> wrote:
>>>
>>>
>>> Nope, there's nothing in Solr that crawls anything, you have to feed
>>> documents in yourself from the websites.
>>>
>>> Or, look at the Nutch project, see: http://nutch.apache.org/about.html
>>>
>>> which is designed for this kind of problem.
>>>
>>> Best
>>> Erick
>>>
>>> On Sun, Sep 11, 2011 at 8:53 PM, dpt9876 <daninthetrop...@gmail.com> wrote:
>>>> Hi all,
>>>> I am wondering if Solr will do the following for a project I am working on.
>>>> I want to create a search engine with facets for potentially hundreds of
>>>> websites.
>>>> Similar to, say, crawling amazon + buy.com + ebay so that someone can
>>>> search these 3 sites from my 1 website.
>>>> (I realise there are better ways of doing the above example; it's for
>>>> illustrative purposes.)
>>>> Eventually I would build that search crawl to index, say, 200 or 1000
>>>> merchants.
>>>> Someone would come to my site and search for "digital camera".
>>>>
>>>> They would get results from all 3 indexes and hopefully dynamic facets, e.g.
>>>> Price $100-200
>>>> Price $200-300
>>>> Resolution 1mp-2mp
>>>>
>>>> etc etc
>>>>
>>>> Can this be done on the fly?
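
For the price buckets listed above, Solr's range faceting computes the counts at query time once the documents are indexed with a numeric price field. A minimal sketch of building such a query follows. The host, the `/select` path, the field name `price` and the bucket bounds are assumptions for illustration; only the `facet.range` parameter names come from Solr itself.

```python
from urllib.parse import urlencode


def range_facet_query(base_url, text, field, start, end, gap):
    """Build a Solr select URL that asks for range facets on `field`.

    The per-field form f.<field>.facet.range.* is used so the bounds apply
    only to that field.
    """
    params = [
        ("q", text),
        ("facet", "true"),
        ("facet.range", field),
        (f"f.{field}.facet.range.start", start),
        (f"f.{field}.facet.range.end", end),
        (f"f.{field}.facet.range.gap", gap),
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


# Assumed local Solr instance and an assumed numeric field called "price".
url = range_facet_query(
    "http://localhost:8983/solr/select",
    "digital camera",
    "price",
    0,
    1000,
    100,
)
print(url)
```

This only covers the faceting side. The buckets are computed on the fly, but the price values still have to be extracted and indexed beforehand.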
>>>>
>>>> I ask this because I am currently developing web scrapers to crawl these
>>>> websites and dump that data into a db, and was then thinking of tacking on
>>>> a Solr server to crawl my db.
>>>>
>>>> The problem with that approach is that crawling the world's ecommerce
>>>> sites will take forever, when it seems Solr might do that for me? (I have
>>>> read about multiple indexes etc.)
>>>>
>>>> Many thanks
>>>>
>>>> --
>>>> View this message in context:
>>>> http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328314.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr

