1) Plus, those files are sometimes binaries with embedded metadata, so a
crawler needs specific parsers to understand them. HTML is plain text.

2) Yes, different data schemas. Sometimes I replicate the same core and
run A/B tests with different weights, filters, etc. Some people like to
create CoreA and CoreB with the same schema and hammer CoreA with
updates, commits, and optimizes while keeping CoreB available for searches,
then swap the two and repeat. This keeps searches fast.
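A minimal sketch of that swap pattern using Solr's CoreAdmin API (the host,
port, and core names "live" and "staging" are illustrative; the script only
echoes the curl commands as a dry run):

```shell
#!/bin/sh
# Sketch of the two-core swap pattern. Assumes a local Solr instance and
# two cores with the same schema: "live" (serving queries) and "staging"
# (being rebuilt). Names and URL are assumptions, not from the thread.
SOLR="http://localhost:8983/solr"

# 1. Hammer the offline core with updates, commits, and optimizes while
#    "live" keeps serving searches.
echo curl "$SOLR/staging/update?commit=true&optimize=true"

# 2. Atomically swap the two cores via CoreAdmin; searchers now hit the
#    freshly built index with no downtime.
echo curl "$SOLR/admin/cores?action=SWAP&core=live&other=staging"
```

After the swap, the old "live" index becomes the new staging core and can be
rebuilt in turn.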


alexei martchenko
Facebook <http://www.facebook.com/alexeiramone> |
Linkedin <http://br.linkedin.com/in/alexeimartchenko> |
Steam <http://steamcommunity.com/id/alexeiramone/> |
4sq <https://pt.foursquare.com/alexeiramone> | Skype: alexeiramone |
Github <https://github.com/alexeiramone> | (11) 9 7613.0966 |


2014-01-28 Jack Krupansky <j...@basetechnology.com>

> 1. Nutch follows the links within HTML web pages to crawl the full graph
> of a web of pages.
>
> 2. Think of a core as an SQL table - each table/core has a different type
> of data.
>
> 3. SolrCloud is all about scaling and availability - multiple shards for
> larger collections and multiple replicas for both scaling of query response
> and availability if nodes go down.
>
> -- Jack Krupansky
>
> -----Original Message----- From: rashmi maheshwari
> Sent: Tuesday, January 28, 2014 11:36 AM
> To: solr-user@lucene.apache.org
> Subject: Solr & Nutch
>
>
> Hi,
>
> Question 1 --> When Solr can parse HTML and documents like doc, excel, pdf,
> etc., why do we need Nutch to parse HTML files? What is different?
>
> Question 2: When do we use multiple cores in Solr? Any practical business
> case when we need multiple cores?
>
> Question 3: When do we go for cloud? What does implementing SolrCloud
> mean?
>
>
> --
> Rashmi
> Be the change that you want to see in this world!
> www.minnal.zor.org
> disha.resolve.at
> www.artofliving.org
>
