Re: Solr & Nutch

Markus Jelsma Tue, 28 Jan 2014 10:40:34 -0800

Short answer: you can't.

rashmi maheshwari <maheshwari.ras...@gmail.com> wrote:
Thanks, all, for the quick response.


Today I crawled a web page using Nutch. The page has many links, but all
the anchor tags have href="#", and JavaScript on each anchor's onClick
event opens the new page.

So the crawler didn't crawl any of the links that are opened through the
onClick event and have "#" as their href value.

How can these links be crawled using Nutch?
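
To illustrate why such links are missed, here is a minimal sketch of an href-based link extractor. It is a simplified model written for this thread, not Nutch's actual parser; the class name HrefLinkExtractor, the helper extract(), and the sample markup (including the openPage() handler) are hypothetical.

```python
from html.parser import HTMLParser


class HrefLinkExtractor(HTMLParser):
    """Collect outlinks the way a simple href-based crawler would (hypothetical model)."""

    def __init__(self):
        super().__init__()
        self.outlinks = []  # hrefs a crawler could fetch
        self.skipped = []   # onclick code of anchors with no usable href

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)  # attribute names arrive lowercased
        href = (attr.get("href") or "").strip()
        usable = (
            href
            and not href.startswith("#")
            and not href.lower().startswith("javascript:")
        )
        if usable:
            self.outlinks.append(href)
        else:
            # The real target exists only inside the JavaScript handler,
            # which this extractor never executes.
            self.skipped.append(attr.get("onclick"))


def extract(html):
    """Return (outlinks, skipped) for an HTML string."""
    parser = HrefLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.outlinks, parser.skipped


# Assumed sample page: two script-driven anchors and one ordinary link.
SAMPLE = """
<a href="#" onclick="openPage(1)">First</a>
<a href="#" onclick="openPage(2)">Second</a>
<a href="/about.html">About</a>
"""

outlinks, skipped = extract(SAMPLE)
print(outlinks)
print(skipped)
```

Only the ordinary link ends up in outlinks. The two href="#" anchors are reported in skipped, because finding their targets would require running the page's JavaScript.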




On Tue, Jan 28, 2014 at 10:54 PM, Alexei Martchenko <
ale...@martchenko.com.br> wrote:

> 1) Plus, those files are sometimes binaries with metadata, so specific
> crawlers need to understand them. HTML is plain text.
>
> 2) Yes, different data schemas. Sometimes I replicate the same core and
> run A/B tests with different weights, filters, etc. Some people
> like to create CoreA and CoreB with the same schema and hammer CoreA with
> updates, commits, and optimizes, then make it available for searches while
> hammering CoreB. Then they swap again. This produces faster searches.
>
>
> alexei martchenko
> Facebook <http://www.facebook.com/alexeiramone> |
> Linkedin<http://br.linkedin.com/in/alexeimartchenko>|
> Steam <http://steamcommunity.com/id/alexeiramone/> |
> 4sq<https://pt.foursquare.com/alexeiramone>| Skype: alexeiramone |
> Github <https://github.com/alexeiramone> | (11) 9 7613.0966 |
>
>
> 2014-01-28 Jack Krupansky <j...@basetechnology.com>
>
> > 1. Nutch follows the links within HTML web pages to crawl the full graph
> > of a web of pages.
> >
> > 2. Think of a core as an SQL table - each table/core has a different type
> > of data.
> >
> > 3. SolrCloud is all about scaling and availability - multiple shards for
> > larger collections and multiple replicas for both scaling of query
> > response and availability if nodes go down.
> >
> > -- Jack Krupansky
> >
> > -----Original Message----- From: rashmi maheshwari
> > Sent: Tuesday, January 28, 2014 11:36 AM
> > To: solr-user@lucene.apache.org
> > Subject: Solr & Nutch
> >
> >
> > Hi,
> >
> > Question 1: When Solr can parse HTML and documents like doc, Excel, PDF,
> > etc., why do we need Nutch to parse HTML files? What is different?
> >
> > Question 2: When do we use multiple cores in Solr? Is there a practical
> > business case where we need multiple cores?
> >
> > Question 3: When do we go for cloud? What does implementing SolrCloud
> > mean?
> >
> >
> > --
> > Rashmi
> > Be the change that you want to see in this world!
> > www.minnal.zor.org
> > disha.resolve.at
> > www.artofliving.org
> >
>
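
The CoreA/CoreB rotation described in the quoted reply above could be driven through Solr's CoreAdmin SWAP action. The following is a rough sketch only: the base URL, the core names, and the helper names build_swap_url() and swap_cores() are assumptions made for illustration, and it presumes a standalone (non-SolrCloud) Solr instance with both cores already created.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_swap_url(solr_base, live_core, staging_core):
    """Build the CoreAdmin URL that swaps the two named cores."""
    query = urlencode({"action": "SWAP", "core": live_core, "other": staging_core})
    return f"{solr_base.rstrip('/')}/admin/cores?{query}"


def swap_cores(solr_base, live_core, staging_core, timeout=10):
    """Send the swap request and return the HTTP status code.

    Needs a reachable Solr instance, so it is not called in this sketch.
    """
    url = build_swap_url(solr_base, live_core, staging_core)
    with urlopen(url, timeout=timeout) as response:
        return response.status


# Assumed local Solr address and core names, for illustration only.
print(build_swap_url("http://localhost:8983/solr", "CoreA", "CoreB"))
```

After heavy indexing on the offline core finishes, one swap_cores() call would exchange the two core names, so searches hit the freshly built index while the other core takes the next round of updates.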



-- 
Rashmi
Be the change that you want to see in this world!
www.minnal.zor.org
disha.resolve.at
www.artofliving.org
