Indexing Metatags

2014-05-23 Thread michael
I am using Nutch 1.8 with Solr 4.3 and I want to index two custom meta tags that we have on our site. I have followed the tutorial at http://wiki.apache.org/nutch/IndexMetatags but I cannot get it to work. If I run parsechecker, it shows that the fields are being parsed, but if I run indexchec

RE: Re-crawl every 24 hours

2014-05-23 Thread Markus Jelsma
That will work, but use nutch.fetchInterval.fixed in case you use an adaptive fetch scheduler.
-Original message-
> From: Julien Nioche
> Sent: Friday 23rd May 2014 12:09
> To: user@nutch.apache.org
> Subject: Re: Re-crawl every 24 hours
>
> Hi
>
> This will work with 1.8 indeed. Wh

Re: Re-crawl every 24 hours

2014-05-23 Thread Ali rahmani
Hi Julien, could you please advise what a re-crawl script should look like? I follow the steps below, but even after adding the fetch.interval parameter the crawler keeps going deeper and deeper.
1) ./nutch inject /url
2) Loop{
./nutch generate -topN 2000
./nutch fetch [CrawlID]
./nutch parse [CrawlID]
./nutch generated

Pull in data from database (RDBMS)

2014-05-23 Thread Bayu Widyasanyata
Hi, could anyone point me to documentation on how to pull in (fetch) data from a database (e.g. a common RDBMS such as MySQL) with Nutch? The rest of the process would be the usual Nutch steps: parse and index. Thanks in advance. -- wassalam, [bayu]

Re: Re-crawl every 24 hours

2014-05-23 Thread Julien Nioche
Hi
This will work with 1.8 indeed. What procedure do you mean? Just add nutch.fetchInterval to the seeds, that's all.
J.
On 23 May 2014 10:13, Ali Nazemian wrote:
> Dear Julien,
> Hi,
> Do you know any step by step guide for this procedure? Is this the same for
> nutch 1.8?
> Best regards.
>
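As a rough illustration of the suggestion above (adding nutch.fetchInterval to the seeds), the sketch below writes a seed list in which each URL is followed by tab-separated key=value metadata. The helper names (seed_line, write_seeds), the example URL and the 86400-second value (24 hours) are assumptions made for illustration and do not come from the thread. The nutch.fetchInterval.fixed variant mentioned elsewhere in the thread is selected with the hypothetical `fixed` flag.

```python
# Sketch only: helper names and values are illustrative assumptions.
SECONDS_PER_DAY = 24 * 60 * 60  # 86400 seconds = 24 hours


def seed_line(url, fetch_interval=SECONDS_PER_DAY, fixed=False):
    """Return one seed-list line: URL, a tab, then key=value metadata."""
    # Choose the metadata key; the ".fixed" variant is the one suggested
    # in the thread for use with an adaptive fetch scheduler.
    key = "nutch.fetchInterval.fixed" if fixed else "nutch.fetchInterval"
    return f"{url}\t{key}={fetch_interval}"


def write_seeds(path, urls, fixed=False):
    """Write one seed line per URL to the file at `path`."""
    with open(path, "w", encoding="utf-8") as fh:
        for url in urls:
            fh.write(seed_line(url, fixed=fixed) + "\n")


if __name__ == "__main__":
    # Hypothetical seed URL, printed rather than written to disk.
    print(seed_line("http://example.com/"))
```

The resulting file would then be passed to the inject step as usual; whether the interval is honoured still depends on the fetch schedule configured for the crawl.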

Re: Re-crawl every 24 hours

2014-05-23 Thread Ali Nazemian
Dear Julien,
Hi,
Do you know any step by step guide for this procedure? Is this the same for nutch 1.8?
Best regards.
On Wed, May 21, 2014 at 6:43 PM, Julien Nioche < lists.digitalpeb...@gmail.com> wrote:
> > db.fetch.interval.default
> 1800
> The default number of seconds between re-fetc

RE: Importance of Score

2014-05-23 Thread Vangelis karv
Thanks Sebastian for your trouble! In http://wiki.apache.org/nutch/Nutch2Crawling , just before the Fetch procedure, it says: Cons: Scoring is not used for selection Domains (hosts) at the start of a region (mapper input) have the highest chance to get selected. I guess that the first line i