Re: Importance of Score

2014-05-25 Thread Talat Uyarer
Hi Vangelis, In Nutch 2.x we use a partitioner for distributing URLs. In the reduce phase of GeneratorJob we take only topN / (reduce count) URLs. We don't choose randomly by default, but we don't take the URLs with the highest score either. Am I wrong, Sebastian? Talat On 22 May 2014 18:59, Vangelis karv
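The per-reducer cap described above can be illustrated with a small Python sketch. This is not Nutch code: the function names, the host-based hashing, and the "take the first URLs that arrive" behavior are assumptions made only to show the topN / (reduce count) arithmetic.

```python
import zlib
from collections import defaultdict
from urllib.parse import urlparse


def per_reducer_limit(top_n, num_reducers):
    """Each reducer may emit at most topN / (reduce count) URLs."""
    return top_n // num_reducers


def partition_by_host(urls, num_reducers):
    """Hypothetical partitioner: route each URL to a reducer by hashing its host."""
    partitions = defaultdict(list)
    for url in urls:
        host = urlparse(url).netloc
        # crc32 is used only because it gives the same value on every run.
        reducer_id = zlib.crc32(host.encode("utf-8")) % num_reducers
        partitions[reducer_id].append(url)
    return partitions


def generate(urls, top_n, num_reducers):
    """Keep the first `limit` URLs each reducer receives (assumed behavior)."""
    limit = per_reducer_limit(top_n, num_reducers)
    partitions = partition_by_host(urls, num_reducers)
    selected = []
    for reducer_id in sorted(partitions):
        selected.extend(partitions[reducer_id][:limit])
    return selected
```

In this sketch, URLs that share a host always land on the same reducer, so a crawl dominated by one host can yield fewer than topN URLs even when more candidates exist.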

Re: Importance of Score

2014-05-24 Thread Sebastian Nagel
for selection Domains (hosts) at the start of a region (mapper input) have the highest chance to get selected. I guess that the first line is wrong and should be updated.
Date: Thu, 22 May 2014 21:28:10 +0200
From: wastl.na...@googlemail.com
To: user@nutch.apache.org
Subject: Re: Importance

RE: Importance of Score

2014-05-23 Thread Vangelis karv
is wrong and should be updated.
Date: Thu, 22 May 2014 21:28:10 +0200
From: wastl.na...@googlemail.com
To: user@nutch.apache.org
Subject: Re: Importance of Score
Hi Vangelis,
> Does it choose Urls with the highest score
Yes, it does. Have a look at generatorSortValue(...) in one

Importance of Score

2014-05-22 Thread Vangelis karv
(Apache Nutch 2.2.1) Hi again! GeneratorJob marks the best topN sites for fetching. Does it choose the URLs with the highest score, or random URLs? If it chooses randomly, then what's the point of the score field? Thank you!

Re: Importance of Score

2014-05-22 Thread Sebastian Nagel
Hi Vangelis,
> Does it choose Urls with the highest score
Yes, it does. Have a look at generatorSortValue(...) in one of the scoring filter plugins. In the case of scoring-opic (activated by default), URLs/docs are simply ranked by the score taken from CrawlDb. But other scoring filters may use different
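As a rough illustration of "ranked by score", here is a minimal Python sketch. It is not the Nutch implementation: `generator_sort_value` below is a hypothetical stand-in that returns the stored score unchanged (the behavior the message attributes to scoring-opic), the real plugin method has a different signature, and the dictionary is only a toy replacement for CrawlDb.

```python
import heapq


def generator_sort_value(score):
    """Hypothetical stand-in: use the stored score as the sort value."""
    return score


def select_top_n(crawl_db, top_n):
    """Return the top_n URLs with the highest sort value, best first.

    crawl_db is a toy mapping of URL -> score (an assumption for this sketch).
    """
    return heapq.nlargest(
        top_n,
        crawl_db,
        key=lambda url: generator_sort_value(crawl_db[url]),
    )


if __name__ == "__main__":
    db = {"http://a.example/": 0.1, "http://b.example/": 0.9, "http://c.example/": 0.5}
    print(select_top_n(db, 2))
```

A different scoring filter would correspond to swapping in another `generator_sort_value`, which changes the ranking without touching the selection step.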