Re: Nutch survey

2014-05-22 Thread Jorge Luis Betancourt Gonzalez
Done! On May 21, 2014, at 11:56 PM, Bayu Widyasanyata bwidyasany...@gmail.com wrote: Done! Great Julien! On Wed, May 21, 2014 at 10:58 PM, Markus Jelsma markus.jel...@openindex.io wrote: Great! Done! :-) Julien Nioche lists.digitalpeb...@gmail.com wrote: Hi everyone! I had written

Re: Nutch survey

2014-05-22 Thread Talat Uyarer
Great Julien, Done. 2014-05-22 9:34 GMT+03:00 Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu: Done! On May 21, 2014, at 11:56 PM, Bayu Widyasanyata bwidyasany...@gmail.com wrote: Done! Great Julien! On Wed, May 21, 2014 at 10:58 PM, Markus Jelsma markus.jel...@openindex.io wrote:

Re: Nutch survey

2014-05-22 Thread Julien Nioche
Hi guys, Thanks to all who have participated so far. You should be able to edit your answers if you want to; in particular, I have added a few fields that were missing at the beginning. So if you have participated in the survey, feel free to have another look at it and add any missing details. For

Importance of Score

2014-05-22 Thread Vangelis karv
(Apache Nutch 2.2.1) Hi again! GeneratorJob marks the best topN sites for fetching. Does it choose URLs with the highest score or random URLs? If it chooses randomly, then what's the point of the score field? Thank you!

Re: Importance of Score

2014-05-22 Thread Sebastian Nagel
Hi Vangelis, "Does it choose Urls with the highest score" Yes, it does. Have a look at generatorSortValue(...) in one of the scoring filter plugins. In the case of scoring-opic (activated by default), URLs/docs are simply ranked by the score taken from CrawlDb. But other scoring filters may use different
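To make the "ranked by score, then take topN" idea concrete, here is a minimal, self-contained sketch. It does not use any Nutch classes and is not the GeneratorJob code: the class TopNSketch and the methods sortValue and selectTopN are hypothetical names made up for illustration, and sortValue only stands in for whatever a scoring filter's generatorSortValue(...) would return (here it simply passes the stored score through, which is how the scoring-opic case is described above).

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TopNSketch {

    // Hypothetical stand-in for a scoring filter's sort value.
    // Assumption: the stored score is used unchanged.
    static float sortValue(String url, float storedScore) {
        return storedScore;
    }

    // Returns at most topN URLs, highest sort value first.
    static List<String> selectTopN(Map<String, Float> scores, int topN) {
        Comparator<Map.Entry<String, Float>> bySortValue =
            Comparator.comparingDouble(e -> sortValue(e.getKey(), e.getValue()));
        return scores.entrySet().stream()
            .sorted(bySortValue.reversed())   // best candidates first
            .limit(topN)                      // keep only the topN entries
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up URLs and scores, purely for demonstration.
        Map<String, Float> scores = new LinkedHashMap<>();
        scores.put("http://a.example/", 2.5f);
        scores.put("http://b.example/", 0.1f);
        scores.put("http://c.example/", 1.0f);
        System.out.println(selectTopN(scores, 2));
    }
}
```

A different scoring filter would correspond to a different sortValue body (for example, one that also weighs how long ago a page was fetched), which changes the ordering without changing the selection step.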

Re: Nutch survey

2014-05-22 Thread Mattmann, Chris A (3980)
Will do, I will fill it out. ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-5th floor Email:

Why is fetcher one big class?

2014-05-22 Thread Diaa Abdallah
Currently the fetcher class is a 1,500-line piece of code. I'd like to suggest splitting it up into multiple files to improve the readability and maintainability of the code, instead of keeping this one big class with many nested classes. The classes are grouped anyway by the fetcher namespace so having them

Re: Why is fetcher one big class?

2014-05-22 Thread anupamk
Probably a good place to discuss this is the nutch-dev mailing list, not the nutch-user mailing list. -- View this message in context: http://lucene.472066.n3.nabble.com/Why-is-fetcher-one-big-class-tp4137758p4137773.html Sent from the Nutch - User mailing list archive at Nabble.com.