Re: [DISCUSS] Replacing MapReduce with Tez

2020-12-21 Thread Lewis John McGibbney
Hi dev@, I've documented my Tez journey so far at https://cwiki.apache.org/confluence/display/NUTCH/Running+Nutch+on+Tez Things are getting quite interesting. Please share any experiences using Nutch on Tez or improvements to the documentation especially any experiments you can document. Thank y

Re: RE: [DISCUSS] Replacing MapReduce with Tez

2020-12-21 Thread Lewis John McGibbney
Hi Markus, Thanks for chiming in :) My responses are below. On 2020/12/21 21:32:08, Markus Jelsma wrote: > Hello Lewis, > > 1. counters, for me they are a requirement to have as they are key to regular > inspections of ongoing crawls, finding errors and debugging. I hope you can > find a work arou

RE: [DISCUSS] Replacing MapReduce with Tez

2020-12-21 Thread Markus Jelsma
Hello Lewis, 1. counters, for me they are a requirement to have as they are key to regular inspections of ongoing crawls, finding errors and debugging. I hope you can find a workaround. 2. sounds interesting, but I'd like to see the test run with 12M rather than 12k URLs. A question, are the

Re: [DISCUSS] Replacing MapReduce with Tez

2020-12-21 Thread Lewis John McGibbney
Hi dev@, Short update here. I've documented my initial observations running Nutch on Tez at https://s.apache.org/viee3 Specific early findings are as follows: 1. Counters don't appear to work... which makes sense as all existing counters are manifested using the MapReduce framework. I'm not sure if