@Julien & all: Thanks for the correction. @Ken, you know what, I just got the book last week and I'm in the process of reading it. While reading it, I thought: oops, my answer is wrong.
You guys corrected it, fine. I came to this conclusion because I have only ever used a pseudo-distributed or a two-server cluster, and in those cases there is only one reducer. The book recommends having fewer reducers than nodes for optimisation reasons. @Marek, although I had tried in the past, I never succeeded in getting more reducers.

2011/6/10 Ken Krugler <[email protected]>

> On Jun 10, 2011, at 7:50am, Marek Bachmann wrote:
>
> > Thanks to you all,
> >
> > so to get it to one point: is it possible to speed up the map/reduce
> > task (whatever it exactly does) on a single quad-core machine, and if so,
> > does anyone know a resource where I can get a little documentation? :-)
>
> Get "Hadoop: The Definitive Guide" by Tom White.
>
> And then set up your machine to run in pseudo-distributed mode.
>
> -- Ken
>
> > Thank you once again.
> >
> > Greetings,
> >
> > Marek
> >
> > On 10.06.2011 16:26, Julien Nioche wrote:
> >> Raymond,
> >>
> >>> Hadoop is using a map/reduce algorithm; the reduce phase is the phase
> >>> which collects the results from // execution.
> >>> It is inherently not possible to parallelize that phase.
> >>
> >> Sorry to contradict you Raymond, but this is incorrect. You can specify
> >> the number of reducers to use, e.g.
> >>
> >> -D mapred.reduce.tasks=$numTasks
> >>
> >> but obviously this will work only in (pseudo-)distributed mode, i.e. with
> >> the various Hadoop services running independently of Nutch
> >>
> >>> -Raymond-
> >>>
> >>> 2011/6/10 Marek Bachmann <[email protected]>
> >>>
> >>>> Hello again,
> >>>>
> >>>> I noticed that the reduce phase only uses one CPU core. This process
> >>>> takes a very long time with 100% usage, but only on one core. Is there a
> >>>> possibility to parallelise this process across multiple cores on one
> >>>> local machine? Could using Hadoop help in some way? I have no
> >>>> experience with Hadoop at all.
> >>>> :-/
> >>>>
> >>>> 11/06/10 14:38:21 INFO mapred.JobClient: map 100% reduce 94%
> >>>> 11/06/10 14:38:23 INFO mapred.LocalJobRunner: reduce > reduce
> >>>> 11/06/10 14:38:26 INFO mapred.LocalJobRunner: reduce > reduce
> >>>> 11/06/10 14:38:29 INFO mapred.LocalJobRunner: reduce > reduce
> >>>> 11/06/10 14:38:32 INFO mapred.LocalJobRunner: reduce > reduce
> >>>> 11/06/10 14:38:35 INFO mapred.LocalJobRunner: reduce > reduce
> >>>> 11/06/10 14:38:38 INFO mapred.LocalJobRunner: reduce > reduce
> >>>> 11/06/10 14:38:41 INFO mapred.LocalJobRunner: reduce > reduce
> >>>> 11/06/10 14:38:44 INFO mapred.LocalJobRunner: reduce > reduce
> >>>> 11/06/10 14:38:47 INFO mapred.LocalJobRunner: reduce > reduce
> >>>> 11/06/10 14:38:50 INFO mapred.LocalJobRunner: reduce > reduce
> >>>> 11/06/10 14:38:53 INFO mapred.LocalJobRunner: reduce > reduce
> >>>> 11/06/10 14:38:56 INFO mapred.LocalJobRunner: reduce > reduce
> >>>> 11/06/10 14:38:57 INFO mapred.JobClient: map 100% reduce 95%
> >>>>
> >>>> Here is a copy of top's output while running a reduce:
> >>>>
> >>>> top - 14:30:53 up 12 days, 33 min, 3 users, load average: 0.81, 0.38, 0.35
> >>>> Tasks: 123 total, 1 running, 122 sleeping, 0 stopped, 0 zombie
> >>>> Cpu(s): 25.1%us, 0.2%sy, 0.0%ni, 74.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> >>>> Mem: 8003904k total, 5762520k used, 2241384k free, 120180k buffers
> >>>> Swap: 418808k total, 4k used, 418804k free, 3713236k cached
> >>>>
> >>>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
> >>>> 25835 root  20   0 4371m 1.6g  10m S  101 21.3  5:18.69 java
> >>>>
> >>>> Thank you
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> custom data mining solutions

--
-MilleBii-
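For anyone landing on this thread later: Marek's log shows `LocalJobRunner`, which is Hadoop's single-JVM local mode and only ever runs one reducer, hence the single busy core. Ken's pseudo-distributed suggestion plus Julien's `-D mapred.reduce.tasks` flag come down to a small config change. The fragment below is only a sketch using the 0.20-era property names that Nutch used at the time; the port number and reducer count are illustrative assumptions, not values from this thread.

```xml
<!-- conf/mapred-site.xml: minimal sketch, assuming Hadoop 0.20.x property names -->
<configuration>
  <property>
    <!-- Point at a real JobTracker so jobs stop falling back to LocalJobRunner -->
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <!-- Default number of reduce tasks; e.g. one per core on a quad-core box.
         Can also be set per job with -D mapred.reduce.tasks=N -->
    <name>mapred.reduce.tasks</name>
    <value>4</value>
  </property>
</configuration>
```

This only helps once HDFS and the MapReduce daemons are running; in plain local mode the reducer count setting is ignored.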

