That is a really old paper that basically pre-dates all of the recent important work in neural networks.
You should look for work on Rectified Linear Units (ReLU), drop-out
regularization, parameter servers (Downpour SGD) and deep learning.
Map-reduce as you have used it will not produce interesting results
because the overhead of map-reduce will be far too high. Here are some
references:

http://www.cs.toronto.edu/~ranzato/publications/DistBeliefNIPS2012_withAppendix.pdf
http://arxiv.org/abs/1412.5567
http://arxiv.org/abs/1502.01710
http://www.comp.nus.edu.sg/~dbsystem/singa/
http://0xdata.com/product/deep-learning/

On Thu, Feb 12, 2015 at 2:14 AM, unmesha sreeveni <unmeshab...@gmail.com> wrote:

> I am trying to implement a Neural Network in MapReduce. Apache Mahout is
> referring to this paper:
> <http://www.cs.stanford.edu/people/ang/papers/nips06-mapreducemulticore.pdf>
>
> Neural Network (NN): We focus on backpropagation. By defining a network
> structure (we use a three-layer network with two output neurons
> classifying the data into two categories), each mapper propagates its
> set of data through the network. For each training example, the error is
> back-propagated to calculate the partial gradient for each of the
> weights in the network. The reducer then sums the partial gradients from
> each mapper and does a batch gradient descent to update the weights of
> the network.
>
> Here <http://homepages.gold.ac.uk/nikolaev/311sperc.htm> is a worked-out
> example of the gradient descent algorithm.
>
> Gradient Descent Learning Algorithm for Sigmoidal Perceptrons
> <http://pastebin.com/6gAQv5vb>
>
> 1. Which is the better way to parallelize a neural network algorithm
>    from a MapReduce perspective? In the mapper: each record owns partial
>    weights (from the above example: w0, w1, w2; I suspect w0 is the
>    bias). A random weight is assigned initially; the first record
>    calculates the output (o) and the weights get updated; the second
>    record also finds the output, and deltaW is updated with the previous
>    deltaW value. In the reducer, the sum of the gradients is calculated,
>    i.e. if we have 3 mappers, we get 3 sets of w0, w1, w2. These are
>    summed, and using batch gradient descent we update the weights of the
>    network.
> 2. In the above method, how can we ensure which previous weight is taken
>    when there is more than one map task? Each map task has its own
>    updated weights. How can that be accurate?
> 3. Where can I find the backward propagation in the above-mentioned
>    gradient descent neural network algorithm? Or is it fine with this
>    implementation?
> 4. What is the termination condition mentioned in the algorithm?
>
> Please help me with some pointers.
>
> Thanks in advance.
>
> --
> *Thanks & Regards*
>
> *Unmesha Sreeveni U.B*
> *Hadoop, Bigdata Developer*
> *Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
> http://www.unmeshasreeveni.blogspot.in/
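
To make the scheme you quoted concrete, here is a rough single-machine
sketch in plain Python/NumPy (illustrative only, not Hadoop code; the
network shape, learning rate, toy data and fixed epoch count are my own
assumptions, not from the paper). Each "mapper" back-propagates its shard
of the data and emits partial gradients; the "reducer" sums them and
takes one batch gradient-descent step. The bias plays the role of w0 by
riding along as a constant input column, and the stopping rule here is
simply a fixed number of iterations (an error threshold is the other
common choice).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mapper_gradients(W1, W2, X, Y):
    # Forward pass over this mapper's shard, then back-propagate the
    # squared error to get this shard's partial gradients.
    H = sigmoid(X @ W1)                 # hidden activations
    O = sigmoid(H @ W2)                 # two output neurons
    dO = (O - Y) * O * (1 - O)          # output-layer deltas
    dH = (dO @ W2.T) * H * (1 - H)      # hidden-layer deltas
    return X.T @ dH, H.T @ dO           # partial dW1, dW2 for this shard

def reducer_update(W1, W2, partials, lr=0.5):
    # Sum the partial gradients from every mapper, then take one batch
    # gradient-descent step on the shared weights.
    gW1 = sum(p[0] for p in partials)
    gW2 = sum(p[1] for p in partials)
    return W1 - lr * gW1, W2 - lr * gW2

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 4))   # bias + 2 inputs -> 4 hidden
W2 = rng.normal(scale=0.5, size=(4, 2))   # 4 hidden -> 2 outputs

# Toy data: XOR with a constant bias column (w0's input) and one-hot
# targets, split into two shards to stand in for two mappers.
X = np.array([[1., 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
Y = np.array([[1., 0], [0, 1], [0, 1], [1, 0]])
shards = [(X[:2], Y[:2]), (X[2:], Y[2:])]

for _ in range(5000):                     # fixed iteration count = stop rule
    partials = [mapper_gradients(W1, W2, Xs, Ys) for Xs, Ys in shards]
    W1, W2 = reducer_update(W1, W2, partials)

print(np.round(sigmoid(sigmoid(X @ W1) @ W2), 2))

This also bears on your question 2: in the paper's scheme the mappers
never update weights themselves. Every mapper reads the same weights for
the iteration and emits only gradients; the single reducer produces the
next weight vector, so there is no ambiguity about which previous weight
is used. On Hadoop that means one full job per gradient step, with the
weights written out and re-read every iteration, which is exactly the
overhead I was referring to above.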