Re: Decision Tree Implementation in Hadoop MapReduce

2013-12-01 Thread unmesha sreeveni
To make it more general, it's better to separate them. Since there might be multiple batches of training (or to-be-label), and you only need to train the model once (if your data is stable). OK, I will go for the second one. So if we are going for separate. They will not have any connection with b

Re: Decision Tree Implementation in Hadoop MapReduce

2013-12-01 Thread Yexi Jiang
Actually, training and testing (or prediction) do not need to be done in one shot. If you need to do them consecutively in your particular scenario, you can do it as you said. To make it more general, it's better to separate them. Since there might be multiple batches of training (or

Re: Decision Tree Implementation in Hadoop MapReduce

2013-12-01 Thread unmesha sreeveni
1. I just thought of building a model using a project named, say, DT, and when a huge input comes, doing another MR job, test.java, within DT. If we are not chaining jobs, we need to create separate projects, DT_build and DT_test, right? Or is there no need for a separate project file? 2. M1_train - dataset for training. M1

Hadoop 2.2.0 from source configuration

2013-12-01 Thread Daniel Savard
I am trying to configure Hadoop 2.2.0 from source code and I found the instructions really crappy and incomplete. It is as if they were written so that nobody can do the job themselves and must contract someone else to do it or buy a packaged version. For about three days I have been struggling with this

Re: Decision Tree Implementation in Hadoop MapReduce

2013-12-01 Thread Yexi Jiang
What is your motivation for chaining jobs? 2013/12/1 unmesha sreeveni > Thanks Yexi...A very nice explanation...Thanks a lot.. > Explained in a very simple way which is really understandable for > beginners..Thanks a lot. > I can go for chaining jobs right? > On Sun, Dec 1, 2013

Re: Decision Tree Implementation in Hadoop MapReduce

2013-12-01 Thread unmesha sreeveni
Thanks Yexi. A very nice explanation, given in a very simple way that is really understandable for beginners. Thanks a lot. I can go for chaining jobs, right? On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang wrote: > In my opinion. > > 1. Build the decision tree model with the

Jt ha issue on cdh4

2013-12-01 Thread Siddharth Tiwari
I implemented JobTracker HA on CDH 4.4.2. The JobTrackers keep failing over to each other, jobs keep restarting, the NameNode also goes down at times, and I can see logs for a few DataNodes mentioning "all data nodes are bad. aborting". I installed JT HA manually like this: After configuring JT HA I started j

YARN: LocalResources and file distribution

2013-12-01 Thread Robert Metzger
Hello, I'm currently writing code to run my application using YARN (Hadoop 2.2.0). I used this code as a skeleton: https://github.com/hortonworks/simple-yarn-app Everything works fine on my local machine or on a cluster with shared directories, but when I want to access resources outside of c

Re: Decision Tree Implementation in Hadoop MapReduce

2013-12-01 Thread Yexi Jiang
In my opinion: 1. Build the decision tree model with the training data. 2. Store it somewhere. 3. When the unlabeled data is available: 3.1 If the unlabeled data is huge, write another MR job to process it, load the model at the setup stage, use the model to label the data one by one in map st

Re: Decision Tree Implementation in Hadoop MapReduce

2013-12-01 Thread unmesha sreeveni
Thanks Yexi, but how can it be accomplished? The input to the decision tree MR job will be a set of data. But when predicting, the input will be a single line of data without a class label, right? So what changes will there be in the MR job? Should we design it like this: 1. When a set of data is coming, draw the decision tree

Implementing and running an applicationmaster

2013-12-01 Thread Yue Wang
Hi, I found the page (http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html) and know how to write an ApplicationMaster. However, is there a complete example showing how to run this ApplicationMaster with a real Hadoop program (e.g. WordCount) on YARN? T