hadoop question using VMware

2011-09-28 Thread praveenesh kumar
Hi, Suppose I have 10 Windows machines, each running an individual VM instance. Can these VM instances communicate with each other so that I can build a Hadoop cluster out of them? Has anyone tried this? I know we can set up...

Re: hadoop question using VMware

2011-09-28 Thread N Keywal
Hi, Yes, it will work. HBase won't see the difference; it's purely VMware-level stuff. Obviously, it's not something you can do for production or performance analysis. Cheers, N. On Wed, Sep 28, 2011 at 8:38 AM, praveenesh kumar praveen...@gmail.com wrote: Hi, Suppose I have 10 Windows...
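
For anyone trying this, a minimal sketch of the setup, with hypothetical hostnames: run a Linux guest on each Windows host, give the guests network-resolvable names, and list them in the master's conf/slaves file exactly as you would for physical nodes.

    # On the master VM; assumes Hadoop is unpacked at $HADOOP_HOME on every
    # guest and the VMs use bridged networking with resolvable hostnames.
    $ cat $HADOOP_HOME/conf/slaves
    hadoop-vm01
    hadoop-vm02
    ...
    hadoop-vm10
    $ bin/start-dfs.sh      # starts the NameNode here, DataNodes on the slaves
    $ bin/start-mapred.sh   # starts the JobTracker here, TaskTrackers on the slaves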

Re: hadoop question using VMware

2011-09-28 Thread praveenesh kumar
it's not something you can do for production or performance analysis. Can you please tell me what that means? Why can't we use this approach for production? Thanks On Tue, Sep 27, 2011 at 11:56 PM, N Keywal nkey...@gmail.com wrote: Hi, Yes, it will work. HBase won't see the...

Re: hadoop question using VMware

2011-09-28 Thread N Keywal
For example: - It's adding two layers (Windows + Linux) that can both fail, especially under heavy workload (and Hadoop is built to use all the resources available). They will also need to be managed (software upgrades, hardware support...), which is an extra cost. - These two layers will use...

Re: difference between development and production platforms?

2011-09-28 Thread Steve Loughran
On 28/09/11 04:19, Hamedani, Masoud wrote: Special thanks for your help, Arko. You mean that in Hadoop, the NameNode, DataNodes, JobTracker, TaskTrackers, and all the cluster nodes should be deployed on Linux machines? We have lots of data (on Windows) and code (written in C#) for data mining, and we want to use...

Re: hadoop question using VMware

2011-09-28 Thread Steve Loughran
On 28/09/11 08:37, N Keywal wrote: For example: - It's adding two layers (Windows + Linux) that can both fail, especially under heavy workload (and Hadoop is built to use all the resources available). They will also need to be managed (software upgrades, hardware support...), which is an extra...

getting the process id of mapreduce tasks

2011-09-28 Thread bikash sharma
Hi, Is it possible to get the process ID of each task in a MapReduce job? When I run a MapReduce job and monitor it in Linux using ps, I just see the ID of the MapReduce job process, not its constituent map/reduce tasks. The use case is to monitor the resource usage of each task by using...

dump configuration

2011-09-28 Thread patrick sang
Hi hadoopers, I was looking for a way to dump the Hadoop configuration in order to check whether what I just changed in mapred-site.xml has really kicked in. I found that HADOOP-6184 (https://issues.apache.org/jira/browse/HADOOP-6184) is exactly what I want, but the thing is I am running CDH3u0, which is...

RE: dump configuration

2011-09-28 Thread GOEKE, MATTHEW (AG/1000)
You could always check the web UI's job history for that particular run, open the job.xml, and see what the value of that parameter was at runtime. Matt -Original Message- From: patrick sang [mailto:silvianhad...@gmail.com] Sent: Wednesday, September 28, 2011 4:00 PM To: ...

Re: dump configuration

2011-09-28 Thread Raj V
The XML configuration file is also available under the Hadoop logs on the JobTracker. Raj From: GOEKE, MATTHEW (AG/1000) matthew.go...@monsanto.com To: common-user@hadoop.apache.org Sent: Wednesday, September 28, 2011 2:27 PM Subject: ...
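
Combining the two replies above into a hedged sketch: each run's full configuration is written to a job_*_conf.xml file in the JobTracker's job history directory, which you can grep for the parameter you changed. Log locations vary by distribution, so the paths below are assumptions, not confirmed CDH3u0 layout.

    # On the JobTracker host; $HADOOP_LOG_DIR and the history subdirectory
    # are assumptions -- check your distribution's layout.
    $ ssh jobtracker01
    $ ls $HADOOP_LOG_DIR/history/job_*_conf.xml
    $ grep mapred.reduce.tasks $HADOOP_LOG_DIR/history/job_*_conf.xml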

Hadoop performance benchmarking with TestDFSIO

2011-09-28 Thread Sameer Farooqui
Hi everyone, I'm looking for some recommendations on how to get our Hadoop cluster to do faster I/O. Currently, our lab cluster is 8 worker nodes and 1 master node (with the NameNode and JobTracker). Each worker node has: - 48 GB RAM - 16 processors (Intel Xeon E5630 @ 2.53 GHz) - 1 Gb Ethernet...
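
As a starting point for I/O numbers, TestDFSIO ships in the Hadoop test jar; a typical invocation looks like the sketch below (the exact jar name varies by version and distribution, so treat it as an assumption).

    # Write 16 files of 1000 MB each, then read them back; throughput and
    # average I/O rate are appended to TestDFSIO_results.log locally.
    $ hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -write -nrFiles 16 -fileSize 1000
    $ hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -read -nrFiles 16 -fileSize 1000
    $ hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -clean   # remove the test data

With 8 workers, using more files than nodes (e.g. 16) helps exercise the disks on every node at once, since each file is written by its own map task.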

Running multiple MR jobs in sequence

2011-09-28 Thread Aaron Baff
Is it possible to submit a series of MR jobs to the JobTracker to run in sequence (one finishes, and if it was successful its output is fed into the next, etc.), or does it need to run client-side using JobControl, something like Oozie, or rolling our own? What I'm looking for is...

failed to compile 0.20.203

2011-09-28 Thread Nan Zhu
Hi, all. I ran into a problem when compiling hadoop-0.20.203. I modified some code in the JobTracker and then compiled it, and Eclipse tells me that /Users/zhunan/codes/hadoop-0.20.203.0/build/src/org/apache/hadoop/mapred/jobfailures_jsp.java:13 org.apache.hadoop.mapred.JSPUtil doesn't exist, and I just...
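
Since jobfailures_jsp.java is generated from the JSP sources during the ant build, one hedged first step (an assumption, not a confirmed fix) is a clean rebuild, so the generated sources under build/src are regenerated against the modified JobTracker:

    # From the source root named in the error message above.
    $ cd /Users/zhunan/codes/hadoop-0.20.203.0
    $ ant clean     # drop stale generated code in build/
    $ ant compile   # regenerates the *_jsp.java files and recompiles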

Re: Running multiple MR jobs in sequence

2011-09-28 Thread Raj V
Can't this be done with a simple shell script? Raj From: Aaron Baff aaron.b...@telescope.tv To: common-user@hadoop.apache.org Sent: Wednesday, September 28, 2011 4:56 PM Subject: Running multiple MR jobs in sequence Is it...
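
For example, a bare-bones version of such a script (the jar, class, and path names here are hypothetical), relying on the non-zero exit status of a failed hadoop jar run:

    #!/bin/sh
    # Run the first job; feed its output to the second only on success.
    hadoop jar myjobs.jar com.example.FirstJob /data/input /data/step1 || exit 1
    hadoop jar myjobs.jar com.example.SecondJob /data/step1 /data/output || exit 1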

Re: getting the process id of mapreduce tasks

2011-09-28 Thread Varad Meru
The process IDs of the individual tasks can be seen using the jps and jconsole commands provided by Java. The jconsole command provides a GUI for monitoring running Java processes. The tasks are only visible as Java virtual machine instances to the OS...

Re: Running multiple MR jobs in sequence

2011-09-28 Thread Arko Provo Mukherjee
Hi, The way I did it is to have multiple JobConfs and run them one after another in the program, as per the logic. The output path (setOutputPath) of the previous job can be the input path (setInputPath) of the next one if you want to take the output from the previous job and feed it as input to the next. Thanks and regards...

Re: difference between development and production platforms?

2011-09-28 Thread Hamedani, Masoud
Dear Steve, thanks for your useful comments. I completely agree with your idea; personally, for more than 10 years I have been using only Fedora, Java, Java-related technologies, and open-source software in all of my projects. But this is a critical situation: all of the current data and apps in our university's lab...

Re: Running multiple MR jobs in sequence

2011-09-28 Thread Harsh J
Within the Hadoop core project, there is JobControl, which you can utilize for this. You can view its API at http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/jobcontrol/package-summary.html and it is fairly simple to use (create jobs with the regular Java API, build a dependency flow...

Re: getting the process id of mapreduce tasks

2011-09-28 Thread Harsh J
Hello Bikash, The tasks run on the TaskTracker, so that is where you'll need to look for the process IDs -- not the JobTracker/client. Crudely speaking: $ ssh tasktracker01 # or whichever. $ jps | grep Child | cut -d ' ' -f 1 # And lo, PIDs to play with. On Thu, Sep 29, 2011 at 12:15 AM, bikash...
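
And to get the per-task resource usage asked about above, those PIDs can be passed straight to ps on the same TaskTracker; a small sketch building on Harsh's pipeline:

    # For each task (Child) JVM on this node, print CPU and memory usage.
    $ for pid in $(jps | grep Child | cut -d ' ' -f 1); do
          ps -o pid,pcpu,pmem,rss,args -p $pid
      done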

Re: Problems with Rumen in Hadoop-0.21.0

2011-09-28 Thread starjoy0530
Hello! I am also a new Hadoop user, and I have met the same problem as you. I don't know how to solve the invalid file name issue in Rumen. My error says: WARN rumen.TraceBuilder: File skipped: Invalid file name: job_201109221644_0001_username. If you have resolved your problem, can you help me? :) My...