Re: Making Mumak work with capacity scheduler

2011-09-21 Thread ArunKumar
Hi Uma! Mumak is not part of the stable versions yet; it comes in from Hadoop 0.21 onwards. Can you describe in detail what you mean by "You may need to merge them logically (back-port them)"? I don't get it. Arun On Wed, Sep 21, 2011 at 12:07 PM, Uma Maheswara Rao G [via Lucene] < ml-node+s472066n3354668...@n3.nabb

Re: Making Mumak work with capacity scheduler

2011-09-21 Thread Uma Maheswara Rao G 72686
Hello Arun, If you want to apply MAPREDUCE-1253 on the 0.21 version, applying the patch directly using commands may not work because of codebase changes. So take the patch and apply the lines to your code base manually. I am not sure of any other way to do this. Did I understand wrongly your intenti

Any other way to copy to HDFS ?

2011-09-21 Thread praveenesh kumar
Guys, As far as I know Hadoop, I think that to copy files to HDFS, they first need to be copied to the NameNode's local filesystem. Is that right? So does it mean that even if I have a Hadoop cluster of 10 nodes with an overall capacity of 6 TB, but my NameNode's hard disk capacity is 500 GB, I can

Re: Any other way to copy to HDFS ?

2011-09-21 Thread Uma Maheswara Rao G 72686
Hi, You need not copy the files to the NameNode. Hadoop provides client code as well to copy the files. To copy files from another node (non-DFS), you need to put the hadoop-*.jar files on the classpath and use the code snippet below. FileSystem fs = new DistributedFileSystem(); fs.initialize("NAMENO
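A minimal sketch of the full client-side copy Uma describes, assuming a placeholder NameNode URI (hdfs://namenode:9000/) and hypothetical paths; the snippet above is truncated, so this is an illustration rather than the exact code from the thread:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class HdfsCopy {
        public static void main(String[] args) throws Exception {
            // Connect to the NameNode; the URI and paths are placeholders.
            FileSystem fs = new DistributedFileSystem();
            fs.initialize(new URI("hdfs://namenode:9000/"), new Configuration());
            // Copy a file from this (non-DFS) node's local disk into HDFS.
            fs.copyFromLocalFile(new Path("/local/data/input.txt"),
                                 new Path("/user/hadoop/input.txt"));
            fs.close();
        }
    }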

Re: Any other way to copy to HDFS ?

2011-09-21 Thread Uma Maheswara Rao G 72686
For more understanding of the flows, I would recommend you go through the docs below once: http://hadoop.apache.org/common/docs/r0.16.4/hdfs_design.html#The+File+System+Namespace Regards, Uma - Original Message - From: Uma Maheswara Rao G 72686 Date: Wednesday, September 21, 2011 2:36 pm Sub

Re: Any other way to copy to HDFS ?

2011-09-21 Thread praveenesh kumar
So I want to copy a file from a Windows machine to the Linux namenode. How can I define NAMENODE_URI in the code you mention, if I want to copy data from the Windows machine to the namenode machine? Thanks, Praveenesh On Wed, Sep 21, 2011 at 2:37 PM, Uma Maheswara Rao G 72686 < mahesw...@huawei.com> wrote:

Re: Any other way to copy to HDFS ?

2011-09-21 Thread Uma Maheswara Rao G 72686
When you start the NameNode on the Linux machine, it will listen on one address. You can configure that address in the NameNode by using fs.default.name. From the clients, you can give this address to connect to your NameNode. The initialize API takes a URI and a Configuration. Assume if your NameNode is
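Concretely, a minimal sketch of such a client, assuming a placeholder address hdfs://namenode-host:9000 that matches the NameNode's fs.default.name; the same code runs from a Windows machine as long as the Hadoop jars are on the classpath and the host is reachable:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RemoteClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Must match the fs.default.name configured on the NameNode.
            conf.set("fs.default.name", "hdfs://namenode-host:9000");
            FileSystem fs = FileSystem.get(conf); // resolves to HDFS
            // Hypothetical Windows-side source and HDFS destination paths.
            fs.copyFromLocalFile(new Path("C:/data/file.txt"),
                                 new Path("/user/praveenesh/file.txt"));
            fs.close();
        }
    }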

Fwd: Any other way to copy to HDFS ?

2011-09-21 Thread praveenesh kumar
Thanks a lot. I am trying to run the following code on my Windows machine that is not part of the cluster. public static void main(String args[]) throws IOException, URISyntaxException { FileSystem fs = new DistributedFileSystem(); fs.initialize(new URI("hdfs://162.192.100.53:54310/"),

Re: risks of using Hadoop

2011-09-21 Thread Steve Loughran
On 20/09/11 22:52, Michael Segel wrote: PS... There's this junction box in your machine room that has this very large on/off switch. If pulled down, it will cut power to your cluster and you will lose everything. Now would you consider this a risk? Sure. But is it something you should really

Re: Fwd: Any other way to copy to HDFS ?

2011-09-21 Thread Uma Maheswara Rao G 72686
Hello Praveenesh, If you really do not care about permissions then you can disable them on the NN side by using the property dfs.permissions. You can set the permission for the path before creating it as well. From the docs: Changes to the File System API: All methods that use a path parameter will throw
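A hedged sketch of the second option (explicitly setting a permissive mode on a path before writing); the directory and mode are hypothetical, and fs is assumed to be an already-initialized FileSystem handle as in the earlier snippets:

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class Permissions {
        // Open up a target directory before writing into it.
        public static void openUp(FileSystem fs, String dir) throws IOException {
            Path path = new Path(dir);
            fs.mkdirs(path);                                       // create if absent
            fs.setPermission(path, new FsPermission((short) 0777)); // rwx for all
        }
    }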

Re: risks of using Hadoop

2011-09-21 Thread Dieter Plaetinck
On Wed, 21 Sep 2011 11:21:01 +0100 Steve Loughran wrote: > On 20/09/11 22:52, Michael Segel wrote: > > > PS... There's this junction box in your machine room that has this > > very large on/off switch. If pulled down, it will cut power to your > > cluster and you will lose everything. Now would

Re: Fwd: Any other way to copy to HDFS ?

2011-09-21 Thread praveenesh kumar
Thanks a lot..!! I guess I can play around with the permissions of dfs for a while. On Wed, Sep 21, 2011 at 3:59 PM, Uma Maheswara Rao G 72686 < mahesw...@huawei.com> wrote: > Hello Praveenesh, > > If you really need not care about permissions then you can disable it at NN > side by using the pro

Re: risks of using Hadoop

2011-09-21 Thread Steve Loughran
On 21/09/11 11:30, Dieter Plaetinck wrote: On Wed, 21 Sep 2011 11:21:01 +0100 Steve Loughran wrote: On 20/09/11 22:52, Michael Segel wrote: PS... There's this junction box in your machine room that has this very large on/off switch. If pulled down, it will cut power to your cluster and you w

Re: Fwd: Any other way to copy to HDFS ?

2011-09-21 Thread Harsh J
Praveenesh, It should be understood, as a takeaway from this, that HDFS is a set of servers, like webservers are. You can send it a request, and you can expect a response. It is also an FS in the sense that it is designed to do FS-like operations (hold inodes, read/write data), but primarily it beh

Can we run job on some datanodes ?

2011-09-21 Thread praveenesh kumar
Is there any way that we can run a particular job in Hadoop on a subset of datanodes? My problem is that I don't want to use all the nodes to run some jobs; I am trying to make a job-completion-time vs. number-of-nodes graph for a particular job. One way to do it is to remove datanodes and then see how much time

Re: Can we run job on some datanodes ?

2011-09-21 Thread Harsh J
Praveenesh, TaskTrackers run your jobs' tasks for you, not DataNodes directly. So you can statically control load on nodes by removing TaskTrackers from your cluster. I.e., if you "service hadoop-0.20-tasktracker stop" or "hadoop-daemon.sh stop tasktracker" on the specific nodes, jobs won't

RE: risks of using Hadoop

2011-09-21 Thread Tom Deutsch
I am truly sorry if at some point in your life someone dropped an IBM logo on your head and it left a dent - but you are being a jerk. Right after you were engaging in your usual condescension a person from Xerox posted on the very issue you were blowing off. Things happen. To any system. I'm

Re: Can we run job on some datanodes ?

2011-09-21 Thread praveenesh kumar
Oh wow.. I didn't know that.. Actually for me the datanodes/tasktrackers are running on the same machines. I mention datanodes because if I delete those machines from the masters list, chances are the data will also be lost. So I don't want to do that.. but now I guess by stopping tasktrackers individually... I c

Re: Can we run job on some datanodes ?

2011-09-21 Thread Harsh J
Praveenesh, Absolutely right. Just stop them individually :) On Wed, Sep 21, 2011 at 6:53 PM, praveenesh kumar wrote: > Oh wow.. I didn't know that.. > Actually for me datanodes/tasktrackers are running on same machines. > I mention datanodes because if I delete those machines from masters list,

Problem with MR job

2011-09-21 Thread George Kousiouris
Hi all, We are trying to run a Mahout job on a Hadoop cluster, but we keep getting the same status. The job passes the initial Mahout stages, and when it comes to be executed as an MR job, it seems to be stuck at 0% progress. Through the UI we see that it is submitted but not running. After a

Re: Problem with MR job

2011-09-21 Thread Harsh J
Hello George, Have you looked at your DFS health page (http://NN:50070/)? I believe you have missing or fallen DataNode instances. I'd start them back up, after checking their (DataNode's) logs to figure out why they died. On Wed, Sep 21, 2011 at 7:28 PM, George Kousiouris wrote: > > Hi all, >

Re: Problem with MR job

2011-09-21 Thread Uma Maheswara Rao G 72686
Hi, did any cluster restart happen? Is your NameNode detecting the DataNodes as live? It looks like the DNs did not report any blocks to the NN yet. You have 13 blocks persisted in the NameNode namespace; at least 12 blocks should be reported by your DNs, otherwise it will not come out of safemode automatically. Re

Re: Problem with MR job

2011-09-21 Thread George Kousiouris
Hi, The status seems healthy and the datanodes live: Status: HEALTHY Total size: 118805326 B Total dirs: 31 Total files: 38 Total blocks (validated): 38 (avg. block size 3126455 B) Minimally replicated blocks: 38 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicat

Re: Problem with MR job

2011-09-21 Thread George Kousiouris
Hi, Some more logs, specifically from the JobTracker: 2011-09-21 10:22:43,482 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201109211018_0001 2011-09-21 10:22:43,538 ERROR org.apache.hadoop.mapred.JobHistory: Failed creating job history log file for job job_201109211018_0001 j

Re: Can we run job on some datanodes ?

2011-09-21 Thread Robert Evans
Praveen, If you are doing performance measurements, be aware that having more datanodes than tasktrackers will impact the performance as well (I don't really know for sure how). It will not be the same performance as running on a cluster with just fewer nodes overall. Also if you do shut off da

Re: Using HBase for real time transaction

2011-09-21 Thread Jignesh Patel
On Sep 20, 2011, at 10:06 PM, Jean-Daniel Cryans wrote: >> I think there has to be some clarification. >> >> The OP was asking about a mySQL replacement. >> HBase will never be a RDBMS replacement. No Transactions means no way of >> doing OLTP. >> Its the wrong tool for that type of work. > >

Re: Problem with MR job

2011-09-21 Thread Uma Maheswara Rao G 72686
Can you check your DN data directories once to see whether the blocks are present or not? Can you give the DN and NN logs? Please put them on some site and share the link here. Regards, Uma - Original Message - From: George Kousiouris Date: Wednesday, September 21, 2011 8:06 pm Subject: Re: Probl

Re: risks of using Hadoop

2011-09-21 Thread Kobina Kwarko
Jignesh, Will your point 2 still be valid if we hire very experienced Java programmers? Kobina. On 20 September 2011 21:07, Jignesh Patel wrote: > > @Kobina > 1. Lack of skill set > 2. Longer learning curve > 3. Single point of failure > > > @Uma > I am curious to know about .20.2 is that stab

Re: risks of using Hadoop

2011-09-21 Thread Uma Maheswara Rao G 72686
Jignesh, Please see my comments inline. - Original Message - From: Kobina Kwarko Date: Wednesday, September 21, 2011 9:33 pm Subject: Re: risks of using Hadoop To: common-user@hadoop.apache.org > Jignesh, > > Will your point 2 still be valid if we hire very experienced Java > programmer

Re: risks of using Hadoop

2011-09-21 Thread Ahmed Nagy
Another way to decrease the risks is just to use Amazon Web Services. That might be a bit expensive On Sun, Sep 18, 2011 at 12:11 AM, Brian Bockelman wrote: > > > On Sep 16, 2011, at 11:08 PM, Uma Maheswara Rao G 72686 wrote: > > > Hi Kobina, > > > > Some experiences which may helpful for you wi

RE: risks of using Hadoop

2011-09-21 Thread Michael Segel
Tom, Normally someone who has a personal beef with someone will take it offline and deal with it. Clearly manners aren't your strong point... unfortunately making me respond to you in public. Since you asked, no, I don't have any beefs with IBM. In fact, I happen to have quite a few friends w

RE: risks of using Hadoop

2011-09-21 Thread Michael Segel
Kobina, The points 1 and 2 are definitely real risks. SPOF is not. As I pointed out in my mini-rant to Tom, your end users / developers who use the cluster can do more harm to your cluster than a SPOF machine failure. I don't know what one would consider a 'long learning curve'. With t

How to get hadoop job information effectively?

2011-09-21 Thread Benyi Wang
I'm working on a project to collect MapReduce job information at the application level. For example, a DW ETL process may involve several MapReduce jobs; we want to have a dashboard to show the progress of those jobs for the specific ETL process. JobStatus does not provide all information like JobTra

RE: risks of using Hadoop

2011-09-21 Thread GOEKE, MATTHEW (AG/1000)
I would completely agree with Mike's comments with one addition: Hadoop centers around how to manipulate the flow of data in a way to make the framework work for your specific problem. There are recipes for common problems but depending on your domain that might solve only 30-40% of your use cas

Re: Using HBase for real time transaction

2011-09-21 Thread Jean-Daniel Cryans
On Wed, Sep 21, 2011 at 8:36 AM, Jignesh Patel wrote: >  I am not looking for relational database. But looking creating multi tenant > database, now at this time I am not sure whether it needs transactions or not > and even that kind of architecture can support transactions. Currently in HBase

Re: risks of using Hadoop

2011-09-21 Thread Shi Yu
I saw this discussion start a few days ago but didn't pay attention to it. This morning I came across some of these messages and, ROFL, too much drama. In my experience, there are some risks to using Hadoop: 1) it is not real-time or mission-critical; you may consider

Re: How to get hadoop job information effectively?

2011-09-21 Thread Robert Evans
Not that I know of. We scrape web pages, which is a horrible thing to do. There is a JIRA to add some web service APIs to expose this type of information, but it is not going to be available for a while. --Bobby Evans On 9/21/11 1:01 PM, "Benyi Wang" wrote: I'm working a project to collec
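For what the old JobClient API does expose, a minimal polling sketch, assuming a placeholder JobTracker address; it covers basic map/reduce progress, not everything the JobTracker pages show:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobStatus;
    import org.apache.hadoop.mapred.RunningJob;

    public class JobPoller {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf();
            conf.set("mapred.job.tracker", "jobtracker-host:9001"); // placeholder
            JobClient client = new JobClient(conf);
            // Enumerate all jobs the JobTracker knows and print their progress.
            for (JobStatus status : client.getAllJobs()) {
                RunningJob job = client.getJob(status.getJobID());
                if (job != null) {
                    System.out.printf("%s map %.0f%% reduce %.0f%%%n",
                            job.getID(), job.mapProgress() * 100,
                            job.reduceProgress() * 100);
                }
            }
        }
    }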

Re: risks of using Hadoop

2011-09-21 Thread Raj V
I have been following this thread. Over the last two years that I have been using Hadoop with a fairly large cluster, my biggest problem has been analyzing failures. In the beginning it was fairly simple - unformatted name node, task trackers not starting, heap allocation mistakes, version id m

Re: Java programmatic authentication of Hadoop Kerberos

2011-09-21 Thread Sivva
Hi Lakshmi, Were you able to resolve the issue below? I'm facing the same issue too, but couldn't resolve it. Please do reply if you have the solution. Thanks in advance. Regards, Sivva. Sari1983 wrote: > > Hi, > > Kerberos has been configured for our Hadoop file system. I wish to do the >

Reducer hanging ( swapping? )

2011-09-21 Thread john smith
Hi folks, I am running Hive on a 10-node cluster. Since my Hive queries have joins in them, their reduce phases are a bit heavy. I have 2 GB RAM on each TT. The problem is that my reducer hangs at 76% for a long time. I guess this is due to excessive swapping between disk and memory. My v

Re: Reducer hanging ( swapping? )

2011-09-21 Thread Raj V
2 GB for a task tracker? Here are some possible thoughts: compress map output; change mapred.reduce.slowstart.completed.maps. By the way, I see no swapping. Anything interesting in the task tracker log? System log? Raj > > From: john smith > To: common-user
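A hedged sketch of both suggestions as per-job settings (the 0.8f value is illustrative, not from the thread):

    import org.apache.hadoop.mapred.JobConf;

    public class ReduceTuning {
        public static JobConf tune(JobConf conf) {
            conf.setCompressMapOutput(true); // shrink shuffle traffic and spills
            // Hold reducer launch until 80% of maps finish, freeing RAM for maps.
            conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.8f);
            return conf;
        }
    }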

Hadoop's use cases

2011-09-21 Thread Keren Ouaknine
Hello, I would like to collect Hadoop's compelling use cases. I am doing monitoring, measurements & benchmarking on Hadoop and would like to focus on its strong side. I have been working on less strong sides (small files, and the results compared to other systems with similar goals were not appeal

RE: risks of using Hadoop

2011-09-21 Thread Bill Habermaas
Amen to that. I haven't heard a good rant in a long time; I am definitely amused and entertained. As a veteran of 3 years with Hadoop I will say that the SPOF issue is whatever you want to make of it. But it has not, nor will it ever, deter me from using this great system. Every system has its ris

Can we replace namenode machine with some other machine ?

2011-09-21 Thread praveenesh kumar
Hi all, Can we replace our namenode machine later with some other machine? Actually I got a new server machine for my cluster and now I want to make this machine my new namenode and jobtracker node. Also, does the Namenode/JobTracker machine's configuration need to be better than the datanodes'/taskt

Re: Can we replace namenode machine with some other machine ?

2011-09-21 Thread Uma Maheswara Rao G 72686
You copy the same installation to the new machine and change the IP address. After that, configure the new NN address in your clients and DNs. > Also, does the Namenode/JobTracker machine's configuration need to be better > than the datanodes/tasktrackers'? I did not get this question. Regards, Uma - Ori

Re: RE: risks of using Hadoop

2011-09-21 Thread Uma Maheswara Rao G 72686
Absolutely agree with you. Mainly we should take the SPOF into account and minimize the problem through carefulness (there are many ways to minimize this issue, as we have seen in this thread). Regards, Uma - Original Message - From: Bill Habermaas Date: Thursday, September 22, 2011 10:04 am Subject: R

Re: Can we replace namenode machine with some other machine ?

2011-09-21 Thread praveenesh kumar
If I just change the configuration settings on the slave machines, will it affect any of the data currently residing in the cluster? And my second question was: do we need the master node (the NN/JT hosting machine) to have a better configuration than our slave machines (the DN/TT hosting machines)? Actua

Re: Can we replace namenode machine with some other machine ?

2011-09-21 Thread Uma Maheswara Rao G 72686
Just changing the configs will not affect your data. You need to restart your DNs to connect to the new NN. For the second question: it again depends on your usage. If you will have more files in DFS, then the NN will consume more memory, as it needs to store all the metadata info of the files in Na

Re: Can we replace namenode machine with some other machine ?

2011-09-21 Thread praveenesh kumar
But apart from storing metadata info, is there anything more the NN/JT machines are doing? So can I say I can survive with a poor NN if I am not dealing with lots of files in HDFS? On Thu, Sep 22, 2011 at 11:08 AM, Uma Maheswara Rao G 72686 < mahesw...@huawei.com> wrote: > By just changing the con

Re: Java programmatic authentication of Hadoop Kerberos

2011-09-21 Thread Vinod Kumar Vavilapalli
You may be missing the Kerberos principal for the namenode in the configuration used to connect to the NameNode. Check your configuration for dfs.namenode.kerberos.principal and set it to the same value as on the NN. HTH +Vinod On Thu, Sep 22, 2011 at 4:06 AM, Sivva wrote: > > Hi Lakshmi, > Were you a
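For the programmatic side of the question, a minimal sketch of a keytab-based Kerberos login, assuming placeholder principals and keytab path; UserGroupInformation.loginUserFromKeytab is the usual entry point on secured clusters:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            // Must match the principal configured on the NameNode, as noted above.
            conf.set("dfs.namenode.kerberos.principal", "hdfs/_HOST@EXAMPLE.COM");
            UserGroupInformation.setConfiguration(conf);
            // Authenticate from a keytab instead of an interactive kinit.
            UserGroupInformation.loginUserFromKeytab(
                    "client@EXAMPLE.COM", "/etc/security/client.keytab");
            FileSystem fs = FileSystem.get(conf);
            System.out.println("Logged in as " + UserGroupInformation.getLoginUser());
            fs.close();
        }
    }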