Re: How to copy over using dfs

2011-05-27 Thread Harsh J
Mohit, On Sat, May 28, 2011 at 12:58 AM, Mohit Anchlia wrote: > If I have to overwrite a file I generally use > > hadoop dfs -rm > hadoop dfs -copyFromLocal or -put > > Is there a command to overwrite/replace the file instead of doing rm first? > There's no command available right now to do th

Re: web site doc link broken

2011-05-27 Thread Harsh J
Alright, I see it's stable/ now. Weird, is my cache playing with me? On Sat, May 28, 2011 at 5:08 AM, Mark question wrote: > I also got the following from "learn about" : > Not Found > > The requested URL /common/docs/stable/ was not found on this server. > -- > Apache/

Re: web site doc link broken

2011-05-27 Thread Mark question
I also got the following from "learn about" : Not Found The requested URL /common/docs/stable/ was not found on this server. -- Apache/2.3.8 (Unix) mod_ssl/2.3.8 OpenSSL/1.0.0c Server at hadoop.apache.org Port 80 Mark On Fri, May 27, 2011 at 8:03 AM, Harsh J wrote:

Re: How to copy over using dfs

2011-05-27 Thread Mark question
I don't think so, because I read somewhere that this is to ensure the safety of the produced data. Hence Hadoop will force you to do this so you know exactly what is happening. Mark On Fri, May 27, 2011 at 12:28 PM, Mohit Anchlia wrote: > If I have to overwrite a file I generally use > > hadoop dfs -

Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Allen Wittenauer
On May 27, 2011, at 1:18 PM, Xu, Richard wrote: > Hi Allen, > > Thanks a lot for your response. > > I agree with you that it is not a matter of the replication settings. > > What really bothers me is that with the same environment and same configuration, hadoop 0.20.203 > takes us 3 mins, so why did 0.20.2 take 3 days?

Has anyone else seen out of memory errors at the start of combiner tasks?

2011-05-27 Thread W.P. McNeill
I have a job that uses an identity mapper and the same code for both the combiner and the reducer. In a small percentage of combiner tasks, after a few seconds I get errors that look like this: FATAL mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: Java heap space org.apache.
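For context, a minimal sketch of the kind of job setup described (identity mapper, one class used for both combiner and reducer), using the old 0.20 API; the classes and the heap value are illustrative, not from the original post. IdentityReducer stands in for the poster's reducer, and mapred.child.java.opts is the usual knob when the combiner's child JVM runs out of heap:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class CombinerJobSketch {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(CombinerJobSketch.class);
    conf.setJobName("combiner-sketch");

    conf.setMapperClass(IdentityMapper.class);      // identity mapper, as in the post
    conf.setCombinerClass(IdentityReducer.class);   // same class used as combiner...
    conf.setReducerClass(IdentityReducer.class);    // ...and as reducer

    conf.setOutputKeyClass(LongWritable.class);     // matches default TextInputFormat keys
    conf.setOutputValueClass(Text.class);           // and values

    // Illustrative: raise the child JVM heap if the combiner hits OutOfMemoryError.
    conf.set("mapred.child.java.opts", "-Xmx1024m");

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}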

RE: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Xu, Richard
To add more to that: I also tried starting 0.20.2 on a Linux machine in distributed mode, with the same error. I had successfully started 0.20.203 on this Linux machine with the same config. It seems that it is not related to Solaris. Could it be caused by a port? I checked a few and did not find any blocked. -Ori

RE: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Xu, Richard
Hi Allen, Thanks a lot for your response. I agree with you that it is not a matter of the replication settings. What really bothers me is that with the same environment and same configuration, hadoop 0.20.203 takes us 3 mins, so why did 0.20.2 take 3 days? Can you please shed more light on how "to make Hadoop's broken user

How to copy over using dfs

2011-05-27 Thread Mohit Anchlia
If I have to overwrite a file I generally use "hadoop dfs -rm" followed by "hadoop dfs -copyFromLocal" or "-put". Is there a command to overwrite/replace the file instead of doing rm first?
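For anyone wanting to skip the separate rm step from their own code, the FileSystem API's copyFromLocalFile call takes an overwrite flag. A minimal sketch, with placeholder paths (not taken from the thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: copy a local file into HDFS, replacing any existing copy, without
// a separate "hadoop dfs -rm". Paths below are placeholders.
public class OverwriteCopy {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();     // picks up core-site.xml etc.
    FileSystem fs = FileSystem.get(conf);         // the configured default filesystem

    Path src = new Path("/tmp/local/data.txt");   // local source (placeholder)
    Path dst = new Path("/user/data/data.txt");   // HDFS destination (placeholder)

    // delSrc=false keeps the local file; overwrite=true replaces the HDFS copy.
    fs.copyFromLocalFile(false, true, src, dst);
    fs.close();
  }
}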

Re: Using own InputSplit

2011-05-27 Thread Harsh J
Mohit, On Fri, May 27, 2011 at 10:44 PM, Mohit Anchlia wrote: > Actually this link confused me > > http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Job+Input > > "Clearly, logical splits based on input-size is insufficient for many > applications since record boundaries must be r

Re: Using own InputSplit

2011-05-27 Thread Mohit Anchlia
Actually this link confused me http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Job+Input "Clearly, logical splits based on input-size is insufficient for many applications since record boundaries must be respected. In such cases, the application should implement a RecordReader,
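As an aside on the quoted passage: one blunt way to guarantee that no record is ever cut by a split, without writing a full RecordReader, is to make the input format non-splittable. A minimal sketch against the new (org.apache.hadoop.mapreduce) API; it trades away per-file parallelism, and the default TextInputFormat already keeps lines intact, so this is rarely needed:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Sketch: a TextInputFormat that refuses to split files, so each file is
// processed by a single mapper and no record can straddle two splits.
public class NonSplittableTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false; // never split: one map task per input file
  }
}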

Re: Using own InputSplit

2011-05-27 Thread Harsh J
The query fit into mapreduce-user, since it primarily dealt with how Map/Reduce operates over data, just to clarify :) On Fri, May 27, 2011 at 10:38 PM, Mohit Anchlia wrote: > thanks! Just thought it's better to post to multiple groups together > since I didn't know where it belongs :) > > On Fri

Re: Using own InputSplit

2011-05-27 Thread Mohit Anchlia
thanks! Just thought it's better to post to multiple groups together since I didn't know where it belongs :) On Fri, May 27, 2011 at 10:04 AM, Harsh J wrote: > Mohit, > > Please do not cross-post a question to multiple lists unless you're > announcing something. > > What you describe, does not ha

Re: Using own InputSplit

2011-05-27 Thread Harsh J
Mohit, Please do not cross-post a question to multiple lists unless you're announcing something. What you describe does not happen; and the way the splitting is done for Text files is explained in good detail here: http://wiki.apache.org/hadoop/HadoopMapReduce Hope this solves your doubt :) On

Using own InputSplit

2011-05-27 Thread Mohit Anchlia
I am new to hadoop, and from what I understand, by default hadoop splits the input into blocks. Now this might result in splitting a line of a record into 2 pieces and getting it spread across 2 maps. For example, the line "abcd" might get split into "ab" and "cd". How can one prevent this in hadoop and pig? I am
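To illustrate the answers above with a sketch (class and path names are placeholders, not from the thread): with the default TextInputFormat, each map() call receives one complete line even when that line crosses an HDFS block boundary, because the record reader finishes the last line of its split by reading into the next block, and later splits skip their leading partial line:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WholeLineJob {

  // Each call gets an entire line; "abcd" is never delivered as "ab" + "cd".
  public static class LineMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(offset, line); // pass the complete line through unchanged
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "whole-line-sketch");
    job.setJarByClass(WholeLineJob.class);
    job.setMapperClass(LineMapper.class);
    job.setNumReduceTasks(0);                        // map-only, just to show the records
    job.setInputFormatClass(TextInputFormat.class);  // the default, shown explicitly
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}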

Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Allen Wittenauer
On May 27, 2011, at 7:26 AM, DAN wrote: > You see you have "2 Solaris servers for now", and dfs.replication is set > to 3. > These don't match. That doesn't matter. HDFS will basically flag any files written with a warning that they are under-replicated. The problem is tha

Re: web site doc link broken

2011-05-27 Thread Harsh J
I'm not sure if someone's already fixed this, but I head to the first link and click Learn About, and it gets redirected to current/ just fine. There's only one such link on the page as well. On Fri, May 27, 2011 at 3:42 AM, Lee Fisher wrote: > The Hadoop Common home page: > http://hadoop.apach

Re: Increase node-mappers capacity in single node

2011-05-27 Thread Harsh J
Hello Mark, This is due to a default configuration (tasktracker slots, as we generally call it) and is covered in the FAQ: http://wiki.apache.org/hadoop/FAQ#I_see_a_maximum_of_2_maps.2BAC8-reduces_spawned_concurrently_on_each_TaskTracker.2C_how_do_I_increase_that.3F On Fri, May 27, 2011 at 11:56
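For reference, a sketch of the mapred-site.xml entries that FAQ answer points at (values are illustrative; each TaskTracker has to be restarted to pick them up):

<!-- mapred-site.xml on each TaskTracker node: per-node task slots.
     Example values only; size them to the node's cores and memory. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>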

Re:RE: Re:Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread DAN
Hi, Richard You see you have "2 Solaris servers for now", and dfs.replication is set to 3. These don't match. Good Luck Dan At 2011-05-27 19:34:10,"Xu, Richard " wrote: >That setting is 3. > >From: DAN [mailto:chaidong...@163.com] >Sent: Thursday, May 26, 2011 10:23 PM >To: common-user@ha
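For reference, dfs.replication is a client-side default normally set in hdfs-site.xml; a sketch matching it to a two-node cluster is below, though as Allen notes elsewhere in the thread, a value larger than the number of datanodes only produces under-replication warnings and is not by itself what blocks startup:

<!-- hdfs-site.xml: default replication factor for newly written files.
     Illustrative; keep it at or below the number of datanodes (2 here). -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>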

Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Harsh J
Hello RX, Could you paste your DFS configuration and the DN end-to-end log into a mail/pastebin-link? On Fri, May 27, 2011 at 5:31 AM, Xu, Richard wrote: > Hi Folks, > > We try to get hbase and hadoop running on clusters, take 2 Solaris servers > for now. > > Because of the incompatibility issu

Error while trying to connect/use s3 with Hadoop in pseudo mode

2011-05-27 Thread Subhramanian, Deepak
I am trying to use Amazon S3 with Hadoop in pseudo-distributed mode. I am getting some errors in the logs for the datanode, namenode, jobtracker, etc. I did hadoop namenode -format before starting the hadoop services. Please help. I am able to use hadoop and list the directories in my s3 bucket. I am using Cloude
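Not from the original post, but for context: the usual way to wire S3 credentials into a setup of that era was via core-site.xml properties for the s3n:// (native S3) filesystem, along the lines of the sketch below (placeholder values, never real keys):

<!-- core-site.xml: credentials for the s3n:// filesystem. Placeholders only. -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_AWS_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
</property>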

Re: Re:Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Simon
First you need to make sure that your dfs daemons are running. You can start your namenode and datanode separately on the master and slave nodes, and see what happens, with the following commands: hadoop namenode hadoop datanode The chances are that your datanode cannot be started correctly. Let

RE: Re:Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Xu, Richard
That setting is 3. From: DAN [mailto:chaidong...@163.com] Sent: Thursday, May 26, 2011 10:23 PM To: common-user@hadoop.apache.org; Xu, Richard [ICG-IT] Subject: Re:Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster Hi, Richard Pay attention to "Not able to place enough repl

Re: java.lang.NoClassDefFoundError: com.sun.security.auth.UnixPrincipal

2011-05-27 Thread Steve Loughran
On 05/26/2011 07:45 PM, subhransu wrote: Hello Geeks, I am a newbie to hadoop and I have currently installed hadoop-0.20.203.0. I am running the sample programs that are part of this package but am getting this error. Any pointers to fix this? ~/Hadoop/hadoop-0.20.203.0 788> bin/hadoop jar hadoop-exa

Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Konstantin Boudnik
On Thu, May 26, 2011 at 07:01PM, Xu, Richard wrote: > 2011-05-26 12:30:29,175 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 4 on 9000, call addBlock(/tmp/hadoop-cfadm/mapred/system/jobtracker.info, > DFSCl > ient_2146408809) from 169.193.181.212:55334: error: java.io.IOException: File

Re: Can not access hadoop cluster from outside

2011-05-27 Thread Harsh J
What is your ${fs.default.name} set to? On Fri, May 27, 2011 at 12:29 PM, Jeff Zhang wrote: > Hi all, > > I've met a weird problem: I cannot access the hadoop cluster from outside. I > have a client machine, and I can telnet to the namenode's port 9000 from this client > machine, but I cannot access the
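For context on that question: fs.default.name lives in core-site.xml and is also the address handed to remote clients, so it needs to be a hostname or IP reachable from outside rather than localhost, or the NameNode ends up bound only to the loopback interface. A sketch with a placeholder hostname:

<!-- core-site.xml: use an externally resolvable hostname/IP, not "localhost",
     so remote clients can reach the NameNode RPC port (9000 here). -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.example.com:9000</value>
</property>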