IncompatibleClassChangeError

2013-09-26 Thread lei liu
I am using CDH 4.3.1 with MR1. When I run a job, I get the following error: Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected at
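
This error is the classic symptom of a binary mismatch between major Hadoop versions: in Hadoop 1.x/MR1 JobContext is a concrete class, while in Hadoop 2.x/MR2 it became an interface, so jars compiled against one do not link against the other. Schematically (not the real class bodies):

    // Hadoop 1.x / MR1: JobContext is a concrete class
    public class JobContext { /* ... */ }

    // Hadoop 2.x / MR2: JobContext became an interface
    public interface JobContext { /* ... */ }

    // Code compiled against one fails to link against the other,
    // surfacing as IncompatibleClassChangeError at runtime.

The usual fix is to recompile the job (and any libraries that touch the MapReduce API) against the same Hadoop artifacts the cluster actually runs.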

Uploading a file to HDFS

2013-09-26 Thread Karim Awara
Hi, I have a couple of questions about the process of uploading a large file (> 10GB) to HDFS. To make sure my understanding is correct, assume I have a cluster of N machines. What happens in the following: Case 1: assuming I want to upload a file (input.txt) of size K

Re: Uploading a file to HDFS

2013-09-26 Thread Shekhar Sharma
It's not the namenode that does the reading or breaking of the file. When you run the command hadoop fs -put input output, hadoop is a script that acts as the default client for Hadoop, and when the client contacts the namenode for writing, the NN creates a block ID and asks 3 DNs to host the
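
To make the client's role concrete, here is a minimal sketch of a client-side write using the public FileSystem API, roughly what the hadoop fs -put client does under the hood (the path and payload are made up for illustration). The namenode only hands out block IDs and datanode locations; the bytes flow from this code straight to the datanode pipeline:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsPut {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // reads core-site.xml etc.
            FileSystem fs = FileSystem.get(conf);       // client handle; talks to the NN
            FSDataOutputStream out = fs.create(new Path("/user/demo/input.txt"));
            out.write("hello hdfs".getBytes("UTF-8"));  // bytes stream to DNs, not to the NN
            out.close();                                // completes the last block
            fs.close();
        }
    }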

Re: 2 Map tasks running for a small input file

2013-09-26 Thread Sai Sai
Hi, Here is the input file for the wordcount job: ** Hi This is a simple test. Hi Hadoop how r u. Hello Hello. Hi Hi. Hadoop Hadoop Welcome. ** After running the wordcount successfully, here are the counters: *** Job Counters SLOTS_MILLIS_MAPS 0 0

Re: 2 Map tasks running for a small input file

2013-09-26 Thread Viji R
Hi, The default number of map tasks is 2. You can set mapred.map.tasks to 1 to avoid this. Regards, Viji On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai saigr...@yahoo.in wrote:
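
A hedged example of passing that hint on the command line (the jar and class names here are hypothetical, and the driver must go through ToolRunner/GenericOptionsParser for -D to take effect):

    hadoop jar wordcount.jar WordCount -Dmapred.map.tasks=1 input output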

Re: 2 Map tasks running for a small input file

2013-09-26 Thread Shekhar Sharma
The number of map tasks of a MapReduce job doesn't depend on this property; it depends on the number of input splits (equal to the number of blocks if input split size = block size). 1. What input format are you using? If it is NLineInputFormat, what is the value of N you are using? 2. What is the property

Re: Uploading a file to HDFS

2013-09-26 Thread Jitendra Yadav
Case 2: While selecting the target DN for a write operation, the NN always prefers as the first DN the same DN from which the client is sending the data; in some cases the NN ignores that DN, when there are disk space issues or other health symptoms. The rest works the same. Thanks Jitendra On

Re: 2 Map tasks running for a small input file

2013-09-26 Thread Sai Sai
Thanks Viji. I am a little confused: when the data is small, why would there be 2 tasks? The minimum of 2 would be used only if needed, but in this case it is not needed due to the small size of the data, so why would 2 map tasks execute? Since it results in 1 block with 5 lines of data in it, I am assuming this

Re: 2 Map tasks running for a small input file

2013-09-26 Thread Harsh J
Hi Sai, What Viji indicated is that the default Apache Hadoop setting for any input is 2 maps. If the input is larger than one block, regular policies of splitting such as those stated by Shekhar would apply. But for smaller inputs, just for an out-of-box parallelism experience, Hadoop ships with
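
For reference, the split arithmetic behind this, paraphrased and abridged from the MR1 FileInputFormat.getSplits() code (not a drop-in implementation):

    // numSplits is the mapred.map.tasks hint (default 2)
    long goalSize = totalSize / (numSplits == 0 ? 1 : numSplits);
    long splitSize = Math.max(minSplitSize, Math.min(goalSize, blockSize));
    // For a tiny single-block file, goalSize = totalSize / 2, so the
    // file is cut into two splits and two map tasks run.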

Re: Unable to create a directory in federated hdfs.

2013-09-26 Thread Harsh J
Please also share your exact command attempt and the full error you received. On Wed, Sep 25, 2013 at 4:05 PM, Manickam P manicka...@outlook.com wrote: Hi, In my federated cluster setup, when I try to create a directory I'm getting "no such file or directory". Here I've given my core-site.xml

Re: Is there any way to partially process HDFS edits?

2013-09-26 Thread Harsh J
Hi Tom, The edits are processed sequentially, and aren't all held in memory. Right now there's no mid-way checkpoint when they are loaded, such that it could resume with only the remaining work if interrupted. Normally this is not a problem in deployments, given that the SNN or SBN runs for checkpointing the

Re: Is there any way to partially process HDFS edits?

2013-09-26 Thread shashwat shriparv
Just try to do manual checkpointing. *Thanks & Regards* ∞ Shashwat Shriparv On Thu, Sep 26, 2013 at 5:35 PM, Harsh J ha...@cloudera.com wrote:
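
Assuming "manual checkpointing" here means forcing a merge of the accumulated edits into a fresh fsimage, one way to do it on an HDFS 1.x/CDH4-era cluster is the sequence below (a hedged sketch; it has to run against a namenode that is up):

    hadoop dfsadmin -safemode enter
    hadoop dfsadmin -saveNamespace   # merges current edits into a new fsimage
    hadoop dfsadmin -safemode leave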

Re: 2 Map tasks running for a small input file

2013-09-26 Thread shashwat shriparv
Just try giving -Dmapred.tasktracker.map.tasks.maximum=1 on the command line and check how many map tasks it runs; also set this in mapred-site.xml and check. *Thanks & Regards* ∞ Shashwat Shriparv On Thu, Sep 26, 2013 at 5:24 PM, Harsh J ha...@cloudera.com wrote:

Re: Uploading a file to HDFS

2013-09-26 Thread Karim Awara
Thanks for the reply. When the client caches 64KB of data on its own side, do you know which set of major Java classes/files is responsible for that action? -- Best Regards, Karim Ahmed Awara On Thu, Sep 26, 2013 at 2:25 PM, Jitendra Yadav jeetuyadav200...@gmail.com wrote:
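
A hedged pointer, from reading the Hadoop sources of that era: the client-side buffering lives in org.apache.hadoop.hdfs.DFSClient and its inner DFSOutputStream class, which pack written bytes into packets (64KB by default, governed by the client write-packet-size property) before streaming them down the datanode pipeline.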

please HELP me with a WebHDFS problem

2013-09-26 Thread douxin
Hi guys, I have a small cluster with 1 namenode and 2 datanodes. I have webhdfs enabled, and I sent an OPEN request as follows: curl -i -L 'http://namenode:50070/webhdfs/v1/path/to/somefile?op=OPEN' It redirected to http://datanode:50075/webhdfs/v1/path/to/somefile?op=OPEN

Re: please HELP me with a WebHDFS problem

2013-09-26 Thread Nitin Pawar
Can you share your /etc/hosts file? On Thu, Sep 26, 2013 at 7:54 PM, douxin douxins...@gmail.com wrote:

Moving a file to HDFS

2013-09-26 Thread Manickam P
Guys, I have done a federated cluster setup. When I try to move a file to HDFS using copyFromLocal, it says the below: copyFromLocal: Renames across FileSystems not supported I used the below command to move the file: ./hdfs dfs -copyFromLocal /home/1gb-junk /home/storage/mount1 Am I doing

Extending DFSInputStream class

2013-09-26 Thread Rob Blah
Hi, I would like to wrap DFSInputStream by extension. However, it seems that the DFSInputStream constructor is package-private. Is there any way to achieve my goal? Also, just out of curiosity, why have you made this class inaccessible to developers, or am I missing something? regards, tmp

Re: 2 Map tasks running for a small input file

2013-09-26 Thread Shekhar Sharma
-Dmapred.tasktracker.map.tasks.maximum=1 ... Guys, this property is set for the task tracker. When you set it, that particular task tracker will not run more than 1 map task in parallel. For example: if a MapReduce job requires 5 map tasks and you set this property to 1,
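
A hedged illustration of the distinction (the value 1 is an example only): the tasktracker-level cap is a daemon setting in mapred-site.xml on each node, while mapred.map.tasks is a per-job hint:

    <!-- mapred-site.xml on each tasktracker: at most 1 map task runs
         concurrently on this node; a job needing 5 map tasks still
         runs 5 tasks in total, just not in parallel here -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>1</value>
    </property>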

Re: Extending DFSInputStream class

2013-09-26 Thread Jay Vyas
This is actually somewhat common in some of the Hadoop core classes: private constructors and inner classes. I think in the long term JIRAs should be opened for these to make them public and pluggable, with public parameterized constructors wherever possible, so that modularizations can be

Re: Extending DFSInputStream class

2013-09-26 Thread Jay Vyas
The way we have gotten around this in the past is extending and then copying the private code and creating a brand-new implementation. On Thu, Sep 26, 2013 at 10:50 AM, Jay Vyas jayunit...@gmail.com wrote:

Re: please HELP me with a WebHDFS problem

2013-09-26 Thread douxin
Hi Pawar, can you share your /etc/hosts file? Sure, as below: namenode (Fedora 19, 192.168.1.217) === # cat /etc/hosts 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain

RE: Extending DFSInputStream class

2013-09-26 Thread java8964 java8964
Just curious, any reason you don't want to use the DFSDataInputStream? Yong

Re: Extending DFSInputStream class

2013-09-26 Thread Rob Blah
I have a specific, complex stream scheme which I want to hide from the user (short answer), and also some security reasons (limiting the possible read buffer size). 2013/9/26 java8964 java8964 java8...@hotmail.com
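
One possible workaround, sketched and unverified: wrap the public FSDataInputStream that FileSystem.open() returns, rather than DFSInputStream itself. The class name and the read cap below are hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;

    // Hypothetical wrapper enforcing a maximum per-call read size,
    // built on the public FSDataInputStream instead of the
    // package-private DFSInputStream.
    public class BoundedInputStream {
        private final FSDataInputStream in;
        private final int maxRead;

        public BoundedInputStream(FSDataInputStream in, int maxRead) {
            this.in = in;
            this.maxRead = maxRead;
        }

        public int read(byte[] buf, int off, int len) throws IOException {
            return in.read(buf, off, Math.min(len, maxRead)); // cap the buffer size
        }

        public void seek(long pos) throws IOException {
            in.seek(pos);
        }

        public void close() throws IOException {
            in.close();
        }
    }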

Re: Moving a file to HDFS

2013-09-26 Thread shashwat shriparv
Try putting the full HDFS address and see. On Thu, Sep 26, 2013 at 8:05 PM, Manickam P manicka...@outlook.com wrote: /home/1gb-junk *Thanks & Regards* ∞ Shashwat Shriparv

RE: Moving a file to HDFS

2013-09-26 Thread Manickam P
Hi, I tried that also. The command I used: ./hdfs dfs -copyFromLocal /home/1gb-junk hdfs://10.108.99.68:8020/home/storage/mount1 It says: copyFromLocal: `hdfs://10.108.99.68:8020/home/storage/mount1': No such file or directory Thanks, Manickam P
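
That error usually means the target directory does not yet exist on that namenode. A hedged sequence worth trying (CDH4-era shell; if your shell lacks -mkdir -p, use plain -mkdir for each level):

    ./hdfs dfs -mkdir -p hdfs://10.108.99.68:8020/home/storage/mount1
    ./hdfs dfs -copyFromLocal /home/1gb-junk hdfs://10.108.99.68:8020/home/storage/mount1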

Re: Is there any way to partially process HDFS edits?

2013-09-26 Thread Tom Brown
Shashwat, When you say manual checkpointing, are you talking about a specific process? Or are you just recommending that I run the secondary namenode to prevent this situation in the first place? --Tom On Thu, Sep 26, 2013 at 6:29 AM, shashwat shriparv dwivedishash...@gmail.com wrote:

Re: Is there any way to partially process HDFS edits?

2013-09-26 Thread Tom Brown
It ran again for about 15 hours before dying again. I'm seeing what extra RAM we can throw at this VM (maybe up to 32GB), but until then I'm trying to figure out whether I'm hitting some strange bug. When the edits were originally made (over the course of 6 weeks), the namenode only had

Re: Is there any way to partially process HDFS edits?

2013-09-26 Thread Nitin Pawar
Can you share how many blocks your cluster has? How many directories? How many files? There is a JIRA, https://issues.apache.org/jira/browse/HADOOP-1687, which explains how much RAM your namenode will use. It's pretty old in Hadoop-version terms, but it's a good starting point. According to

Re: Is there any way to partially process HDFS edits?

2013-09-26 Thread Tom Brown
A simple estimate puts the total number of blocks somewhere around 500,000. Due to an HBase bug (HBASE-9648), there were approximately 50,000,000 files that were created and quickly deleted (about 10/sec for 6 weeks) in the cluster, and that activity is what is contained in the edits. Since those

Re: Is there any way to partially process HDFS edits?

2013-09-26 Thread Harsh J
Tom, That is valuable info. When we replay edits, we would be creating and then deleting those files - so memory would grow in between until the delete events begin appearing in the edit log segment. On Thu, Sep 26, 2013 at 10:07 PM, Tom Brown tombrow...@gmail.com wrote:
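
A back-of-the-envelope view of why that hurts, assuming the oft-quoted rough figure of ~150 bytes per namespace object and counting each file as roughly two objects (an inode plus a block):

    50,000,000 files x 2 objects x ~150 bytes ≈ 15 GB of transient heap
    during replay, before the corresponding deletes free it again.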

Re: Is there any way to partially process HDFS edits?

2013-09-26 Thread Tom Brown
They were created and deleted in quick succession. I thought that meant the edits for both the create and the delete would be logically next to each other in the file, allowing the memory to be released almost as soon as it had been allocated. In any case, after finding a VM host that could give me

Re: issue about invisible data in hadoop file

2013-09-26 Thread ch huang
I use Flume; the data is invisible while the file is being written, and only when the write is finished can it be seen. On Wed, Sep 25, 2013 at 11:08 PM, Peyman Mohajerian mohaj...@gmail.com wrote: In my experience with Flume and this issue, it occurs when the file is not properly closed. If it was, then it would show you the correct size and Hive will
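
This is expected HDFS behavior rather than a Flume bug: readers do not see newly written bytes until the writer flushes or closes the stream. A hedged sketch against the Hadoop 2.x API (fs is an org.apache.hadoop.fs.FileSystem, event a byte[], and the path is made up):

    FSDataOutputStream out = fs.create(new Path("/flume/events.log"));
    out.write(event);  // buffered client-side; not yet visible to readers
    out.hflush();      // makes the bytes visible to new readers
    out.close();       // closes the block; final length recorded at the NN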

Re: Input Split vs Task vs attempt vs computation

2013-09-26 Thread Sai Sai
Hi, I have a few questions I am trying to understand: 1. Is each input split the same as a record (a record can be a single line or multiple lines)? 2. Is each Task a collection of a few computations or attempts? For example: if I have a small file with 5 lines, by default there will be 1 line on which