I use CDH 4.3.1 with MR1. When I run a job, I get the following
error:
Exception in thread "main" java.lang.IncompatibleClassChangeError:
Found interface org.apache.hadoop.mapreduce.JobContext, but class was
expected
at
Hi,
I have a couple of questions about the process of uploading a large file (
10GB) to HDFS.
To make sure my understanding is correct, assume I have a cluster of N
machines.
- What happens in the following cases:
Case 1:
Assuming I want to upload a file (input.txt) of size K
It's not the NameNode that does the reading or breaking up of the file.
When you run the command hadoop fs -put input output,
hadoop is a script which acts as the default client for Hadoop, and
when the client contacts the NameNode for writing, the NN creates a
block ID and asks 3 DNs to host the
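For reference, here is a rough client-side sketch of what hadoop fs -put does in
Case 1, using the standard FileSystem API (the paths below are made up): the
client reads the local file and streams it out; the NameNode only hands out
block IDs and target DataNodes as each block fills up.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PutFile {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // The client reads input.txt locally and writes each block to the
        // DataNodes the NameNode assigns; the NN never touches the file data.
        fs.copyFromLocalFile(new Path("/local/path/input.txt"),    // hypothetical local path
                             new Path("/user/hadoop/input.txt"));  // hypothetical HDFS path
        fs.close();
      }
    }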
Hi
Here is the input file for the wordcount job:
**
Hi This is a simple test.
Hi Hadoop how r u.
Hello Hello.
Hi Hi.
Hadoop Hadoop Welcome.
**
After running the wordcount successfully,
here are the counters:
***
Job Counters SLOTS_MILLIS_MAPS 0 0
Hi,
Default number of map tasks is 2. You can set mapred.map.tasks to 1 to
avoid this.
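A minimal sketch of that suggestion in driver code, assuming the classic
(org.apache.hadoop.mapred) API and a hypothetical driver class; note that
mapred.map.tasks is only a hint to the framework:

    import org.apache.hadoop.mapred.JobConf;

    public class WordCountDriver {               // hypothetical driver class
      public static void main(String[] args) {
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setNumMapTasks(1);                  // same effect as -Dmapred.map.tasks=1
        // ... set input/output paths, mapper/reducer, then JobClient.runJob(conf)
      }
    }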
Regards,
Viji
On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai saigr...@yahoo.in wrote:
Hi
Here is the input file for the wordcount job:
**
Hi This is a simple test.
Hi Hadoop how r u.
Hello
The number of map tasks in a MapReduce job doesn't depend on this
property; it depends on the number of input splits (or equals the
number of blocks if the input split size equals the block size).
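Roughly, FileInputFormat-style split sizing behaves like this simplified sketch
(a made-up helper, not Hadoop's exact code); the number of map tasks then
follows from the number of splits:

    public class SplitMath {
      // roughly: max(minSplitSize, min(maxSplitSize, blockSize))
      static long splitSize(long blockSize, long minSplitSize, long maxSplitSize) {
        return Math.max(minSplitSize, Math.min(maxSplitSize, blockSize));
      }

      static long numSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;   // ceil(fileSize / splitSize)
      }
    }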
1. What input format are you using? If applicable, what value of
N are you using?
2. What is the property
Case 2:
While selecting target DNs for write operations, the NN always
prefers the first DN to be the same node from which the client is sending the
data. In some cases the NN ignores that DN, when there are disk space issues or
other health problems found; the rest works the same way.
Thanks
Jitendra
On
Thanks Viji.
I am a little confused: when the data is small, why would there be 2 tasks?
You would use the minimum of 2 if you needed it, but in this case it is not needed due to
the size of the data being small,
so why would 2 map tasks execute?
Since it results in 1 block with 5 lines of data in it,
I am assuming this
Hi Sai,
What Viji indicated is that the default Apache Hadoop setting for any
input is 2 maps. If the input is larger than one block, regular
policies of splitting such as those stated by Shekhar would apply. But
for smaller inputs, just for an out-of-box parallelism experience,
Hadoop ships with
Please also share your exact command attempt and the full error you received.
On Wed, Sep 25, 2013 at 4:05 PM, Manickam P manicka...@outlook.com wrote:
Hi,
In my federated cluster setup, when I try to create a directory I'm getting
"no such file or directory". Here I've given my core-site.xml
Hi Tom,
The edits are processed sequentially, and aren't all held in memory.
Right now there's no mid-way checkpoint while it is loaded that would
let it resume with only the remaining work if interrupted. Normally this
is not a problem in deployments given that SNN or SBN runs for
checkpointing the
Just try to do manual checkpointing.
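For example, something along these lines (a rough sketch only; it assumes your
Hadoop version exposes setSafeMode()/saveNamespace() on DistributedFileSystem,
and the exact enum location differs between releases; the dfsadmin
-safemode enter / -saveNamespace / -safemode leave commands do the same from
the shell):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.HdfsConstants;

    public class ManualCheckpoint {
      public static void main(String[] args) throws Exception {
        DistributedFileSystem dfs =
            (DistributedFileSystem) FileSystem.get(new Configuration());
        // Requires HDFS superuser privileges; writes are blocked while in safe mode.
        dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_ENTER);
        dfs.saveNamespace();   // merge the edits into a fresh fsimage
        dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_LEAVE);
      }
    }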
*Thanks Regards*
∞
Shashwat Shriparv
On Thu, Sep 26, 2013 at 5:35 PM, Harsh J ha...@cloudera.com wrote:
Hi Tom,
The edits are processed sequentially, and aren't all held in memory.
Right now there's no mid-way-checkpoint when it is loaded,
Just try giving -Dmapred.tasktracker.map.tasks.maximum=1 on the command
line and check how many map tasks it runs. Also set this in
mapred-site.xml and check.
*Thanks Regards*
∞
Shashwat Shriparv
On Thu, Sep 26, 2013 at 5:24 PM, Harsh J ha...@cloudera.com wrote:
Hi Sai,
What
Thanks for the reply. When the client caches 64KB of data on its own side,
do you know which major Java classes/files are responsible for that
action?
--
Best Regards,
Karim Ahmed Awara
On Thu, Sep 26, 2013 at 2:25 PM, Jitendra Yadav
jeetuyadav200...@gmail.com wrote:
Case 2:
While
Hi guys,
I have a small cluster with 1 namenode and 2 datanodes,
webhdfs is enabled,
and I sent an OPEN request as follows:
curl -i -L 'http://namenode:50070/webhdfs/v1/path/to/somefile?op=OPEN'
It redirected to
http://datanode:50075/webhdfs/v1/path/to/somefile?op=OPEN
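For reference, the same two-step OPEN flow done by hand in Java (a rough sketch
with plain java.net; the hostnames mirror the curl example above): the NN
answers with a 307 redirect whose Location header points at a DN, and the bytes
are then read from that URL, which is what curl -i -L does automatically.

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class WebHdfsOpen {
      public static void main(String[] args) throws Exception {
        URL nn = new URL("http://namenode:50070/webhdfs/v1/path/to/somefile?op=OPEN");
        HttpURLConnection c = (HttpURLConnection) nn.openConnection();
        c.setInstanceFollowRedirects(false);              // inspect the redirect ourselves
        String location = c.getHeaderField("Location");   // DN URL, e.g. http://datanode:50075/...
        c.disconnect();

        try (InputStream in = new URL(location).openStream()) {
          byte[] buf = new byte[4096];
          for (int n; (n = in.read(buf)) > 0; ) {
            System.out.write(buf, 0, n);                  // dump file contents to stdout
          }
        }
      }
    }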
can you share your /etc/hosts file ?
On Thu, Sep 26, 2013 at 7:54 PM, douxin douxins...@gmail.com wrote:
Hi guys,
I had a small cluster including 1 namenode and 2 datanodes
I got webhdfs enabled
and I sent an OPEN request as follow:
curl -i -L
Guys,
I have done a federated cluster setup. When I try to move a file to HDFS using
copyFromLocal it says the below:
copyFromLocal: Renames across FileSystems not supported
I used the below command to move the file:
./hdfs dfs -copyFromLocal /home/1gb-junk /home/storage/mount1
Am I doing
Hi
I would like to wrap DFSInputStream by extension. However it seems that the
DFSInputStream constructor is package-private. Is there any way to achieve
my goal? Also, just out of curiosity, why have you made this class
inaccessible to developers, or am I missing something?
regards
tmp
-Dmapred.tasktracker.map.tasks.maximum=1 ... Guys, this property is set
per TaskTracker. When you set this property, it means that the
particular TaskTracker will not run more than 1 map task
in parallel.
For example: if a MapReduce job requires 5 map tasks and you
set this property to 1,
This is actually somewhat common in some of the Hadoop core classes:
private constructors and inner classes. I think in the long term JIRAs
should be opened for these to make them public and pluggable, with public
parameterized constructors wherever possible, so that modularizations can
be
The way we have gotten around this in the past is to extend and then
copy the private code, creating a brand-new implementation.
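If copying the private code is undesirable, one hedged alternative (not what
DFSInputStream itself does; the class and field names below are made up) is to
wrap the FSDataInputStream returned by FileSystem.open() by composition:

    import java.io.IOException;
    import java.io.InputStream;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.Seekable;

    public class CappedReadStream extends InputStream implements Seekable {
      private final FSDataInputStream in;
      private final int maxReadSize;          // hypothetical per-read cap

      public CappedReadStream(FSDataInputStream in, int maxReadSize) {
        this.in = in;
        this.maxReadSize = maxReadSize;
      }

      @Override public int read() throws IOException { return in.read(); }

      @Override public int read(byte[] b, int off, int len) throws IOException {
        return in.read(b, off, Math.min(len, maxReadSize));   // enforce the cap
      }

      @Override public void seek(long pos) throws IOException { in.seek(pos); }
      @Override public long getPos() throws IOException { return in.getPos(); }
      @Override public boolean seekToNewSource(long target) throws IOException {
        return in.seekToNewSource(target);
      }

      @Override public void close() throws IOException { in.close(); }
    }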
On Thu, Sep 26, 2013 at 10:50 AM, Jay Vyas jayunit...@gmail.com wrote:
This is actually somewhat common in some of the hadoop core classes :
Private
Hi Pawar,
can you share your /etc/hosts file ?
sure,
as below:
namenode (Fedora-19, 192.168.1.217)
===
#cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4
localhost4.localdomain4
::1 localhost localhost.localdomain
Just curious, any reason you don't want to use the DFSDataInputStream?
Yong
Date: Thu, 26 Sep 2013 16:46:00 +0200
Subject: Extending DFSInputStream class
From: tmp5...@gmail.com
To: user@hadoop.apache.org
Hi
I would like to wrap DFSInputStream by extension. However it seems that the
I have a specific, complex stream scheme that I want to hide from the user
(short answer), and also some security reasons (limiting the possible read
buffer size).
2013/9/26 java8964 java8964 java8...@hotmail.com
Just curious, any reason you don't want to use the DFSDataInputStream?
Yong
Try to put the full HDFS address and see.
On Thu, Sep 26, 2013 at 8:05 PM, Manickam P manicka...@outlook.com wrote:
/home/1gb-junk
*Thanks Regards*
∞
Shashwat Shriparv
Hi,
I tried that also. The command I used: ./hdfs dfs -copyFromLocal
/home/1gb-junk hdfs://10.108.99.68:8020/home/storage/mount1
It says:
copyFromLocal: `hdfs://10.108.99.68:8020/home/storage/mount1': No such file or
directory
Thanks,
Manickam P
From: dwivedishash...@gmail.com
Date: Thu, 26
Shashwat,
When you say manual checkpointing, are you talking about a specific
process? Or are you just recommending that I run the secondarynamenode to
prevent this situation in the first place?
--Tom
On Thu, Sep 26, 2013 at 6:29 AM, shashwat shriparv
dwivedishash...@gmail.com wrote:
Just
It ran again for about 15 hours before dying again. I'm seeing what extra
RAM resources we can throw at this VM (maybe up to 32GB), but until then
I'm trying to figure out if I'm hitting some strange bug.
When the edits were originally made (over the course of 6 weeks), the
namenode only had
Can you share how many blocks your cluster has? How many directories?
How many files?
There is a JIRA https://issues.apache.org/jira/browse/HADOOP-1687 which
explains how much RAM will be used for your namenode.
It's pretty old relative to current Hadoop versions, but it's a good starting point.
According to
A simple estimate puts the total number of blocks somewhere around 500,000.
Due to an HBase bug (HBASE-9648), there were approximately 50,000,000 files
that were created and quickly deleted (about 10/sec for 6 weeks) in the
cluster, and that activity is what is contained in the edits.
Since those
Tom,
That is valuable info. When we replay edits, we would be creating
and then deleting those files - so memory would grow in between until
the delete events begin appearing in the edit log segment.
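As a hedged back-of-envelope (assuming the commonly cited ~150 bytes of
NameNode heap per namespace object and roughly one block per file; the real
per-object cost varies by version), the peak before the deletes replay could
look like:

    long objects = 50_000_000L /* files */ + 50_000_000L /* blocks, ~1 per file */;
    long peakHeapBytes = objects * 150L;   // ~15 GB of heap before the deletes are replayed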
On Thu, Sep 26, 2013 at 10:07 PM, Tom Brown tombrow...@gmail.com wrote:
A simple estimate puts
They were created and deleted in quick succession. I thought that meant the
edits for both the create and delete would be logically next to each other
in the file allowing it to release the memory almost as soon as it had been
allocated.
In any case, after finding a VM host that could give me
I use Flume, but the file is still being written; only when it has finished can it be seen.
On Wed, Sep 25, 2013 at 11:08 PM, Peyman Mohajerian mohaj...@gmail.com wrote:
In my experience with Flume and this issue, it occurs when the file is not
properly closed. If it was then it would show you the correct size and Hive
will
Hi
I have a few questions I am trying to understand:
1. Is each input split the same as a record? (A record can be a single line or
multiple lines.)
2. Is each task a collection of a few computations or attempts?
For example: if I have a small file with 5 lines,
by default there will be 1 line on which