Hadoop 2.4.1.
Balancing is very slow.
$HADOOP_PREFIX/bin/hdfs dfsadmin -setBalancerBandwidth 52428800
It takes a long time to move even one block:
2014-09-11 11:38:01 Block begins to move
2014-09-11 11:47:20 Complete block move
#10.2.1.211 netstat, Block begins to move, 10.2.1.210 -->>
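(For reference: 52428800 bytes/s is 50 MB/s. A sketch of the other knobs that
usually govern balancer speed; the threshold value below is only an example.
The per-datanode limit can also be made permanent in hdfs-site.xml via
dfs.datanode.balance.bandwidthPerSec, in bytes per second.)

$ # run the balancer with an explicit utilization threshold (percent):
$ $HADOOP_PREFIX/bin/hdfs balancer -threshold 5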
Hi experts,
I've run into a strange issue that I can't explain; can you tell me whether
this is a bug or a misconfiguration? Below is my situation.
I'm running the Hadoop 2.2.0 release and all my jobs are uberized; each
node can only run a single job at a time. I used the CapacityScheduler
I solved this in the end by using a shell script (initiated by an Oozie
shell action) to grep and loop through the results. I didn't have to use
the -v option, as the -e option gives you access to a fuller range of
regular expression functionality.
Thanks for your help (again!), Rich.
Charles
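(A sketch of the shell-script approach Charles describes; the paths and the
pattern are hypothetical, and it assumes the path is the last field of
"hadoop fs -ls" output:)

$ hadoop fs -ls /data/staging | awk '{print $NF}' \
    | grep -e '/part-[0-9][0-9]*$' \
    | while read f; do hadoop fs -mv "$f" /data/ready/; done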
On Sep 11, 2014, at 0:47, Felix Chern wrote:
> If you don't want anything to get inserted, just set your output to key only
> or value only.
> TextOutputFormat$LineRecordWriter won't insert anything unless both values
> are set:
If I output value only, for instance, and my line contains a TAB
If you don't want anything to get inserted, just set your output to key only
or value only.
TextOutputFormat$LineRecordWriter won't insert anything unless both values
are set:
public synchronized void write(K key, V value)
    throws IOException {
  boolean nullKey = key == null || key instanceof NullWritable;
  boolean nullValue = value == null || value instanceof NullWritable;
  // ... the tab separator is written only when key and value are both set.
On Sep 10, 2014, at 22:33, Felix Chern wrote:
> Use 'tr -s' to strip out repeated tabs?
>
> $ echo -e "a\t\t\tb"
> a            b     (three tabs)
>
> $ echo -e "a\t\t\tb" | tr -s "\t"
> a    b             (squeezed to a single tab)
>
There can be tabs in the input; I want to keep the input lines without any
modification.
Actually it is rat
Use 'tr -s' to strip out repeated tabs?
$ echo -e "a\t\t\tb"
a            b     (three tabs)
$ echo -e "a\t\t\tb" | tr -s "\t"
a    b             (squeezed to a single tab)
On Sep 10, 2014, at 11:28 AM, Dmitry Sivachenko wrote:
>
> On Sep 10, 2014, at 22:19, Rich Haase wrote:
>
>> You can write a custom output format
>
>
> Any clues how this can be done?
> On Sep 10, 2014, at 22:47, Shahab Yunus wrote:
>
> Examples (the top ones are related to streaming jobs):
>
> http://www.infoq.com/articles/HadoopOutputFormat
> http://research.neustar.biz/2011/08/30/custom-inputoutput-formats-in-hadoop-streaming/
> http://stackoverflow.com/questions/12759651/how-to-override-inputformat-and-outputformat-in-hadoop-applicat
Examples (the top ones are related to streaming jobs):
http://www.infoq.com/articles/HadoopOutputFormat
http://research.neustar.biz/2011/08/30/custom-inputoutput-formats-in-hadoop-streaming/
http://stackoverflow.com/questions/12759651/how-to-override-inputformat-and-outputformat-in-hadoop-applicat
On Sep 10, 2014, at 22:19, Rich Haase wrote:
> You can write a custom output format
Any clues how this can be done?
> , or you can write your mapreduce job in Java and use a NullWritable as
> Susheel recommended.
>
> grep (and every other *nix text processing command) I can think of would
You can write a custom output format, or you can write your MapReduce job
in Java and use a NullWritable as Susheel recommended.
grep (and every other *nix text processing command) I can think of would
not be limited by a trailing tab character. It's even quite easy to strip
away that tab character.
On Sep 10, 2014, at 22:05, Rich Haase wrote:
> In Python, or any streaming program, just set the output value to the empty
> string and you will get something like "key"\t"".
>
I see, but I want to use many existing programs (like UNIX grep), and I don't
want to have an extra "\t" in the output.
In Python, or any streaming program, just set the output value to the empty
string and you will get something like "key"\t"".
On Wed, Sep 10, 2014 at 12:03 PM, Susheel Kumar Gadalay wrote:
> If you don't want the key in the final output, you can set it like this in
> Java:
>
> job.setOutputKeyClass(NullWritable.class);
If you don't want the key in the final output, you can set it like this in Java:
job.setOutputKeyClass(NullWritable.class);
It will just print the value in the output file.
I don't know how to do it in Python.
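(A minimal driver sketch of that NullWritable approach; the driver and mapper
class names and the argument paths are hypothetical, and the mapper is assumed
to emit <NullWritable, Text> pairs:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ValueOnlyDriver {                    // hypothetical class name
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "value-only");
    job.setJarByClass(ValueOnlyDriver.class);
    job.setMapperClass(LineMapper.class);         // hypothetical mapper
    job.setNumReduceTasks(0);                     // map-only, like a streaming filter
    job.setOutputKeyClass(NullWritable.class);    // null key => no key, no tab
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}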
On 9/10/14, Dmitry Sivachenko wrote:
> Hello!
>
> Imagine the following common task: I want to process a big text file
You can set the number of reducers used in any Hadoop job from the command
line by using -Dmapred.reduce.tasks=XX.
e.g. hadoop jar hadoop-mapreduce-examples.jar terasort
-Dmapred.reduce.tasks=10 /terasort-input /terasort-output
Hello!
Imagine the following common task: I want to process a big text file
line by line using the streaming interface.
Run the Unix grep command, for instance, or some other line-by-line
processing, e.g. line.upper().
I copy the file to HDFS.
Then I run a map task on this file which reads one line, modifies
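(For context, a sketch of such a streaming job; the jar path follows a stock
2.x layout, the paths and pattern are only examples, and note that grep exits
non-zero when nothing matches, which streaming treats as a task failure unless
stream.non.zero.exit.is.failure is set to false:)

$ hadoop jar $HADOOP_PREFIX/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -D mapreduce.job.reduces=0 \
    -D stream.non.zero.exit.is.failure=false \
    -input /data/in -output /data/out \
    -mapper 'grep -e pattern'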
HDFS doesn't support the full range of glob matching you will find in Linux.
If you want to exclude all files from a directory listing that meet a
certain criterion, try doing your listing and using grep -v to exclude the
matching records.
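(A sketch of that listing-plus-grep approach; the directory names are
hypothetical, and it assumes the path is the last field of "hadoop fs -ls"
output:)

$ hadoop fs -ls /data/incoming | awk '{print $NF}' | grep -v '\.tmp$' \
    | while read f; do hadoop fs -cp "$f" /data/archive/; done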
Hi,
I am trying the smoke test for Hadoop (2.4.1). Regarding "terasort": below is
my test command. The map part completed very quickly because it was split into
many subtasks; however, the reduce phase takes a very long time, with only one
running reduce task. Is there a way to speed up the reduce phase by
Hi,
In fact,
hdfs://latdevweb02:9000/home/hadoop/hadoop/input
is not a folder on hdfs.
I created a folder /tmp/hadoop-hadoop/dfs/data, where data will be saved in
hdfs.
And in my HADOOP_HOME folder there are two folders, "input" and "output", but I
don't know how to configure them in the program
Hi,
Please, that is my real problem.
Could you please look at my attached code and tell me how I can update it?
How do I set a job jar file?
And now, here is my hdfs-site.xml:
==
-bash-4.1$ cat conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
Hello,
I am getting the following error when running on a 500 MB dataset compressed
in the Avro data format:
Container [pid=22961,containerID=container_1409834588043_0080_01_10] is
running beyond virtual memory limits. Current usage: 636.6 MB of 1 GB
physical memory used; 2.1 GB of 2.1 GB virtual memory
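(For reference, the 2.1 GB ceiling here is the default 1 GB container size
times the default yarn.nodemanager.vmem-pmem-ratio of 2.1. A sketch of the
knobs usually adjusted for this error; the values are only examples:)

<!-- yarn-site.xml: allow more virtual memory per MB of physical memory,
     or disable the check entirely via yarn.nodemanager.vmem-check-enabled -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
<!-- mapred-site.xml: or give the map containers more physical memory -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>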
Hi, have you set a class in your code?
>> WARN mapred.JobClient: No job jar file set. User classes may not be found.
>> See JobConf(Class) or JobConf#setJar(String).
>>
Also, you need to check the path of your input file:
>> Input path does not exist: hdfs://latdevweb02:9000/home/hadoop/hadoop/input
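(On the "No job jar file set" warning above: a common fix is to point the job
at the jar containing your classes. A sketch using the old JobConf API the
warning itself mentions; the jar path is hypothetical:)

JobConf conf = new JobConf(WordCount.class);   // infers the jar from the class
// or set it explicitly:
// conf.setJar("/path/to/wordcount.jar");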
*hdfs://latdevweb02:9000/home/hadoop/hadoop/input*
Is this a valid path on HDFS? Can you access this path outside of the
program, for example using the hadoop fs -ls command? Also, were this path and
the files in it created by a different user?
The exception seems to say that it does not exist or the r
Hello Hadoopers,
Here is the error I'm facing when running the WordCount example program I
wrote myself.
Kindly find attached the file of my WordCount program.
Below is the error.
=
That’s great.
Regards,
Yi Liu
From: Zesheng Wu [mailto:wuzeshen...@gmail.com]
Sent: Wednesday, September 10, 2014 8:25 PM
To: user@hadoop.apache.org
Subject: Re: HDFS: Couldn't obtain the locations of the last block
Hi Yi,
I went through HDFS-4516, and it really solves our problem, thanks very much!
Hi Georgi,
Thanks for your reply. Won't hadoop fs -ls /tmp/myfiles* return all files
that begin with 'myfiles' in the tmp directory? What I don't understand is
how I can specify a pattern that excludes files ending in '.tmp'. I have
tried using the normal regular expression syntax for this ^(.tmp)
Hi Yi,
I went through HDFS-4516, and it really solves our problem, thanks very
much!
2014-09-10 16:39 GMT+08:00 Zesheng Wu :
> Thanks Yi, I will look into HDFS-4516.
>
>
> 2014-09-10 15:03 GMT+08:00 Liu, Yi A :
>
> Hi Zesheng,
>>
>>
>>
>> I learned from an offline email from you that your Hadoop
Yes, you can:
hadoop fs -ls /tmp/myfiles*
I would recommend first using -ls in order to verify you are selecting
the right files.
#Mahesh: do you need some help doing this?
On 10.09.2014 13:46, Mahesh Khandewal wrote:
I want to unsubscribe from this mailing list
On Wed, Sep 10, 2014 at
I want to unsubscribe from this mailing list
On Wed, Sep 10, 2014 at 4:42 PM, Charles Robertson
<charles.robert...@gmail.com> wrote:
> Hi all,
>
> Is it possible to use regular expressions in fs commands? Specifically, I
> want to use the copy (-cp) and move (-mv) commands on all files in a
> directory
Hi all,
Is it possible to use regular expressions in fs commands? Specifically, I
want to use the copy (-cp) and move (-mv) commands on all files in a
directory that match a pattern (the pattern being all files that do not end
in '.tmp').
Can this be done?
Thanks,
Charles
Hello,
I developed a custom compression codec for Hadoop. Of course Hadoop is set to
use my codec when compressing data.
For testing purposes, I use the following two commands:
Compression test command:
---
hadoop jar
/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0
Thank you all for your support.
I was able to fix the issue this morning using this link; it is clearly
explained:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#java-io-ioexception-incompatible-namespaceids
You can use the link as well.
Warm regards
From: viv
> Incorrect configuration: namenode address dfs.namenode.servicerpc-address or
> dfs.namenode.rpc-address is not configured.
> Starting namenodes on []
NameNode/DataNode are part of an HDFS service. It makes no sense to try to
run them over an S3 URL as the default, since HDFS is a distributed filesystem
in
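(In other words, a sketch; the host and bucket names are hypothetical. Keep an
hdfs:// URL as the default filesystem in core-site.xml and address S3
explicitly per path:)

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-host:9000</value>
</property>

$ hadoop fs -ls s3n://my-bucket/data/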
Hi Experts,
My Hadoop cluster has HA enabled with QJM, and I failed to upgrade it from
version 2.2.0 to 2.4.1. Why? Is this a known issue?
My steps:
1. Stop the Hadoop cluster
2. On each node, upgrade the Hadoop binaries to the newer version
3. On each JournalNode:
sbin/hadoop-daemon.sh start journalnode
Thanks Yi, I will look into HDFS-4516.
2014-09-10 15:03 GMT+08:00 Liu, Yi A :
> Hi Zesheng,
>
>
>
> I learned from an offline email from you that your Hadoop version was
> 2.0.0-alpha, and you also said "The block is allocated successfully in NN,
> but isn't created in DN".
>
> Yes, we may have this issue in 2.0.0-alpha.
Hi,
I have downloaded hadoop-2.5.0 and am trying to get it working with an S3
backend *(single-node in pseudo-distributed mode)*.
I have made changes to the core-site.xml according to
https://wiki.apache.org/hadoop/AmazonS3
I have a backend object store running on my machine that supports S3.
I g
Hi Zesheng,
I learned from an offline email from you that your Hadoop version was
2.0.0-alpha, and you also said "The block is allocated successfully in NN, but
isn't created in DN".
Yes, we may have this issue in 2.0.0-alpha. I suspect your issue is similar
to HDFS-4516. Can you try Hadoop