Balancing is very slow.

2014-09-10 Thread cho ju il
hadoop 2.4.1 Balancing is very slow. $HADOOP_PREFIX/bin/hdfs dfsadmin -setBalancerBandwidth 52428800 It takes a long time to move one block. 2014-09-11 11:38:01 Block begins to move 2014-09-11 11:47:20 Complete block move #10.2.1.211 netstat, Block begins to move, 10.2.1.210 -->>

The running job is blocked for a while if the queue is short of resources

2014-09-10 Thread Anfernee Xu
Hi experts, I'm facing a strange issue I cannot understand; can you tell me whether this is a bug or whether I configured something wrong? Below is my situation. I'm running the Hadoop 2.2.0 release and all my jobs are uberized; each node can only run a single job at a time. I used the CapacitySched

Re: Regular expressions in fs paths?

2014-09-10 Thread Charles Robertson
I solved this in the end by using a shell script (initiated by an Oozie shell action) to run grep and loop through the results - I didn't have to use the -v option, as the -e option gives access to a fuller range of regular expression functionality. Thanks for your help (again!) Rich. Charles On 1

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Dmitry Sivachenko
On 11 Sept. 2014, at 0:47, Felix Chern wrote: > If you don't want anything to get inserted, just set your output to key only or > value only. > TextOutputFormat$LineRecordWriter won't insert anything unless both values > are set: If I output value only, for instance, and my line contains TAB

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Felix Chern
If you don't want anything to get inserted, just set your output to key only or value only. TextOutputFormat$LineRecordWriter won't insert anything unless both values are set: public synchronized void write(K key, V value) throws IOException { boolean nullKey = key == null || key i
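For reference, the write() logic being quoted goes roughly like this (a from-memory paraphrase of TextOutputFormat.LineRecordWriter in Hadoop 2.x; field and helper names may differ slightly from the actual source):

  // From-memory paraphrase of TextOutputFormat.LineRecordWriter#write (Hadoop 2.x).
  public synchronized void write(K key, V value) throws IOException {
    boolean nullKey = key == null || key instanceof NullWritable;
    boolean nullValue = value == null || value instanceof NullWritable;
    if (nullKey && nullValue) {
      return;                           // nothing at all is written
    }
    if (!nullKey) {
      writeObject(key);                 // key bytes
    }
    if (!(nullKey || nullValue)) {
      out.write(keyValueSeparator);     // the tab only appears when BOTH key and value are set
    }
    if (!nullValue) {
      writeObject(value);               // value bytes
    }
    out.write(newline);
  }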

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Dmitry Sivachenko
On 10 Sept. 2014, at 22:33, Felix Chern wrote: > Use 'tr -s' to strip out tabs? > > $ echo -e "a\t\t\tb" > a b > > $ echo -e "a\t\t\tb" | tr -s "\t" > a b > There can be tabs in the input; I want to keep input lines without any modification. Actually it is rat

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Felix Chern
Use 'tr -s' to strip out tabs? $ echo -e "a\t\t\tb" a b $ echo -e "a\t\t\tb" | tr -s "\t" a b On Sep 10, 2014, at 11:28 AM, Dmitry Sivachenko wrote: > > On 10 Sept. 2014, at 22:19, Rich Haase wrote: > >> You can write a custom output format > > > Any clu

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Dmitry Sivachenko
> On 10 Sept. 2014, at 22:47, Shahab Yunus wrote: > > Examples (the top ones are related to streaming jobs): > > http://www.infoq.com/articles/HadoopOutputFormat > http://research.neustar.biz/2011/08/30/custom-inputoutput-formats-in-hadoop-streaming/ > http://stackoverflow.com/questions/12

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Shahab Yunus
Examples (the top ones are related to streaming jobs): http://www.infoq.com/articles/HadoopOutputFormat http://research.neustar.biz/2011/08/30/custom-inputoutput-formats-in-hadoop-streaming/ http://stackoverflow.com/questions/12759651/how-to-override-inputformat-and-outputformat-in-hadoop-applicat

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Dmitry Sivachenko
On 10 Sept. 2014, at 22:19, Rich Haase wrote: > You can write a custom output format Any clues how this can be done? > , or you can write your mapreduce job in Java and use a NullWritable as > Susheel recommended. > > grep (and every other *nix text processing command) I can thin

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Rich Haase
You can write a custom output format, or you can write your mapreduce job in Java and use a NullWritable as Susheel recommended. grep (and every other *nix text processing command) I can think of would not be limited by a trailing tab character. It's even quite easy to strip away that tab charact
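As a sketch of the "custom output format" route (new mapreduce API, Hadoop 2.x; the class name and details below are illustrative, not from the thread): a FileOutputFormat whose RecordWriter emits only the value and a newline, so no tab separator ever appears.

  import java.io.IOException;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.RecordWriter;
  import org.apache.hadoop.mapreduce.TaskAttemptContext;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  // Hypothetical value-only output format: each value is written followed by a newline,
  // so no key and no tab separator ever reach the output file.
  public class ValueOnlyTextOutputFormat extends FileOutputFormat<NullWritable, Text> {
    @Override
    public RecordWriter<NullWritable, Text> getRecordWriter(TaskAttemptContext context)
        throws IOException, InterruptedException {
      Path file = getDefaultWorkFile(context, "");
      final FSDataOutputStream out =
          file.getFileSystem(context.getConfiguration()).create(file, false);
      return new RecordWriter<NullWritable, Text>() {
        @Override
        public void write(NullWritable key, Text value) throws IOException {
          out.write(value.copyBytes());   // the line exactly as produced by the task
          out.write('\n');
        }
        @Override
        public void close(TaskAttemptContext ctx) throws IOException {
          out.close();
        }
      };
    }
  }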

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Dmitry Sivachenko
On 10 Sept. 2014, at 22:05, Rich Haase wrote: > In Python, or any streaming program, just set the output value to the empty > string and you will get something like "key"\t"". > I see, but I want to use many existing programs (like UNIX grep), and I don't want to have an extra "\t" in th

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Rich Haase
In Python, or any streaming program, just set the output value to the empty string and you will get something like "key"\t"". On Wed, Sep 10, 2014 at 12:03 PM, Susheel Kumar Gadalay wrote: > If you don't want the key in the final output, you can set it like this in Java. > > job.setOutputKeyClass(NullWr

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Susheel Kumar Gadalay
If you don't want the key in the final output, you can set it like this in Java: job.setOutputKeyClass(NullWritable.class); It will just print the value in the output file. I don't know how to do it in Python. On 9/10/14, Dmitry Sivachenko wrote: > Hello! > > Imagine the following common task: I want to p
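A minimal driver sketch of that approach (new mapreduce API, Hadoop 2.x; LineMapper is a hypothetical map-only mapper that emits NullWritable keys and the line as the value):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  // Map-only job whose keys are NullWritable, so only the values appear in the text output.
  public class ValueOnlyDriver {
    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "value-only");
      job.setJarByClass(ValueOnlyDriver.class);
      job.setMapperClass(LineMapper.class);          // hypothetical: emits (NullWritable.get(), line)
      job.setNumReduceTasks(0);                      // map-only, like a streaming filter
      job.setOutputKeyClass(NullWritable.class);     // key is dropped from the text output
      job.setOutputValueClass(Text.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }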

Re: Hadoop Smoke Test: TERASORT

2014-09-10 Thread Rich Haase
You can set the number of reducers used in any Hadoop job from the command line by using -Dmapred.reduce.tasks=XX. e.g. hadoop jar hadoop-mapreduce-examples.jar terasort -Dmapred.reduce.tasks=10 /terasort-input /terasort-output

Writing output from streaming task without dealing with key/value

2014-09-10 Thread Dmitry Sivachenko
Hello! Imagine the following common task: I want to process a big text file line-by-line using the streaming interface. Run the UNIX grep command, for instance, or some other line-by-line processing, e.g. line.upper(). I copy the file to HDFS. Then I run a map task on this file which reads one line, modifies

Re: Regular expressions in fs paths?

2014-09-10 Thread Rich Haase
HDFS doesn't support the full range of glob matching you will find in Linux. If you want to exclude from a directory listing all files that meet a certain criterion, try doing your listing and using grep -v to exclude the matching records.
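On the Java API side, the same exclusion can also be done with a PathFilter; a small sketch (assuming the Hadoop 2.x FileSystem API; the filter logic is illustrative):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.fs.PathFilter;

  // List everything in a directory except files ending in ".tmp".
  public class ListWithoutTmp {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      FileStatus[] matches = fs.listStatus(new Path(args[0]), new PathFilter() {
        @Override
        public boolean accept(Path p) {
          return !p.getName().endsWith(".tmp");   // exclude temporary files
        }
      });
      for (FileStatus status : matches) {
        System.out.println(status.getPath());
      }
    }
  }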

Hadoop Smoke Test: TERASORT

2014-09-10 Thread arthur.hk.c...@gmail.com
Hi, I am trying the smoke test for Hadoop (2.4.1). About “terasort”: below is my test command. The Map part completed very fast because it was split into many subtasks; however, the Reduce part takes a very long time and there is only 1 running Reduce task. Is there a way to speed up the reduce phase by

RE: Error when executing a WordCount Program

2014-09-10 Thread YIMEN YIMGA Gael
Hi, In fact, hdfs://latdevweb02:9000/home/hadoop/hadoop/input is not a folder on HDFS. I created a folder /tmp/hadoop-hadoop/dfs/data, where data will be saved in HDFS. And in my HADOOP_HOME folder, there are two folders “input” and “output”, but I don’t know how to configure them in the progr

RE: Error when executing a WordCount Program

2014-09-10 Thread YIMEN YIMGA Gael
Hi, That is exactly my problem. Could you please look at my attached code and tell me how I can update it? How do I set a job jar file? And now, here is my hdfs-site.xml: == -bash-4.1$ cat conf/hdfs-site.xml dfs.replication 1 dfs.data.dir

running beyond virtual memory limits

2014-09-10 Thread Jakub Stransky
Hello, I am getting the following error when running on a 500MB dataset compressed in the Avro data format. Container [pid=22961,containerID=container_1409834588043_0080_01_10] is running beyond virtual memory limits. Current usage: 636.6 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory
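A hedged sketch of the settings usually adjusted for this error (Hadoop 2.x property names; the values below are examples only, not recommendations):

  import org.apache.hadoop.conf.Configuration;

  // Illustrative values only. The mapreduce.* settings can be set per job;
  // the yarn.nodemanager.* settings take effect in yarn-site.xml on the NodeManagers.
  public class MemorySettingsSketch {
    public static Configuration tuned() {
      Configuration conf = new Configuration();
      conf.set("mapreduce.map.memory.mb", "2048");        // container size for map tasks
      conf.set("mapreduce.map.java.opts", "-Xmx1638m");   // JVM heap, kept below the container size
      // conf.set("yarn.nodemanager.vmem-pmem-ratio", "4");        // allow more virtual than physical memory
      // conf.set("yarn.nodemanager.vmem-check-enabled", "false"); // or disable the virtual memory check
      return conf;
    }
  }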

Re: Error when executing a WordCount Program

2014-09-10 Thread Chris MacKenzie
Hi, have you set a class in your code? >> WARN mapred.JobClient: No job jar file set. User classes may not be found. >> See JobConf(Class) or JobConf#setJar(String). >> Also, you need to check the path for your input file >> Input path does not exist: hdfs://latdevweb02:9000/home/hadoop/hadoo
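A sketch of those two fixes in a driver (shown with the new mapreduce API; the old-API equivalent is the JobConf(Class) constructor mentioned in the warning; class names are illustrative):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class WordCountDriver {
    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "wordcount");
      // Fix 1: tell Hadoop which jar carries the user classes; otherwise
      // "No job jar file set" is logged and the classes may not be found.
      job.setJarByClass(WordCountDriver.class);
      // Fix 2: point the job at paths that actually exist in HDFS
      // (verify first with: hadoop fs -ls <path>).
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }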

Re: Error when executing a WordCount Program

2014-09-10 Thread Shahab Yunus
*hdfs://latdevweb02:9000/home/hadoop/hadoop/input*: is this a valid path on HDFS? Can you access this path outside of the program, for example using the hadoop fs -ls command? Also, were this path and the files in it created by a different user? The exception seems to say that it does not exist or the r

Error when executing a WordCount Program

2014-09-10 Thread YIMEN YIMGA Gael
Hello Hadoopers, Here is the error I'm facing when running the WordCount example program I wrote myself. Kindly find attached the file of my WordCount program. Below is the error. =

RE: HDFS: Couldn't obtain the locations of the last block

2014-09-10 Thread Liu, Yi A
That’s great. Regards, Yi Liu From: Zesheng Wu [mailto:wuzeshen...@gmail.com] Sent: Wednesday, September 10, 2014 8:25 PM To: user@hadoop.apache.org Subject: Re: HDFS: Couldn't obtain the locations of the last block Hi Yi, I went through HDFS-4516, and it really solves our problem, thanks very

Re: Regular expressions in fs paths?

2014-09-10 Thread Charles Robertson
Hi Georgi, Thanks for your reply. Won't hadoop fs -ls /tmp/myfiles* return all files that begin with 'myfiles' in the tmp directory? What I don't understand is how I can specify a pattern that excludes files ending in '.tmp'. I have tried using the normal regular expression syntax for this ^(.tmp)

Re: HDFS: Couldn't obtain the locations of the last block

2014-09-10 Thread Zesheng Wu
Hi Yi, I went through HDFS-4516, and it really solves our problem, thanks very much! 2014-09-10 16:39 GMT+08:00 Zesheng Wu : > Thanks Yi, I will look into HDFS-4516. > > > 2014-09-10 15:03 GMT+08:00 Liu, Yi A : > > Hi Zesheng, >> >> >> >> I got from an offline email of you and knew your Hadoop

Re: Regular expressions in fs paths?

2014-09-10 Thread Georgi Ivanov
Yes you can: hadoop fs -ls /tmp/myfiles* I would recommend first using -ls in order to verify you are selecting the right files. #Mahesh: do you need some help doing this? On 10.09.2014 13:46, Mahesh Khandewal wrote: I want to unsubscribe from this mailing list On Wed, Sep 10, 2014 at

Re: Regular expressions in fs paths?

2014-09-10 Thread Mahesh Khandewal
I want to unsubscribe from this mailing list On Wed, Sep 10, 2014 at 4:42 PM, Charles Robertson < charles.robert...@gmail.com> wrote: > Hi all, > > Is it possible to use regular expressions in fs commands? Specifically, I > want to use the copy (-cp) and move (-mv) commands on all files in a > di

Regular expressions in fs paths?

2014-09-10 Thread Charles Robertson
Hi all, Is it possible to use regular expressions in fs commands? Specifically, I want to use the copy (-cp) and move (-mv) commands on all files in a directory that match a pattern (the pattern being all files that do not end in '.tmp'). Can this be done? Thanks, Charles

MapReduce data decompression using a custom codec

2014-09-10 Thread POUPON Kevin
Hello, I developed a custom compression codec for Hadoop. Of course Hadoop is set to use my codec when compressing data. For testing purposes, I use the following two commands: Compression test command: --- hadoop jar /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0
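One way to test the decompression path outside MapReduce is a small tool that resolves the codec from the file extension (a sketch; it assumes the custom codec is registered in io.compression.codecs so CompressionCodecFactory can find it):

  import java.io.InputStream;
  import java.io.OutputStream;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IOUtils;
  import org.apache.hadoop.io.compress.CompressionCodec;
  import org.apache.hadoop.io.compress.CompressionCodecFactory;

  // Decompress one HDFS file using whichever codec matches its extension.
  public class DecompressFile {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      Path input = new Path(args[0]);
      CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(input);
      if (codec == null) {
        System.err.println("No codec found for " + input);
        return;
      }
      InputStream in = codec.createInputStream(fs.open(input));
      OutputStream out = fs.create(new Path(args[1]));
      IOUtils.copyBytes(in, out, conf, true);   // true = close both streams when done
    }
  }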

RE: Error and problem when running a hadoop job

2014-09-10 Thread YIMEN YIMGA Gael
Thank you all for your support. I was able to fix the issue this morning using this link; it is clearly explained there. http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#java-io-ioexception-incompatible-namespaceids You can use the link as well. Warm regards From: viv

Re: S3 with Hadoop 2.5.0 - Not working

2014-09-10 Thread Harsh J
> Incorrect configuration: namenode address dfs.namenode.servicerpc-address or > dfs.namenode.rpc-address is not configured. > Starting namenodes on [] NameNode/DataNode are part of an HDFS service. It makes no sense to try to run them over an S3 URL default, which is a distributed filesystem in
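A minimal sketch of going straight at the object store without any HDFS daemons, assuming the Hadoop 2.5.0 s3n:// connector; bucket name and credentials are placeholders, and an S3-compatible store may need additional endpoint configuration:

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  // Talk to the S3-compatible store directly; no NameNode/DataNode is involved.
  public class S3ListExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      conf.set("fs.s3n.awsAccessKeyId", "ACCESS_KEY");        // placeholder
      conf.set("fs.s3n.awsSecretAccessKey", "SECRET_KEY");    // placeholder
      FileSystem s3 = FileSystem.get(URI.create("s3n://my-bucket/"), conf);
      for (FileStatus status : s3.listStatus(new Path("s3n://my-bucket/"))) {
        System.out.println(status.getPath());
      }
    }
  }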

Start standby namenode using bootstrapStandby hangs

2014-09-10 Thread sam liu
Hi Experts, My Hadoop cluster has HA enabled with QJM, and I failed to upgrade it from version 2.2.0 to 2.4.1. Why? Is this an existing issue? My steps: 1. Stop the Hadoop cluster 2. On each node, upgrade the Hadoop binary to the newer version 3. On each JournalNode: sbin/hadoop-daemon.sh start journalnod

Re: HDFS: Couldn't obtain the locations of the last block

2014-09-10 Thread Zesheng Wu
Thanks Yi, I will look into HDFS-4516. 2014-09-10 15:03 GMT+08:00 Liu, Yi A : > Hi Zesheng, > > > > I got from an offline email of you and knew your Hadoop version was > 2.0.0-alpha and you also said “The block is allocated successfully in NN, > but isn’t created in DN”. > > Yes, we may have th

S3 with Hadoop 2.5.0 - Not working

2014-09-10 Thread Dhiraj
Hi, I have downloaded hadoop-2.5.0 and am trying to get it working with an S3 backend *(single-node in a pseudo-distributed mode)*. I have made changes to the core-site.xml according to https://wiki.apache.org/hadoop/AmazonS3 I have a backend object store running on my machine that supports S3. I g

RE: HDFS: Couldn't obtain the locations of the last block

2014-09-10 Thread Liu, Yi A
Hi Zesheng, I learned from an offline email from you that your Hadoop version is 2.0.0-alpha, and you also said “The block is allocated successfully in NN, but isn’t created in DN”. Yes, we may have this issue in 2.0.0-alpha. I suspect your issue is similar to HDFS-4516. And can you try Hadoo