RE: HDFS: Couldn't obtain the locations of the last block

2014-09-10 Thread Liu, Yi A
Hi Zesheng, I gathered from an offline email from you that your Hadoop version is 2.0.0-alpha, and you also said “The block is allocated successfully in NN, but isn’t created in DN”. Yes, we may have this issue in 2.0.0-alpha. I suspect your issue is similar to HDFS-4516. Can you try

S3 with Hadoop 2.5.0 - Not working

2014-09-10 Thread Dhiraj
Hi, I have downloaded hadoop-2.5.0 and am trying to get it working with an S3 backend *(single-node, pseudo-distributed mode)*. I have made changes to core-site.xml according to https://wiki.apache.org/hadoop/AmazonS3. I have a backend object store running on my machine that supports S3. I

Re: HDFS: Couldn't obtain the locations of the last block

2014-09-10 Thread Zesheng Wu
Thanks Yi, I will look into HDFS-4516. 2014-09-10 15:03 GMT+08:00 Liu, Yi A yi.a@intel.com: Hi Zesheng, I gathered from an offline email from you that your Hadoop version is 2.0.0-alpha, and you also said “The block is allocated successfully in NN, but isn’t created in DN”. Yes, we

Start standby namenode using bootstrapStandby hangs

2014-09-10 Thread sam liu
Hi Experts, My Hadoop cluster has HA enabled with QJM, and I failed to upgrade it from version 2.2.0 to 2.4.1. Why? Is this an existing issue? My steps: 1. Stop the Hadoop cluster 2. On each node, upgrade the Hadoop binary to the newer version 3. On each JournalNode: sbin/hadoop-daemon.sh start

Re: S3 with Hadoop 2.5.0 - Not working

2014-09-10 Thread Harsh J
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured. Starting namenodes on [] NameNode/DataNode are part of an HDFS service. It makes no sense to try and run them over an S3 URL default, which is a distributed filesystem in
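For reference, a minimal Java sketch (not from the thread) of one way to follow that advice: keep fs.defaultFS pointing at HDFS and talk to the object store through an s3n:// URI instead. The bucket name and credentials below are placeholders, and the s3n connector's dependencies are assumed to be on the classpath.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3ListSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder credentials for the s3n connector
    conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
    conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

    // Obtain a FileSystem for the S3 URI instead of changing fs.defaultFS,
    // so the NameNode/DataNode keep running against HDFS.
    FileSystem s3 = FileSystem.get(URI.create("s3n://my-bucket/"), conf);
    for (FileStatus status : s3.listStatus(new Path("s3n://my-bucket/"))) {
      System.out.println(status.getPath());
    }
  }
}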

RE: Error and problem when running a hadoop job

2014-09-10 Thread YIMEN YIMGA Gael
Thank you all for your support. I was able to fix the issue this morning using this link; it is clearly explained there: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#java-io-ioexception-incompatible-namespaceids You can use the link as well. Warm regards From:

MapReduce data decompression using a custom codec

2014-09-10 Thread POUPON Kevin
Hello, I developed a custom compression codec for Hadoop. Of course Hadoop is set to use my codec when compressing data. For testing purposes, I use the following two commands: Compression test command: --- hadoop jar
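For context, a small Java sketch (not from the thread) of how a custom codec is typically registered and resolved; com.example.MyCodec and the .mycodec extension are placeholders for the poster's codec, which must be on the classpath for the factory to load it.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CodecCheckSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Register the custom codec alongside the built-in ones
    // (the same property can be set in core-site.xml).
    conf.set("io.compression.codecs",
        "org.apache.hadoop.io.compress.DefaultCodec,"
        + "org.apache.hadoop.io.compress.GzipCodec,"
        + "com.example.MyCodec");

    // The factory picks a codec from the file extension, so decompression on
    // the MapReduce side depends on the codec reporting the right suffix.
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    CompressionCodec codec = factory.getCodec(new Path("/data/part-00000.mycodec"));
    System.out.println(codec == null ? "no codec matched" : codec.getClass().getName());
  }
}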

Re: Regular expressions in fs paths?

2014-09-10 Thread Mahesh Khandewal
I want to unsubscribe from this mailing list On Wed, Sep 10, 2014 at 4:42 PM, Charles Robertson charles.robert...@gmail.com wrote: Hi all, Is it possible to use regular expressions in fs commands? Specifically, I want to use the copy (-cp) and move (-mv) commands on all files in a

Re: Regular expressions in fs paths?

2014-09-10 Thread Georgi Ivanov
Yes you can: hadoop fs -ls /tmp/myfiles* I would recommend first using -ls in order to verify you are selecting the right files. #Mahesh: do you need some help doing this? On 10.09.2014 13:46, Mahesh Khandewal wrote: I want to unsubscribe from this mailing list On Wed, Sep 10, 2014 at

Re: HDFS: Couldn't obtain the locations of the last block

2014-09-10 Thread Zesheng Wu
Hi Yi, I went through HDFS-4516, and it really solves our problem, thanks very much! 2014-09-10 16:39 GMT+08:00 Zesheng Wu wuzeshen...@gmail.com: Thanks Yi, I will look into HDFS-4516. 2014-09-10 15:03 GMT+08:00 Liu, Yi A yi.a@intel.com: Hi Zesheng, I gathered from an offline email from

Re: Regular expressions in fs paths?

2014-09-10 Thread Charles Robertson
Hi Georgi, Thanks for your reply. Won't hadoop fs -ls /tmp/myfiles* return all files that begin with 'myfiles' in the tmp directory? What I don't understand is how I can specify a pattern that excludes files ending in '.tmp'. I have tried using the normal regular expression syntax for this

RE: HDFS: Couldn't obtain the locations of the last block

2014-09-10 Thread Liu, Yi A
That’s great. Regards, Yi Liu From: Zesheng Wu [mailto:wuzeshen...@gmail.com] Sent: Wednesday, September 10, 2014 8:25 PM To: user@hadoop.apache.org Subject: Re: HDFS: Couldn't obtain the locations of the last block Hi Yi, I went through HDFS-4516, and it really solves our problem, thanks very

Error when executing a WordCount Program

2014-09-10 Thread YIMEN YIMGA Gael
Hello Hadoopers, Here is the error I'm facing when running the WordCount example program I wrote myself. Kindly find attached the file of my WordCount program. The error is below.

Re: Error when executing a WordCount Program

2014-09-10 Thread Shahab Yunus
*hdfs://latdevweb02:9000/home/hadoop/hadoop/input* Is this a valid path on HDFS? Can you access this path outside of the program, for example using the hadoop fs -ls command? Also, were this path and the files in it created by a different user? The exception seems to say that it does not exist or the

Re: Error when executing a WordCount Program

2014-09-10 Thread Chris MacKenzie
Hi, have you set a class in your code? WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). Also, you need to check the path for your input file: Input path does not exist: hdfs://latdevweb02:9000/home/hadoop/hadoop/input

running beyond virtual memory limits

2014-09-10 Thread Jakub Stransky
Hello, I am getting the following error when running on a 500 MB dataset compressed in the Avro data format. Container [pid=22961,containerID=container_1409834588043_0080_01_10] is running beyond virtual memory limits. Current usage: 636.6 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual
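For reference, a sketch (not from the thread) of the job-side settings that usually govern this error: the 2.1 GB ceiling is the container size multiplied by yarn.nodemanager.vmem-pmem-ratio (default 2.1), so raising the container memory raises the virtual limit as well. The values below are illustrative only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MemoryTunedJobSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Bigger container: the vmem ceiling becomes 2048 MB * vmem-pmem-ratio.
    conf.set("mapreduce.map.memory.mb", "2048");
    // Keep the JVM heap comfortably inside the container.
    conf.set("mapreduce.map.java.opts", "-Xmx1638m");

    Job job = Job.getInstance(conf, "memory-tuned job");
    // ... set jar, mapper, reducer, and input/output paths as usual ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}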

RE: Error when executing a WordCount Program

2014-09-10 Thread YIMEN YIMGA Gael
Hi, That is exactly my problem. Could you please look into my attached code and tell me how I can update it? How do I set a job jar file? And now, here is my hdfs-site.xml: -bash-4.1$ cat conf/hdfs-site.xml <?xml version="1.0"?> <?xml-stylesheet type="text/xsl"
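A minimal driver sketch (not the poster's attached code) showing where the job jar is set, which is what the "No job jar file set" warning points to; class and path names are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriverSketch {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");

    // Tells the framework which jar to ship to the cluster; without this the
    // mapper/reducer classes may not be found on the task nodes.
    job.setJarByClass(WordCountDriverSketch.class);

    // job.setMapperClass(...); job.setReducerClass(...);  // the poster's classes
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With the old mapred API the equivalent is JobConf#setJarByClass or JobConf#setJar, as the warning suggests.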

RE: Error when executing a WordCount Program

2014-09-10 Thread YIMEN YIMGA Gael
Hi, In fact, hdfs://latdevweb02:9000/home/hadoop/hadoop/input is not a folder on HDFS. I created a folder, /tmp/hadoop-hadoop/dfs/data, where data will be saved in HDFS. And in my HADOOP_HOME folder there are two folders, “input” and “output”, but I don’t know how to configure them in the
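A small sketch (not from the thread) of creating the HDFS input directory and uploading local files with the FileSystem API; hadoop fs -mkdir and hadoop fs -put do the same from the shell. The paths are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrepareInputSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();    // picks up core-site.xml
    FileSystem fs = FileSystem.get(conf);        // the configured default FS (HDFS)

    Path input = new Path("/user/hadoop/input"); // placeholder HDFS path
    fs.mkdirs(input);
    // Upload a local file into the HDFS input directory.
    fs.copyFromLocalFile(new Path("/home/hadoop/hadoop/input/sample.txt"), input);

    for (FileStatus s : fs.listStatus(input)) {
      System.out.println(s.getPath());
    }
  }
}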

Hadoop Smoke Test: TERASORT

2014-09-10 Thread arthur.hk.c...@gmail.com
Hi, I am trying the smoke test for Hadoop (2.4.1). Regarding “terasort”, below is my test command. The map part completed very quickly because it was split into many subtasks, but the reduce part takes a very long time and there is only 1 running reduce task. Is there a way to speed up the reduce phase by

Re: Regular expressions in fs paths?

2014-09-10 Thread Rich Haase
HDFS doesn't support the full range of glob matching you will find in Linux. If you want to exclude all files from a directory listing that meet a certain criterion, try doing your listing and using grep -v to exclude the matching records.
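A Java alternative not mentioned in the thread: FileSystem.globStatus accepts a PathFilter, which can drop the '.tmp' files from the original question without a shell-side grep. A sketch with placeholder paths:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class ListWithoutTmpSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Exclude temporary files from whatever the glob matches.
    PathFilter noTmp = new PathFilter() {
      @Override
      public boolean accept(Path p) {
        return !p.getName().endsWith(".tmp");
      }
    };

    FileStatus[] matches = fs.globStatus(new Path("/tmp/myfiles*"), noTmp);
    if (matches != null) {                       // null when nothing matches
      for (FileStatus s : matches) {
        System.out.println(s.getPath());
      }
    }
  }
}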

Writing output from streaming task without dealing with key/value

2014-09-10 Thread Dmitry Sivachenko
Hello! Imagine the following common task: I want to process a big text file line by line using the streaming interface. Run the Unix grep command, for instance, or some other line-by-line processing, e.g. line.upper(). I copy the file to HDFS. Then I run a map task on this file which reads one line,

Re: Hadoop Smoke Test: TERASORT

2014-09-10 Thread Rich Haase
You can set the number of reducers used in any hadoop job from the command line by using -Dmapred.reduce.tasks=XX. e.g. hadoop jar hadoop-mapreduce-examples.jar terasort -Dmapred.reduce.tasks=10 /terasort-input /terasort-output

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Susheel Kumar Gadalay
If you don't want the key in the final output, you can set it like this in Java: job.setOutputKeyClass(NullWritable.class); It will just print the value in the output file. I don't know how to do it in Python. On 9/10/14, Dmitry Sivachenko trtrmi...@gmail.com wrote: Hello! Imagine the following common
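A short sketch of the Java-side approach being described: emit NullWritable keys so TextOutputFormat writes only the value, with no key and no tab separator. The mapper below is illustrative, not code from the thread.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Pass each input line through unchanged; because the key is NullWritable,
// only the value ends up in the output file.
public class ValueOnlyMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    context.write(NullWritable.get(), line);
  }
}

In the driver, job.setOutputKeyClass(NullWritable.class) and job.setOutputValueClass(Text.class) complete the picture for a map-only job.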

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Rich Haase
In Python, or any streaming program, just set the output value to the empty string and you will get something like key\t. On Wed, Sep 10, 2014 at 12:03 PM, Susheel Kumar Gadalay skgada...@gmail.com wrote: If you don't want the key in the final output, you can set it like this in Java.

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Dmitry Sivachenko
On 10 Sept. 2014, at 22:05, Rich Haase rdha...@gmail.com wrote: In Python, or any streaming program, just set the output value to the empty string and you will get something like key\t. I see, but I want to use many existing programs (like UNIX grep), and I don't want to have an extra

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Rich Haase
You can write a custom output format, or you can write your MapReduce job in Java and use a NullWritable as Susheel recommended. grep (and every other *nix text processing command I can think of) would not be limited by a trailing tab character. It's even quite easy to strip away that tab
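For reference, a sketch (not from the thread) of what such a custom output format could look like for a streaming job; it uses the old mapred API, which streaming's -outputformat option expects, and the class name is a placeholder.

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Progressable;

// Writes only the value, dropping the key and the tab separator entirely.
public class ValueOnlyOutputFormat extends FileOutputFormat<Text, Text> {
  @Override
  public RecordWriter<Text, Text> getRecordWriter(FileSystem ignored, JobConf job,
      String name, Progressable progress) throws IOException {
    Path file = FileOutputFormat.getTaskOutputPath(job, name);
    final FSDataOutputStream out = file.getFileSystem(job).create(file, progress);
    return new RecordWriter<Text, Text>() {
      public void write(Text key, Text value) throws IOException {
        out.write(value.getBytes(), 0, value.getLength());  // value only
        out.write('\n');
      }
      public void close(Reporter reporter) throws IOException {
        out.close();
      }
    };
  }
}

The jar containing the class would then be shipped with the streaming job (e.g. via -libjars) and selected with -outputformat ValueOnlyOutputFormat.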

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Dmitry Sivachenko
On 10 Sept. 2014, at 22:19, Rich Haase rdha...@gmail.com wrote: You can write a custom output format. Any clues how this can be done? Or you can write your MapReduce job in Java and use a NullWritable as Susheel recommended. grep (and every other *nix text processing

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Shahab Yunus
Examples (the top ones are related to streaming jobs): http://www.infoq.com/articles/HadoopOutputFormat http://research.neustar.biz/2011/08/30/custom-inputoutput-formats-in-hadoop-streaming/

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Dmitry Sivachenko
On 10 Sept. 2014, at 22:47, Shahab Yunus shahab.yu...@gmail.com wrote: Examples (the top ones are related to streaming jobs): http://www.infoq.com/articles/HadoopOutputFormat http://research.neustar.biz/2011/08/30/custom-inputoutput-formats-in-hadoop-streaming/

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Felix Chern
Use ‘tr -s’ to strip out tabs? $ echo -e a\t\t\tb a b $ echo -e a\t\t\tb | tr -s \t a b On Sep 10, 2014, at 11:28 AM, Dmitry Sivachenko trtrmi...@gmail.com wrote: On 10 Sept. 2014, at 22:19, Rich Haase rdha...@gmail.com wrote: You can write a custom

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Felix Chern
If you don’t want anything to get inserted, just set your output to key only or value only. TextOutputFormat$LineRecordWriter won’t insert anything unless both values are set: public synchronized void write(K key, V value) throws IOException { boolean nullKey = key == null || key

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Dmitry Sivachenko
On 11 Sept. 2014, at 0:47, Felix Chern idry...@gmail.com wrote: If you don’t want anything to get inserted, just set your output to key only or value only. TextOutputFormat$LineRecordWriter won’t insert anything unless both values are set: If I output value only, for instance, and my line

Re: Regular expressions in fs paths?

2014-09-10 Thread Charles Robertson
I solved this in the end by using a shell script (initiated by an Oozie shell action) to use grep and loop through the results. I didn't have to use the -v option, as the -e option gives you access to a fuller range of regular expression functionality. Thanks for your help (again!) Rich. Charles On

The running job is blocked for a while if the queue is short of resources

2014-09-10 Thread Anfernee Xu
Hi experts, I am facing one strange issue I cannot understand; can you tell me if this is a bug or if I configured something wrong? Below is my situation. I'm running the Hadoop 2.2.0 release and all my jobs are uberized; each node can only run a single job at any point in time. I used

Balancing is very slow.

2014-09-10 Thread cho ju il
Hadoop 2.4.1. Balancing is very slow. $HADOOP_PREFIX/bin/hdfs dfsadmin -setBalancerBandwidth 52428800 It takes a long time to move one block: 2014-09-11 11:38:01 block begins to move, 2014-09-11 11:47:20 block move complete. #10.2.1.211 netstat, Block begins to move, 10.2.1.210