Re: Classic(MapReduce 1) cluster in Hadoop 0.23 just won't listen

2012-10-02 Thread Harsh J
Hi, The classic option exists to provide backward compatibility for users wanting to run an MR1 cluster (with a JT, etc.). With the inclusion of the YARN and MR2 runtime modes, Apache Hadoop removed support for the MR1 services: """ ➜ mapred jobtracker Sorry, the jobtracker command is no longer supported.

Re: Hadoop Jobtracker web administration tool, hadoop job -list and Tasktrackers Web UI's show no information

2012-10-02 Thread Romedius Weiss
Sorry for the double post. Hi! The services are running on the master (NN, JT, TT, DN) and the slave (TT, DN) according to jps. In the web UIs the slaves are shown as up and running; I'm getting heartbeats and everything. When running a job, it completes and logs everything to the command prompt. The only

Re: Classic(MapReduce 1) cluster in Hadoop 0.23 just won't listen

2012-10-02 Thread Alexander Hristov
Thanks for replying. I'm using the 0.23.3 release as distributed, no previous versions. So what's the point in documenting a classic option, then, if it is not available? I thought distributions were self-contained, or at least the docs don't mention that you need any previous versions. Wh

Re: Classic(MapReduce 1) cluster in Hadoop 0.23 just won't listen

2012-10-02 Thread Harsh J
What is your 'classic' MapReduce bundle version? 0.23 ships no classic MapReduce services bundle in it AFAIK, only YARN+(MR2-App). Whatever version you're trying to use, make sure it is not using the older HDFS jars. On Wed, Oct 3, 2012 at 10:13 AM, Alexander Hristov wrote: > Hi again > > Why do

Classic(MapReduce 1) cluster in Hadoop 0.23 just won't listen

2012-10-02 Thread Alexander Hristov
Hi again. Why does it seem to me that everything Hadoop 0.23-related is an uphill battle? :-( I'm trying something as simple as running a classic (MapReduce 1) Hadoop cluster. Here's my configuration: core-site.xml: fs.default.name = hdfs://samplehost.com:9000 hdfs-site

Re: How to lower the total number of map tasks

2012-10-02 Thread Romedius Weiss
Hi! According to the article @ YDN*, "The on-node parallelism is controlled by the mapred.tasktracker.map.tasks.maximum parameter." [http://developer.yahoo.com/hadoop/tutorial/module4.html] Also, I think it's better to set the min size instead of the max size, so the algorithm tries to slice
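As a sketch, the parameter quoted above lives in mapred-site.xml on each tasktracker node (the value below is an arbitrary example). Note that it caps concurrent map slots per node; it does not change the total number of map tasks a job creates:

```xml
<!-- mapred-site.xml: at most 4 map tasks run concurrently on this tasktracker -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
```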

RE: HADOOP in Production

2012-10-02 Thread Hank Cohen
Good points. I'm not trying to be exhaustive in the discussion of real time systems. My only intent was to point out the difference between real time and fast response. There are lots of real time requirements that do not require a particularly fast response but the response needs to be on time.

Re: A small portion of map tasks slows down the job

2012-10-02 Thread JAX
This is reasonable if you have any kind of trends in the ordering of your data, or any computation in the mappers. You can use a smaller input split to reduce the load on each individual mapper, so that large blocks of records that take a long time to process are less likely to clog one mapper.

A small portion of map tasks slows down the job

2012-10-02 Thread Huanchen Zhang
Hello, I have a small portion of map tasks whose output is much larger than the others' (more spills), so the reducer is mainly waiting for these few map tasks. Is there a good solution for this problem? Thank you. Best, Huanchen

RE: HADOOP in Production

2012-10-02 Thread Gauthier, Alexander
Owned. From: Ted Dunning [mailto:tdunn...@maprtech.com] Sent: Tuesday, October 02, 2012 4:13 PM To: user@hadoop.apache.org Subject: Re: HADOOP in Production On Tue, Oct 2, 2012 at 7:05 PM, Hank Cohen mailto:hank.co...@altior.com>> wrote: There is an important difference between real time and re

Re: HDFS "file" missing a part-file

2012-10-02 Thread Robert Molina
What I guess might be happening is that your data may contain some text that Pig is not fully parsing, because the data contains characters that Pig uses as delimiters (i.e. commas and curly brackets). Thus, you can probably take a look at the data and see if you can find any of the characters

Re: HADOOP in Production

2012-10-02 Thread Ted Dunning
On Tue, Oct 2, 2012 at 7:05 PM, Hank Cohen wrote: > There is an important difference between real time and real fast > > Real time means that system response must meet a fixed schedule. > Real fast just means sooner is better. > Good thought, but real-time can also include a fixed schedule and a

RE: puzzled at the output

2012-10-02 Thread Kartashov, Andy
Bertrand/Mohamed, you guys are awesome. Thanks a million… Commenting out the Combiner class in the driver solved the issue. P.S. I have one more small dilemma. I am trying to create XML from two files. The input for my 3rd MR job is the (Text, Text) output from two MapReduce jobs. I feed my input to

Re: How to lower the total number of map tasks

2012-10-02 Thread Shing Hing Man
I only have one big input file. Shing From: Bejoy KS To: user@hadoop.apache.org; Shing Hing Man Sent: Tuesday, October 2, 2012 6:46 PM Subject: Re: How to lower the total number of map tasks Hi Shing Is your input a single file or set of small files? If

Re: How to lower the total number of map tasks

2012-10-02 Thread Shing Hing Man
I have done the following:
1) stop-all.sh
2) In mapred-site.xml, added mapred.max.split.size = 134217728 (dfs.block.size remains unchanged at 67108864)
3) start-all.sh
4) Use hadoop fs -cp src destn to copy my original file to another HDFS directory.
5) Run my MapReduce progra
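For reference, the mapred-site.xml edit in step 2 corresponds to an entry like the following (value as given in the thread; whether the property is actually honored depends on the InputFormat in use — the old-API FileInputFormat computes split sizes from the requested map count and the min split size, while the new-API one respects a max split size):

```xml
<!-- mapred-site.xml: cap the input split size at 128 MB (2 x the 64 MB block size) -->
<property>
  <name>mapred.max.split.size</name>
  <value>134217728</value>
</property>
```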

RE: HADOOP in Production

2012-10-02 Thread Hank Cohen
There is an important difference between real time and real fast. Real time means that system response must meet a fixed schedule; real fast just means sooner is better. Real time systems always have hard schedules. The schedule could be in microseconds to control a laser for making masks for s

Re: How to lower the total number of map tasks

2012-10-02 Thread Bejoy KS
Hi Shing, Is your input a single file or a set of small files? If the latter, you need to use CombineFileInputFormat. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Shing Hing Man Date: Tue, 2 Oct 2012 10:38:59 To: user@hadoop.apache.org Reply-To: user@ha

Re: File block size use

2012-10-02 Thread Anna Lahoud
Chris - You are absolutely correct in what I am trying to accomplish - decreasing the number of files going to the maps. Admittedly, I haven't run through all the suggestions yet today. I hope to do that by day's end. Thank you, and I will give an update later on what worked. On Tue, Oct 2, 2012 at 1

Re: How to lower the total number of map tasks

2012-10-02 Thread Shing Hing Man
I have tried    Configuration.setInt("mapred.max.split.size",134217728); and setting mapred.max.split.size in mapred-site.xml. ( dfs.block.size is left unchanged at 67108864). But in the job.xml, I am still getting mapred.map.tasks =242 . Shing Fro

Re: How to lower the total number of map tasks

2012-10-02 Thread Bejoy KS
Shing, This doesn't change the block size of existing files in HDFS; only new files written to HDFS will be affected. To get this to take effect for old files, you need to re-copy them, at least within HDFS: hadoop fs -cp src destn. Regards Bejoy KS Sent from handheld, please excuse typos. -Origi

Re: puzzled at the output

2012-10-02 Thread Mohamed Trad
I agree with Bertrand. Try disabling the combiner. Sent from my iPhone. On 2 Oct 2012, at 19:02, Bertrand Dechoux wrote: > Combiner? And you are only using 'Text' as type? > > Please do a real test with a specified input. We can only guess. > > Bertrand > > On Tue, Oct 2, 2012 at 6:52 PM,

Re: How to lower the total number of map tasks

2012-10-02 Thread Shing Hing Man
 I set the block size using   Configuration.setInt("dfs.block.size",134217728); I have also set it  in mapred-site.xml. Shing From: Chris Nauroth To: user@hadoop.apache.org; Shing Hing Man Sent: Tuesday, October 2, 2012 6:00 PM Subject: Re: How to lowe

Re: How to lower the total number of map tasks

2012-10-02 Thread Bejoy Ks
Sorry for the typo, the property name is mapred.max.split.size Also just for changing the number of map tasks you don't need to modify the hdfs block size. On Tue, Oct 2, 2012 at 10:31 PM, Bejoy Ks wrote: > Hi > > You need to alter the value of mapred.max.split size to a value larger > than you

Re: puzzled at the output

2012-10-02 Thread Bertrand Dechoux
Combiner? And you are only using 'Text' as type? Please do a real test with a specified input. We can only guess. Bertrand On Tue, Oct 2, 2012 at 6:52 PM, Chris Nauroth wrote: > Is there also a Mapper? Is there any chance that logic in the Mapper > wrapped the values with the tags too, so that

Re: How to lower the total number of map tasks

2012-10-02 Thread Bejoy Ks
Hi You need to alter the value of mapred.max.split size to a value larger than your block size to get fewer map tasks than the default. On Tue, Oct 2, 2012 at 10:04 PM, Shing Hing Man wrote: > > > > I am running Hadoop 1.0.3 in Pseudo distributed mode. > When I submit a map/reduce j

Re: How to lower the total number of map tasks

2012-10-02 Thread Chris Nauroth
Those numbers make sense, considering 1 map task per block. 16 GB file / 64 MB block size = ~242 map tasks. When you doubled dfs.block.size, how did you accomplish that? Typically, the block size is selected at file write time, with a default value from system configuration used if not specified

Re: puzzled at the output

2012-10-02 Thread Chris Nauroth
Is there also a Mapper? Is there any chance that logic in the Mapper wrapped the values with the tags too, so that the records were already wrapped when they entered the reducer logic? Thank you, --Chris On Tue, Oct 2, 2012 at 9:01 AM, Kartashov, Andy wrote: > I want: > > Key > Valu

How to lower the total number of map tasks

2012-10-02 Thread Shing Hing Man
I am running Hadoop 1.0.3 in pseudo-distributed mode. When I submit a map/reduce job to process a file of size about 16 GB, in job.xml I have the following: mapred.map.tasks = 242, mapred.min.split.size = 0, dfs.block.size = 67108864. I would like to reduce mapred.map.tasks to see if it i

Re: Hadoop Archives under 0.23

2012-10-02 Thread Alexander Hristov
According to hdfs, -lsr is deprecated and -ls -R is to be used instead. In any case, it doesn't matter, as the result is exactly the same. On 02/10/2012 2:12, Alexander Hristov wrote: Hello I'm trying to test the Hadoop archive functionality under 0.23 and I can't get it working. I have

RE: puzzled at the output

2012-10-02 Thread Kartashov, Andy
I want: Key Value1 Value2 I get double tags: Key Value1 Value2 Here is my latest attempt, which also failed, in reduce: ... public void reduce (. StringBuilder sb = new StringBuilder(); while (values.hasNext()){ sb.appen

Re: Which hardware to choose

2012-10-02 Thread Oleg Ruchovets
Great, thank you for such detailed information. By the way, what type of disk controller do you use? Thanks, Oleg. On Tue, Oct 2, 2012 at 6:34 AM, Alexander Pivovarov wrote: > Hi Oleg > > Cloudera and Dell set up the following cluster for my company > Company receives 1.5 TB raw data pe

Re: File block size use

2012-10-02 Thread Raj Vishwanathan
I haven't tried it, but this should also work: hadoop fs -Ddfs.block.size= -cp src dest Raj > > From: Anna Lahoud >To: user@hadoop.apache.org; bejoy.had...@gmail.com >Sent: Tuesday, October 2, 2012 7:17 AM >Subject: Re: File block size use > > >Thank you.

Re: File block size use

2012-10-02 Thread Anna Lahoud
Thank you. I will try today. On Tue, Oct 2, 2012 at 12:23 AM, Bejoy KS wrote: > Hi Anna > > If you want to increase the block size of existing files, you can use an > Identity Mapper with no reducer. Set the min and max split sizes to your > requirement (512 MB). Use SequenceFileInputFormat a

Re: Upgrade not finalized

2012-10-02 Thread Harsh J
Ulrich, It is fine to run the "hadoop dfsadmin -finalizeUpgrade" command to complete the upgrade, but since you have waited a while to do it, I expect it may take quite a while to finish, as there'll now be lots of blocks to process for the upgrade. However, there should be no risk in run

Re: puzzled at the output

2012-10-02 Thread Harsh J
Hi, Could you clarify your post to show what you expect your code to have actually printed and what it has printed? On Tue, Oct 2, 2012 at 7:01 PM, Kartashov, Andy wrote: > Guys, have been stretching my head for the past couple of days. Why are my > tags duplicated while the content they wrap a

Upgrade not finalized

2012-10-02 Thread Ulrich Kammer
Hello, in April we upgraded our Hadoop 0.21.0 installation to version 1.0.1. We stuck to the exact instructions in the upgrade documentation, as always. After a few months we discovered on the Namenode that, here again, "upgrade for version -32 has been completed. Upgrade is not finalize

puzzled at the output

2012-10-02 Thread Kartashov, Andy
Guys, I have been scratching my head for the past couple of days. Why are my tags duplicated while the content they wrap around, i.e. my StringBuilder sb, is not? My Reduce code is: while (values.hasNext()){ sb.append(values.next().toString()); } output.collect(key, new Text("\n\n"+sb.to
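The double-tag symptom described above is consistent with the tag-wrapping reduce logic also being registered as the combiner: the same wrapping then runs once on the map side (combine) and again on the reduce side, over already-wrapped output. A minimal plain-Java sketch, with no Hadoop dependencies; the <rec> tag and method names are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;

public class CombinerDoubleWrap {

    // Stand-in for the reduce body: concatenate the values and wrap them in a tag.
    static String reduce(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            sb.append(v);
        }
        return "<rec>" + sb + "</rec>";
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("Value1", "Value2");

        // Reducer only (no combiner): the tag appears once.
        System.out.println(reduce(values));
        // -> <rec>Value1Value2</rec>

        // Same function run as a combiner first, then as the reducer over the
        // combiner's output: the tag is applied twice.
        String combined = reduce(values);
        System.out.println(reduce(Arrays.asList(combined)));
        // -> <rec><rec>Value1Value2</rec></rec>
    }
}
```

Removing the combiner, or making the combiner emit untagged values and leaving the wrapping to the reducer alone, avoids the duplication, which matches the fix reported elsewhere in the thread.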

Re: Hadoop Archives under 0.23

2012-10-02 Thread Marcos Ortiz
On 02/10/2012 2:12, Alexander Hristov wrote: Hello I'm trying to test the Hadoop archive functionality under 0.23 and I can't get it working. I have in HDFS a /test folder with several text files. I created a hadoop archive using hadoop archive -archiveName test.har -p /test *.txt /s

Re: HDFS "file" missing a part-file

2012-10-02 Thread Björn-Elmar Macek
Hi again, I executed a slightly different script again, one that included some more operations. The logs look similar, but this time I have 2 attempt files for the same job package: (1) _temporary/_attempt_201210021204_0001_r_01_0/part-r-1 (2) _temporary/_attempt_201210021204_0001_r_01

Re: HADOOP in Production

2012-10-02 Thread Michael Segel
Funny that the OP asks about 'real time'... This comes up quite often and it's always misunderstood. First, when we say 'real time', many take it to mean subjective real time. Real 'real time' would require some sort of RTOS underneath. Second, Hadoop is a parallelized framework. You have sever

Re: HADOOP in Production

2012-10-02 Thread Ruslan Al-Fakikh
Hi, There are too many issues to discuss, I guess. I would recommend reading Hadoop: The Definitive Guide by Tom White; there are chapters with the answers. Also, what did you mean by 'real time'? Hadoop is not designed for giving real-time results of queries. It is rather for offline data analys

Edit wiki: New Moscow Hadoop Meetup and looking for sponsors and cooperation

2012-10-02 Thread Ruslan Al-Fakikh
Hello! Please add a new meetup: http://www.meetup.com/Hadoop-Moscow/ to page: http://wiki.apache.org/hadoop/RussiaHadoopUserGroup Thanks in advance! For Hadoopers - please visit the upcoming meetup. For Apache/Hortonworks/Cloudera/MapR, etc - please contact the meetup organizer (me) for any type

Re: Hadoop Jobtracker web administration tool, hadoop job -list and Tasktrackers Web UI's show no information

2012-10-02 Thread gschen
On 2012/10/2 15:15, Tatsuo Kawasaki wrote: Hi, Could you please hit jps command on your Jobtracker node? And please check your firewall settings. (If you are using RHEL/CentOS, run iptables -L) Cheers, -- Tatsuo -- Replied Message -- From: Romedius Weiss To: user@h

Re: Hadoop Jobtracker web administration tool, hadoop job -list and Tasktrackers Web UI's show no information

2012-10-02 Thread Tatsuo Kawasaki
Hi, Could you please hit jps command on your Jobtracker node? And please check your firewall settings. (If you are using RHEL/CentOS, run iptables -L) Cheers, -- Tatsuo -- Replied Message -- From: Romedius Weiss To: user@hadoop.apache.org Date: Tue, 02 Oct 2012 0