Re: Biggest cluster running YARN in the world?

2013-01-15 Thread Hemanth Yamijala
You may get more up-to-date information from folks at Yahoo!, but here is a mail on the hadoop-general mailing list that has some statistics: http://www.mail-archive.com/general@hadoop.apache.org/msg05592.html Please note it is a little dated, so things should be better now :-) Thanks, Hemanth On Tue,

MPI and hadoop on same cluster

2013-01-15 Thread rahul v
Hi, This issue issues.apache.org/jira/browse/MAPREDUCE-2911 talks about executing Hadoop and MPI on the same cluster. Even though the comments suggest Ralph has finished writing the code, I am not able to find the patch. Can someone guide me towards finding the same? -- Regards, R.V.

When reduce tasks start in MapReduce Streaming?

2013-01-15 Thread Pedro Sá da Costa
Hi, I read in the documents that in MapReduce, the reduce tasks only start after a percentage (by default 90%) of the maps end. This means that the slowest maps can delay the start of the reduce tasks, and the input data that is consumed by the reduce tasks is presented as a batch of data. This means

Re: When reduce tasks start in MapReduce Streaming?

2013-01-15 Thread Jeff Bean
Hi Pedro, Yes, Hadoop Streaming has the same property. The reduce method is not called until the mappers are done, and the reducers are not scheduled before the threshold set by mapred.reduce.slowstart.completed.maps is reached. On Tue, Jan 15, 2013 at 3:06 PM, Pedro Sá da Costa
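The scheduling rule Jeff describes can be sketched as a simple ratio check. The class below is an illustrative plain-Java stand-in, not Hadoop's actual scheduler code; note also that the stock default for mapred.reduce.slowstart.completed.maps in Hadoop of this era is 0.05 (5%), so a 90% threshold would be a site-specific override.

```java
// Illustrative stand-in for the scheduler's slowstart check; the real logic
// lives inside Hadoop's JobTracker/scheduler, and these names are ours.
public class SlowstartCheck {
    static boolean canScheduleReducers(int completedMaps, int totalMaps, double slowstart) {
        return totalMaps > 0 && (double) completedMaps / totalMaps >= slowstart;
    }

    public static void main(String[] args) {
        // With mapred.reduce.slowstart.completed.maps = 0.90, reducers are
        // held back until 90% of the maps have completed.
        System.out.println(canScheduleReducers(89, 100, 0.90)); // false
        System.out.println(canScheduleReducers(90, 100, 0.90)); // true
    }
}
```

Even once scheduled, reducers only copy and merge map output; as Jeff says, the reduce() calls themselves still wait until every map has finished.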

Re: I am running MapReduce on a 30G data on 1master/2 slave, but failed.

2013-01-15 Thread yaotian
I set mapred.reduce.tasks to -1 (auto-reduce), and Hadoop created 450 tasks for Map but 1 task for Reduce. It seems that this reduce ran on only 1 slave (I have two slaves). But when it was at 66%, the error was reported again: Task attempt_201301150318_0001_r_00_0 failed to report
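A single reduce task running on one slave is the expected outcome when numReduceTasks is 1: the default partitioner maps every key into one of numReduceTasks buckets, so one reducer receives everything. Below is a plain-Java stand-in for that arithmetic (the real class is Hadoop's HashPartitioner; this sketch is illustrative, not the actual API):

```java
// Illustrative stand-in for Hadoop's default HashPartitioner, which assigns
// each key to partition (hashCode & Integer.MAX_VALUE) % numReduceTasks.
// With numReduceTasks = 1, every key lands in partition 0, i.e. a single
// reducer on a single node.
public class PartitionDemo {
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"alpha", "beta", "gamma"};
        for (String k : keys) {
            System.out.println(k + " -> partition " + getPartition(k, 1)); // always 0
        }
        // Raising mapred.reduce.tasks (e.g. to 4) spreads keys across
        // partitions, and hence across the reduce slots on both slaves.
        for (String k : keys) {
            System.out.println(k + " -> partition " + getPartition(k, 4));
        }
    }
}
```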

I would like to report a bug

2013-01-15 Thread Carolina Vizuete Martinez
Hi, I would like to report a bug. I get a negative value from the function unix_timestamp. I want to get a duration in the format HH:mm:ss: unix_timestamp(duration,'HH:mm:ss') When I test it backwards, it works: I get my duration well formatted with correct values. This: from_unixtime(
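The negative value is arguably expected behavior rather than a bug: parsing with 'HH:mm:ss' yields an instant anchored at 1970-01-01 in the session timezone, and in any timezone ahead of UTC, early times on that date fall before the epoch. A plain-Java sketch of the same parsing (the timezone names here are chosen purely for illustration):

```java
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Parsing a time-of-day string as a timestamp anchors it at 1970-01-01 in
// the given timezone; ahead of UTC, early times land before the epoch.
public class NegativeEpochDemo {
    static long toUnixSeconds(String hhmmss, String tz) throws Exception {
        SimpleDateFormat fmt = new SimpleDateFormat("HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone(tz));
        return fmt.parse(hhmmss).getTime() / 1000L;
    }

    public static void main(String[] args) throws Exception {
        // In UTC+1 ("Etc/GMT-1" in POSIX notation), 00:30 on 1970-01-01 is
        // 23:30 on 1969-12-31 UTC, i.e. before the epoch: a negative value.
        System.out.println(toUnixSeconds("00:30:00", "Etc/GMT-1")); // -1800
        // In UTC itself the same string gives a small positive value.
        System.out.println(toUnixSeconds("00:30:00", "UTC"));       // 1800
    }
}
```

This also explains why the round trip through from_unixtime looks correct: formatting back with the same pattern in the same timezone recovers the original time-of-day string regardless of the sign of the underlying epoch value.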

Re: question about ZKFC daemon

2013-01-15 Thread ESGLinux
Hi all, I'm only testing the new HA feature; I'm not on a production system. Well, let's talk about the number of nodes and the ZKFC daemons. In this url:

Re: question about ZKFC daemon

2013-01-15 Thread Harsh J
Hi, I fail to see your confusion. ZKFC != ZK. ZK is quorum software, as QJM is. The ZK peers are to be run in odd numbers, just as JNs are. ZKFC is something the NN needs for its Automatic Failover capability. It is a client to ZK and thereby demands ZK's presence; for which the odd #
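The reason ZK peers (and JNs) run in odd numbers comes down to majority-quorum arithmetic: an ensemble of n nodes survives floor((n-1)/2) failures, so an even-sized ensemble tolerates no more failures than the next smaller odd one. A quick sketch of that arithmetic:

```java
// Majority-quorum fault tolerance: a decision needs floor(n/2) + 1 live
// peers, so the ensemble survives at most floor((n - 1) / 2) failures.
public class QuorumMath {
    static int toleratedFailures(int peers) {
        return (peers - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 6; n++) {
            System.out.println(n + " peers tolerate " + toleratedFailures(n) + " failure(s)");
        }
        // 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2: a 4th or 6th peer buys no extra
        // fault tolerance, which is why odd ensemble sizes are recommended.
    }
}
```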

Re: question about ZKFC daemon

2013-01-15 Thread ESGLinux
Hi Harsh, Now I'm completely confused :-) As you pointed out, ZKFC runs only on the NN. That looks right. So, what are the ZK peers (the odd number I'm looking for) and where do I have to run them? On another 3 nodes? As I can read from the previous url: In a typical deployment, ZooKeeper daemons are

Re:Re: I am running MapReduce on a 30G data on 1master/2 slave, but failed.

2013-01-15 Thread Charlie A.
Hi yaotian, I think you should check the logs on that tasktracker; they'll tell you why. And here are some tips on deploying an MR job: http://blog.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/ Charlie At 2013-01-15 16:34:27, yaotian yaot...@gmail.com wrote: I set

Re: question about ZKFC daemon

2013-01-15 Thread Harsh J
No, ZooKeeper daemons == http://zookeeper.apache.org. On Tue, Jan 15, 2013 at 3:38 PM, ESGLinux esggru...@gmail.com wrote: Hi Harsh, Now I'm completely confused :-) As you pointed out, ZKFC runs only on the NN. That looks right. So, what are the ZK peers (the odd number I'm looking for) and

Re: question about ZKFC daemon

2013-01-15 Thread ESGLinux
OK, that's the origin of my confusion; I thought they were the same. I'm going to read this doc to shed a bit of light on ZooKeeper for me. Thank you very much for your help, ESGLinux 2013/1/15 Harsh J ha...@cloudera.com No, ZooKeeper daemons == http://zookeeper.apache.org. On Tue,

Re: OutOfMemoryError when running a YARN application with 25 containers

2013-01-15 Thread Arun C Murthy
How many maps and reduces did your job have? Also, which release are you using? I'd recommend at least 2.0.2-alpha, though we should be able to release 2.0.3-alpha very soon. Arun On Jan 14, 2013, at 4:35 AM, Krishna Kishore Bonagiri wrote: Hi, I am getting the following error in

RE: I would like to report a bug

2013-01-15 Thread Viral Bajaria
I don't think this is the right list for your query. Moving the hadoop list to bcc and cc'ing the hive list. Also, I don't get how you can get a unix timestamp from a field with just time-of-day granularity; are you missing the date information in your date format? Viral From: Carolina Vizuete Martinez Sent:

Re: Scheduling non-MR processes

2013-01-15 Thread Arun C Murthy
YARN implies 2 pieces: # An application-specific 'master' to coordinate your application (mainly to get resources, i.e. containers, for its application from the ResourceManager and use them) # Application-specific code which runs in the allocated containers. Given your use case, I'd recommend that

Hadoop execution sequence

2013-01-15 Thread Panshul Whisper
Hello, I was wondering if Hadoop performs the map reduce operations on the data while maintaining the order or sequence in which it received the data. I have a Hadoop cluster that is receiving JSON files, which are processed and then stored on base. For correct calculation it is essential

Re: Hadoop execution sequence

2013-01-15 Thread Mahesh Balija
As per the MapReduce behavior, mappers will process all the input file(s) in parallel, i.e., no order is guaranteed among the input files. If you want to process each file separately and maintain the order, then you need to process each file in an independent MapReduce job so that your

RE: Hadoop execution sequence

2013-01-15 Thread John Lilley
I think it will help for Ouch to clarify what is meant by 'in order'. If one JSON file must be completely processed before the next file starts, there is not much point in using MapReduce at all, since your problem cannot be partitioned. On the other hand, there may be ways around this, for

Re: Hadoop Pseudo-configuration

2013-01-15 Thread Mohammad Tariq
Hello Shagun, Make sure you have set the required config parameters correctly. Also, modify the line containing 127.0.1.1 to 127.0.0.1 in your etc/hosts file, and add the hostname of your VM along with the IP in the etc/hosts file if you are using an FQDN. If you still face any issue, have

Re: Compile error using contrib.utils.join package with new mapreduce API

2013-01-15 Thread Hemanth Yamijala
On the dev mailing list, Harsh pointed out that there is another join related package: http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/join/ This seems to be available

Re: FileSystem.workingDir vs mapred.local.dir

2013-01-15 Thread Hemanth Yamijala
Hi, AFAIK, the mapred.local.dir property refers to a set of directories under which different types of data related to MapReduce jobs are stored, e.g. intermediate data, localized files for a job, etc. The working directory for a MapReduce job is configured under a sub-directory within one of

Re: FileSystem.workingDir vs mapred.local.dir

2013-01-15 Thread Jay Vyas
What do you mean by a working dir for a filesystem? I never thought that a FileSystem would need or have a special workingDir. On Tue, Jan 15, 2013 at 12:43 PM, Hemanth Yamijala yhema...@thoughtworks.com wrote: Hi, AFAIK, the mapred.local.dir property refers to a set of directories under

Re: FileSystem.workingDir vs mapred.local.dir

2013-01-15 Thread Harsh J
Jay, For your FS question: http://en.wikipedia.org/wiki/Working_directory On Wed, Jan 16, 2013 at 12:17 AM, Jay Vyas jayunit...@gmail.com wrote: what do you mean by workingdir for a filesystem ? I never thought that a fileSystem should need or have a special workingDir ? On Tue, Jan 15,

Re: FileSystem.workingDir vs mapred.local.dir

2013-01-15 Thread Jay Vyas
Ah, okay. So in default Hadoop DFS, the workingDir is (I believe) /user/hadoop/, because as I recall, when putting a file into HDFS, that seems to be where the files naturally end up if there is no path specified.
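Jay's observation matches how relative paths are resolved: on HDFS the working directory defaults to /user/<username>, and unqualified paths are resolved against it. A simplified stand-in for that resolution (not Hadoop's actual implementation; the method name here is ours):

```java
// Simplified sketch of relative-path resolution against a filesystem
// working directory, which on HDFS defaults to /user/<username>. This is
// why an unqualified `hadoop fs -put file` lands under the user's home.
public class WorkingDirDemo {
    static String resolve(String workingDir, String path) {
        // Absolute paths stand alone; relative ones join the working dir.
        return path.startsWith("/") ? path : workingDir + "/" + path;
    }

    public static void main(String[] args) {
        System.out.println(resolve("/user/hadoop", "data.txt")); // /user/hadoop/data.txt
        System.out.println(resolve("/user/hadoop", "/tmp/x"));   // /tmp/x
    }
}
```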

(hive) Table loading with NULL values

2013-01-15 Thread Monkey2Code
Hi, I have created a table as below: --use case 2 to test basics DROP TABLE USER_DATA4; CREATE TABLE USER_DATA4 ( userid INT, movieid INT, rating INT, unixtime INT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; --loading data into user_table LOAD DATA LOCAL INPATH

newbie question

2013-01-15 Thread jamal sasha
I have a mapper: public class BuildGraph { public void config(JobConf job) { // == this block doesn't seem to be executing at all :( super.configure(job); this.currentId = job.getInt("currentId", 0); if (this.currentId != 0) { // I call a method from a different

Re: hadoop namenode recovery

2013-01-15 Thread randy
What happens to the NN and/or performance if there's a problem with the NFS server? Or the network? Thanks, randy On 01/14/2013 11:36 PM, Harsh J wrote: It's very rare to observe an NN crash due to a software bug in production. Most of the time it's a hardware fault you should worry about. On

Re: newbie question

2013-01-15 Thread feng lu
Hi Jamal, I think you are using the old MR API; you should implement the Mapper interface. Maybe you can take a look at the WordCount.java example in org.apache.hadoop.examples. On Wed, Jan 16, 2013 at 10:03 AM, jamal sasha jamalsha...@gmail.com wrote: Hi, The relevant code snippet posted

Re: hadoop namenode recovery

2013-01-15 Thread Harsh J
The NFS mount is to be soft-mounted, so if the NFS goes down, the NN ejects it and continues with the local disk. If auto-restore is configured, it will re-add the NFS mount if it is detected as healthy again later. On Wed, Jan 16, 2013 at 7:04 AM, randy randy...@comcast.net wrote: What happens to the

Re: config file locations in Hadoop 2.0.2

2013-01-15 Thread Hemanth Yamijala
Hi, One place where I could find the capacity-scheduler.xml was from source - hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/resources. AFAIK, the masters file is only used for starting the secondary namenode - which has in 2.x been replaced by a

Re: config file locations in Hadoop 2.0.2

2013-01-15 Thread Harsh J
The masters file was removed because we now look into the config files directly and pull out the right host to start the various configured master servers (NNs, SNN, etc.) at. On Wed, Jan 16, 2013 at 9:56 AM, Hemanth Yamijala yhema...@thoughtworks.com wrote: Hi, One place where I could find the

RE: hadoop namenode recovery

2013-01-15 Thread Rakesh R
Hi, I feel the most reliable approach is using the NN HA feature with shared storage. The idea here is to have two Namenodes. Both the Active and Standby Namenodes point to the shared device and write the editlogs to it. When the Active crashes, the Standby will take over and

Re: (hive) Table loading with NULL values

2013-01-15 Thread Nitin Pawar
Is :: your delimiter, or just :? In any case, you can try specifying '\072' as the column separator (that's for a single :). On Wed, Jan 16, 2013 at 2:49 AM, Monkey2Code monkey2c...@gmail.com wrote: Hi, I have created a table as below --use case 2 to test basics DROP TABLE USER_DATA4; CREATE
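The '\072' Nitin suggests is an octal escape: octal 072 is decimal 58, the ASCII code for ':'. A quick plain-Java check of that arithmetic:

```java
// '\072' is the octal escape for ASCII 58, i.e. ':'. Hive's delimited
// row format commonly takes such octal escapes for delimiters that are
// awkward to type literally.
public class OctalDelim {
    public static void main(String[] args) {
        char colon = '\072';              // octal 072 == decimal 58
        System.out.println(colon == ':'); // true
        System.out.println((int) colon);  // 58
        // Note: a single-character delimiter cannot match a two-character
        // "::" separator; that data would need pre-processing, or a SerDe
        // that supports multi-character delimiters.
    }
}
```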

Re: Hadoop NON DFS space

2013-01-15 Thread Harsh J
<kidding> Wipe your OS out. </kidding> Please read: http://search-hadoop.com/m/9Qwi9UgMOe On Wed, Jan 16, 2013 at 1:16 PM, Vikas Jadhav vikascjadha...@gmail.com wrote: How do I remove non-DFS space from a Hadoop cluster? -- Thanks and Regards, Vikas Jadhav -- Harsh J

Re: Hadoop NON DFS space

2013-01-15 Thread Jagat Singh
:) Check what's consuming space other than core OS files. Check temp spaces and other areas. On Wed, Jan 16, 2013 at 6:53 PM, Harsh J ha...@cloudera.com wrote: <kidding> Wipe your OS out. </kidding> Please read: http://search-hadoop.com/m/9Qwi9UgMOe On Wed, Jan 16, 2013 at 1:16 PM, Vikas