Experience running MRv1 jobs against YARN clusters

2013-01-11 Thread Konstantin Boudnik
Hello. I am trying to find out about the real-life experience MapReduce users have running MRv1 (old-API) applications against a YARN cluster. What's good/bad? How big is the hassle of making these applications run on YARN? Any cases where one had to make modifications in the…

Re: How does hadoop decide how many reducers to run?

2013-01-11 Thread Roy Smith
On Jan 11, 2013, at 6:20 PM, Michael Segel wrote: > Hi, > > First, not enough information. > > 1) EC2 got it. > 2) Which flavor of Hadoop? Is this EMR as well? Yes, EMR. We're running AMI version 2.3.1, which includes Hadoop 1.0.3. > 3) How many slots did you configure in your mapred-site.xml?…

Re: How does hadoop decide how many reducers to run?

2013-01-11 Thread Michael Segel
Hi, First, not enough information. 1) EC2 got it. 2) Which flavor of Hadoop? Is this EMR as well? 3) How many slots did you configure in your mapred-site.xml? AWS EC2 cores aren't going to be hyperthreaded cores, so 8 cores means you will probably have 6 cores for slots. With 16 reducers…
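
A minimal sketch of reading the Hadoop 1.x slot settings Segel is asking about; the property names are the stock 1.x keys, and 2 of each per TaskTracker is the shipped default:

    import org.apache.hadoop.conf.Configuration;

    public class SlotCheck {
        public static void main(String[] args) {
            Configuration conf = new Configuration();   // picks up *-site.xml on the classpath
            conf.addResource("mapred-site.xml");
            // Hadoop 1.x per-TaskTracker slot counts; 2 of each is the default.
            System.out.println("map slots/node:    "
                    + conf.get("mapred.tasktracker.map.tasks.maximum", "2"));
            System.out.println("reduce slots/node: "
                    + conf.get("mapred.tasktracker.reduce.tasks.maximum", "2"));
        }
    }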

How does hadoop decide how many reducers to run?

2013-01-11 Thread Roy Smith
I ran a big job the other day on a cluster of 4 m2.4xlarge EC2 instances. Each instance has 8 cores, so 32 cores total. Hadoop ran 16 reducers, followed by a second wave of 12. It seems to me it was only using half the available cores. Is this normal? Is there some way to force it to use all…
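
For context: Hadoop 1.x never derives the reducer count from the input; the job requests an explicit number (mapred.reduce.tasks, default 1), and the "waves" come from the cluster's total reduce slots. A hedged sketch using the old (MRv1) API, with 32 as a purely illustrative value:

    import org.apache.hadoop.mapred.JobConf;

    public class ReducerCount {
        public static void main(String[] args) {
            JobConf job = new JobConf();
            // Reducers are requested per job, not computed from input size.
            // e.g. a job asking for 28 reducers on a cluster with 16 reduce
            // slots runs them as a wave of 16 followed by a wave of 12.
            job.setNumReduceTasks(32);   // illustrative: one per core
            System.out.println(job.get("mapred.reduce.tasks"));  // -> 32
        }
    }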

Re: Exit code 126?

2013-01-11 Thread Jean-Marc Spaggiari
Do you think it could be https://issues.apache.org/jira/browse/MAPREDUCE-4003? Because it's supposed to be fixed in 1.0.3, and that's the version I'm using... 2013/1/11, Dave Shine : > We see this quite a bit. There is a JIRA (I don't remember the number) that > addresses this issue. It has been applied to the apache distro…

RE: Exit code 126?

2013-01-11 Thread Dave Shine
We see this quite a bit. There is a JIRA (I don't remember the number) that addresses this issue. It has been applied to the apache distro, but I don't think it is incorporated in any release of CDH yet. Dave Shine Sr. Software Engineer…

Exit code 126?

2013-01-11 Thread Jean-Marc Spaggiari
Hi, I ran a very simple rowcount (HBase) MR job today, and one of the tasks failed with status 126. It seems that there are no logs at all on the server side. Any idea what this means? Thanks, JM java.lang.Throwable: Child Error at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)

Re: I am running MapReduce on 30G of data on 1 master/2 slaves, but it failed.

2013-01-11 Thread Serge Blazhiyevskyy
Are you running this on a VM by any chance? On Jan 10, 2013, at 9:11 PM, Mahesh Balija <balijamahesh@gmail.com> wrote: Hi, 2 reducers completed successfully and 1498 have been killed. I assume that you have data issues. (Either the data is huge or some issue…

Re: Sub-queues in capacity scheduler

2013-01-11 Thread Patai Sangbutsarakum
Thanks Harsh, I guess there will be no backport of this, because of the differences between MR2 and MR1. Regards, P On Thu, Jan 10, 2013 at 10:17 PM, Harsh J wrote: > Hierarchical queues are a feature of YARN's CapacityScheduler, which > isn't available in 1.x based releases/distributions such as CDH3u4.
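
For reference, a sketch of what the hierarchical setup looks like on the YARN side. The property keys are the YARN CapacityScheduler ones (normally placed in capacity-scheduler.xml, not set programmatically); the queue names and capacities here are made up:

    import org.apache.hadoop.conf.Configuration;

    public class HierarchicalQueues {
        public static void main(String[] args) {
            // Illustrative only: these keys normally live in capacity-scheduler.xml
            // on a YARN (MR2) cluster; MR1's capacity scheduler has no sub-queues.
            Configuration conf = new Configuration(false);
            conf.set("yarn.scheduler.capacity.root.queues", "prod,dev");       // top level
            conf.set("yarn.scheduler.capacity.root.dev.queues", "adhoc,etl");  // sub-queues of dev
            conf.set("yarn.scheduler.capacity.root.prod.capacity", "70");
            conf.set("yarn.scheduler.capacity.root.dev.capacity", "30");
        }
    }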

Re: log server for hadoop MR jobs??

2013-01-11 Thread shashwat shriparv
Have a look at Flume. On Fri, Jan 11, 2013 at 11:58 PM, Xiaowei Li wrote: > …collect all logs generated from… ∞ Shashwat Shriparv

log server for hadoop MR jobs??

2013-01-11 Thread Xiaowei Li
Hi, is there a facility, or a log server, to collect all the logs generated by each mapper/reducer of an MR job in a centralized place, or in a unified log file on HDFS? Right now I look at logs in a naive way: go to the JobTracker and check each task. Thanks -xw

Re: services requiring topology conf

2013-01-11 Thread Bryan Beaudreault
Thanks Adam! On Fri, Jan 11, 2013 at 12:15 PM, Adam Faris wrote: > A patch was submitted for topology documentation, but it doesn't appear to > have made it into any releases. This svn link may help, starting at line 1294. > > http://svn.apache.org/viewvc?view=revision&revision=1411359 > > Assuming…

Re: services requiring topology conf

2013-01-11 Thread Adam Faris
A patch was submitted for topology documentation, but it doesn't appear to have made it into any releases. This svn link may help, starting at line 1294. http://svn.apache.org/viewvc?view=revision&revision=1411359 Assuming you are using Hadoop 1.x and not YARN, the topology script only needs to…
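
One alternative worth sketching here: instead of shipping a script to every node, Hadoop 1.x also accepts a Java class named by topology.node.switch.mapping.impl, and only the daemons that resolve topology need it on their classpath. The hostname convention below is invented purely for illustration:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.net.DNSToSwitchMapping;

    // Hypothetical replacement for a topology script: wire this in via
    // topology.node.switch.mapping.impl instead of topology.script.file.name.
    public class RackFromHostname implements DNSToSwitchMapping {
        @Override
        public List<String> resolve(List<String> names) {
            List<String> racks = new ArrayList<String>(names.size());
            for (String name : names) {
                // Toy convention: hosts named "node-<rack>-<nn>" map to /rack<rack>.
                String[] parts = name.split("-");
                racks.add(parts.length == 3 ? "/rack" + parts[1] : "/default-rack");
            }
            return racks;
        }
    }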

services requiring topology conf

2013-01-11 Thread Bryan Beaudreault
The documentation on the topology conf (topology.script.file.name) is a little sparse, and while we have it working in our cluster, I am trying to make it a little easier to configure. Currently we upload a Python file and a conf file to every node in our cluster. However, I have a feeling that it is only…

Re: queues in hadoop

2013-01-11 Thread Michael Segel
He's got two different queues. 1) A queue in the capacity scheduler, so he can have a set of M/R tasks running in the background to pull data off of... 2) A durable queue that receives the inbound JSON files to be processed. You can have a custom-written listener that pulls data from the queue and…
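
A rough sketch of that listener idea, with the durable queue stubbed as a java.util.concurrent BlockingQueue and made-up paths and batch size; the point is batching many small JSON messages into one sizeable HDFS file for a scheduled MR job to pick up:

    import java.util.concurrent.BlockingQueue;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class QueueToHdfs {
        public static void drain(BlockingQueue<String> queue) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path out = new Path("/incoming/batch-" + System.currentTimeMillis() + ".json");
            FSDataOutputStream stream = fs.create(out);
            try {
                for (int i = 0; i < 10000; i++) {        // batch small messages
                    String json = queue.take();           // blocks until data arrives
                    stream.write((json + "\n").getBytes("UTF-8"));
                }
            } finally {
                stream.close();                           // one HDFS file per batch
            }
        }
    }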

Re: unsubscribe

2013-01-11 Thread Indrani Gorti
Thanks! On Fri, Jan 11, 2013 at 9:52 AM, Mohammad Tariq wrote: > Hello ma'am, > > In order to unsubscribe, you need to send the mail at " > user-unsubscr...@hadoop.apache.org". > > Warm Regards, > Tariq > https://mtariq.jux.com/ > > > On Fri, Jan 11, 2013 at 8:18 PM, Indrani Gorti wrote

Re: unsubscribe

2013-01-11 Thread Mohammad Tariq
Hello ma'am, In order to unsubscribe, you need to send a mail to user-unsubscr...@hadoop.apache.org. Warm Regards, Tariq https://mtariq.jux.com/ On Fri, Jan 11, 2013 at 8:18 PM, Indrani Gorti wrote: > -- > Indrani Gorti

unsubscribe

2013-01-11 Thread Indrani Gorti
-- Indrani Gorti

Re: queues in hadoop

2013-01-11 Thread Tsuyoshi OZAWA
You can also use fluentd. http://fluentd.org/ "Fluentd receives logs as JSON streams, buffers them, and sends them to other systems like Amazon S3, MongoDB, Hadoop, or other Fluentds." It has a plugin for pushing into HDFS, fluent-plugin-webhdfs: https://github.com/fluent/fluent-plugin-webhdfs

Re: queues in hadoop

2013-01-11 Thread Bertrand Dechoux
There is also Kafka. http://kafka.apache.org "A high-throughput, distributed, publish-subscribe messaging system." But it does not push into HDFS; you need to launch a job to pull the data in. Regards Bertrand On Fri, Jan 11, 2013 at 1:52 PM, Mirko Kämpf wrote: > I would suggest to work with Flume…

Re: queues in hadoop

2013-01-11 Thread Mirko Kämpf
I would suggest working with Flume in order to collect a certain number of files and store them to HDFS in larger chunks, or write them directly to HBase, which allows random access later on (if needed); otherwise HBase could be overkill. You can collect data in a MySQL DB and then import it regularly via…

Re: Getting started recommendations

2013-01-11 Thread Nitin Pawar
http://my.safaribooksonline.com/book/databases/hadoop/9780596521974 I loved this book; very well written. On Fri, Jan 11, 2013 at 3:22 AM, Michael Forage < michael.for...@livenation.co.uk> wrote: > I am still new but had similar questions and went through a lot of pain > getting started…

RE: Getting started recommendations

2013-01-11 Thread Michael Forage
I am still new but had similar questions and went through a lot of pain getting started. If you want to get programming rather than spend time learning how to install, configure and administer the Hadoop tools, I recommend using Amazon Elastic MapReduce. This will very quickly get you to a stage…

Re: Getting started recommendations

2013-01-11 Thread Ravi Mutyala
On Fri, Jan 11, 2013 at 4:29 AM, John Lilley wrote: > Where would we find some “big data” files that people have used for > testing purposes? Some of the most commonly used 'Big Data' files for testing are Global Weather Data from NCDC (ftp://ftp.ncdc.noaa.gov/pub/data/gsod), Enron emails, Airline…

SequenceFile official mime type name

2013-01-11 Thread Andrzej Bialecki
Hi all, I'm working on an application that processes files in various formats, and among other things it tries to identify their official MIME type name. Sequence files are easily identifiable by their header (SEQn), but I'm not sure what MIME type name I should use - I can use whatever I like, but…
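
For illustration, the header check Andrzej describes is only a few lines: a SequenceFile starts with the three magic bytes "SEQ" followed by a one-byte format version. The MIME name itself is unregistered, so something like "application/x-hadoop-sequencefile" (a made-up but x-convention-following choice) is as defensible as any:

    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class SequenceFileSniffer {
        public static boolean looksLikeSequenceFile(String path) throws IOException {
            DataInputStream in = new DataInputStream(new FileInputStream(path));
            try {
                byte[] magic = new byte[4];
                in.readFully(magic);
                return magic[0] == 'S' && magic[1] == 'E' && magic[2] == 'Q'
                        && magic[3] >= 0;   // fourth byte is the version number
            } finally {
                in.close();
            }
        }
    }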

Re: Getting started recommendations

2013-01-11 Thread Olivier Renault
Hi, Warning: I am a newbie myself. Please find my answers inline. Good luck Olivier On 11 January 2013 10:29, John Lilley wrote: > We are somewhat new to Hadoop and are looking to run some experiments > with HDFS, Pig, and HBase. > > With that in mind, I have a few questions: > > What…

Re: Getting started recommendations

2013-01-11 Thread Jason Lee
I am also looking for a good and easy-to-learn doc. On Fri, Jan 11, 2013 at 6:29 PM, John Lilley wrote: > We are somewhat new to Hadoop and are looking to run some experiments > with HDFS, Pig, and HBase. > > With that in mind, I have a few questions: > > What is the easiest (preferably free…

Re: queues in hadoop

2013-01-11 Thread Hemanth Yamijala
Queues in the capacity scheduler are logical data structures into which MapReduce jobs are placed to be picked up by the JobTracker / Scheduler framework, according to some capacity constraints that can be defined for a queue. So, given your use case, I don't think Capacity Scheduler is going to…

Getting started recommendations

2013-01-11 Thread John Lilley
We are somewhat new to Hadoop and are looking to run some experiments with HDFS, Pig, and HBase. With that in mind, I have a few questions: What is the easiest (preferably free) Hadoop distro to get started with? Cloudera? What host OS distro/release is recommended? What is the easiest environment…

Re: JobCache directory cleanup

2013-01-11 Thread Hemanth Yamijala
Hmm. Unfortunately, there is another config variable that may be affecting this: keep.task.files.pattern. This is set to .* in the job.xml file you sent. I suspect this may be causing the problem. Can you please remove it, assuming you have not set it intentionally? Thanks Hemanth On Fri, Jan…
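
For reference, a sketch of how that setting is handled through the old API: keep.task.files.pattern set to ".*" tells every TaskTracker to keep the working files of every task, so the JobCache directory never shrinks. The pattern in the commented line is hypothetical:

    import org.apache.hadoop.mapred.JobConf;

    public class JobCacheSettings {
        public static void main(String[] args) {
            JobConf job = new JobConf();
            // null unless someone set keep.task.files.pattern, e.g. to ".*"
            System.out.println("pattern: " + job.getKeepTaskFilesPattern());
            // A narrower alternative: keep files only for failed tasks.
            job.setKeepFailedTaskFiles(true);
            // Or match just the task IDs you are debugging (hypothetical pattern):
            // job.setKeepTaskFilesPattern(".*_m_000123_0");
        }
    }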