Hello.
I am trying to find out about the real-life experience MapReduce users
have had running MRv1 (old-API) applications against a YARN cluster.
What's good/bad?
How big is the hassle of making these applications run on YARN?
Any cases where one had to make modifications in the a
On Jan 11, 2013, at 6:20 PM, Michael Segel wrote:
> Hi,
>
> First, not enough information.
>
> 1) EC2 got it.
> 2) Which flavor of Hadoop? Is this EMR as well?
Yes, EMR. We're running AMI version 2.3.1, which includes hadoop 1.0.3.
> 3) How many slots did you configure in your mapred-sit
Hi,
First, not enough information.
1) EC2 got it.
2) Which flavor of Hadoop? Is this EMR as well?
3) How many slots did you configure in your mapred-site.xml?
AWS EC2 cores aren't going to be hyperthreaded cores, so 8 cores means you
will probably have about 6 cores available for slots.
With 16 reduc
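For reference, the per-TaskTracker slot counts Michael refers to are set in mapred-site.xml. A hypothetical fragment for roughly 6 usable cores per node might look like this (the values are only an illustration, not a recommendation):

```xml
<!-- hypothetical mapred-site.xml fragment: per-node slot counts -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>6</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>6</value>
</property>
```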
I ran a big job the other day on a cluster of 4 m2.4xlarge EC2 instances. Each
instance is 8 cores, so 32 cores total. Hadoop ran 16 reducers, followed by a
second wave of 12. It seems to me it was only using half the available cores.
Is this normal? Is there some way to force it to use all
Do you think it can be
https://issues.apache.org/jira/browse/MAPREDUCE-4003 ? Because it's
supposed to be fixed in 1.0.3 and that's the version I'm using...
2013/1/11, Dave Shine :
> We see this quite a bit. There is a JIRA (I don't remember the number) that
> addresses this issue. It has been a
We see this quite a bit. There is a JIRA (I don't remember the number) that
addresses this issue. It has been applied to the apache distro, but I don't
think it is incorporated in any release of CDH yet.
Dave Shine
Sr. Software Engineer
321.939.5093 direct | 407.314.0122 mobile
CI Boost™ Clie
Hi,
I ran a very simple rowcount (HBase) MR today, and one of the tasks
failed with the status 126. It seems there are no logs at all on the
server side. Any idea what this means?
Thanks,
JM
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
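One conventional meaning of exit status 126 is "command found but not executable": POSIX shells report it when they cannot exec a file. Whether that is what happened to this Hadoop child task is an assumption; the sketch below just reproduces the status locally:

```python
import os
import subprocess
import tempfile

# Sketch: a POSIX shell returns 126 when the command exists but
# cannot be executed (e.g. missing execute permission).
with tempfile.TemporaryDirectory() as d:
    script = os.path.join(d, "task.sh")
    with open(script, "w") as f:
        f.write("#!/bin/sh\necho hello\n")
    os.chmod(script, 0o644)               # readable but NOT executable
    result = subprocess.run(script, shell=True)

print(result.returncode)  # 126 on POSIX shells
```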
Are you running this on the VM by any chance?
On Jan 10, 2013, at 9:11 PM, Mahesh Balija <balijamahesh@gmail.com> wrote:
Hi,
2 reducers completed successfully and 1498 have been killed. I
assume you have data issues. (Either the data is huge or some issue
Thanks Harsh,
I guess there would be no backport of this, because of the
differences between MR2 and MR1.
Regards,
P
On Thu, Jan 10, 2013 at 10:17 PM, Harsh J wrote:
> Hierarchal queues are a feature of YARN's CapacityScheduler, which
> isn't available in 1.x based releases/distributions such as CDH3u4.
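For context, hierarchical queues under YARN's CapacityScheduler are declared in capacity-scheduler.xml by nesting queue lists. A hypothetical fragment (the queue names and capacities are made up):

```xml
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>prod,dev</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.queues</name>
  <value>eng,science</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.prod.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.capacity</name>
  <value>30</value>
</property>
```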
Have a look at Flume.
On Fri, Jan 11, 2013 at 11:58 PM, Xiaowei Li wrote:
> ct all log generated from
∞
Shashwat Shriparv
Hi,
is there a facility, or a log server, to collect all log generated from
each mapper/reducer of a MR job, in a centralized place, or unified log
file on HDFS?
Right now I look at logs in a naive way: go to the JobTracker to check each task.
thanks
-xw
Thanks Adam!
On Fri, Jan 11, 2013 at 12:15 PM, Adam Faris wrote:
> A patch was submitted for topology documentation, but it doesn't appear to
> have made it to any releases. This svn link may help starting at line 1294.
>
> http://svn.apache.org/viewvc?view=revision&revision=1411359
>
> Assumi
A patch was submitted for topology documentation, but it doesn't appear to have
made it to any releases. This svn link may help starting at line 1294.
http://svn.apache.org/viewvc?view=revision&revision=1411359
Assuming you are using hadoop 1.x and not yarn, the topology script only needs
t
The documentation on topology conf (topology.script.file.name) is a little
sparse, and while we have it working in our cluster I am trying to make it
a little easier to configure.
Currently we upload a python file and conf file to every node in our
cluster. However I have a feeling that it is onl
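Since the thread mentions uploading a Python file, here is a minimal sketch of what a topology script looks like: Hadoop invokes it with one or more host names or IPs as arguments and expects one rack path per argument on stdout. The IP-to-rack mapping below is entirely hypothetical.

```python
#!/usr/bin/env python
# Sketch of a script for topology.script.file.name: print one rack
# path per host argument. The mapping here is made up for illustration.
import sys

RACKS = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.11": "/dc1/rack2",
}
DEFAULT = "/default-rack"

for host in sys.argv[1:]:
    print(RACKS.get(host, DEFAULT))
```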
He's got two different queues.
1) queue in capacity scheduler so he can have a set or M/R tasks running in the
background to pull data off of...
2) a durable queue that receives the inbound json files to be processed.
You can have a custom-written listener that pulls data from the queue and
Thanks!
On Fri, Jan 11, 2013 at 9:52 AM, Mohammad Tariq wrote:
> Hello ma'am,
>
> In order to unsubscribe, you need to send a mail to
> "user-unsubscr...@hadoop.apache.org".
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
>
>
> On Fri, Jan 11, 2013 at 8:18 PM, Indrani Gorti wrote
Hello ma'am,
In order to unsubscribe, you need to send a mail to
"user-unsubscr...@hadoop.apache.org".
Warm Regards,
Tariq
https://mtariq.jux.com/
On Fri, Jan 11, 2013 at 8:18 PM, Indrani Gorti wrote:
>
>
> --
> Indrani Gorti
>
--
Indrani Gorti
You can also use fluentd. http://fluentd.org/
"Fluentd receives logs as JSON streams, buffers them, and sends them
to other systems like Amazon S3, MongoDB, Hadoop, or other Fluentds."
It has a plugin for pushing into HDFS through fluent-plugin-webhdfs.
https://github.com/fluent/fluent-plugin-webhd
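For a rough idea, a fluentd match section for the webhdfs plugin might look like this (the host, port, and path are placeholders; check the plugin's README for the exact parameter names):

```
<match hdfs.**>
  type webhdfs
  host namenode.example.com
  port 50070
  path /log/%Y%m%d/access.log.${hostname}
</match>
```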
There is also kafka. http://kafka.apache.org
"A high-throughput, distributed, publish-subscribe messaging system."
But it does not push into HDFS, you need to launch a job to pull data in.
Regards
Bertrand
On Fri, Jan 11, 2013 at 1:52 PM, Mirko Kämpf wrote:
> I would suggest to work with Flum
I would suggest working with Flume, in order to collect a certain number
of files and store them to HDFS in larger chunks, or write them directly to
HBase; this allows random access later on (if needed), otherwise HBase could
be overkill. You can also collect data in a MySQL DB and then import
regularly via
http://my.safaribooksonline.com/book/databases/hadoop/9780596521974
I loved this book. Very well written.
On Fri, Jan 11, 2013 at 3:22 AM, Michael Forage <
michael.for...@livenation.co.uk> wrote:
> I am still new but had similar questions and went through a lot of pain
> getting started
>
>
I am still new but had similar questions and went through a lot of pain getting
started
If you want to get programming rather than spend time learning how to install,
configure and administer the Hadoop tools I recommend using Amazon Elastic
MapReduce.
This will very quickly get you to a stage
On Fri, Jan 11, 2013 at 4:29 AM, John Lilley wrote:
> Where would we find some “big data” files that people have used for
> testing purposes?
Some of the most commonly used 'Big Data' files for testing are Global
Weather Data from NCDC (ftp://ftp.ncdc.noaa.gov/pub/data/gsod), Enron
emails, Airli
Hi all,
I'm working on an application that processes files in various formats,
and among others it tries to identify their official mime type name.
Sequence files are easily identifiable by their header (SEQn), but I'm
not sure what mime type name I should use - I can use whatever I like,
bu
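A minimal sketch of header-based detection: a SequenceFile starts with the bytes "SEQ" followed by a one-byte version number. The mime type name itself is an open choice; something like "application/x-hadoop-sequence-file" is only a suggestion, not a registered type.

```python
import tempfile

# Sketch: recognize a Hadoop SequenceFile by its magic header,
# the bytes "SEQ" followed by a one-byte version number.
def looks_like_sequence_file(path):
    with open(path, "rb") as f:
        header = f.read(4)
    return len(header) == 4 and header[:3] == b"SEQ"

# Build a fake file with a SequenceFile-style header to demonstrate.
with tempfile.NamedTemporaryFile(suffix=".seq", delete=False) as f:
    f.write(b"SEQ\x06" + b"rest-of-file")
    demo = f.name

print(looks_like_sequence_file(demo))  # True
```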
Hi,
Warning, I am a newbie myself. Please find my answer inline.
Good luck
Olivier
On 11 January 2013 10:29, John Lilley wrote:
> We are somewhat new to Hadoop and are looking to run some experiments
> with HDFS, Pig, and HBase.
>
> With that in mind, I have a few questions:
>
> What
I am also looking for a good, easy-to-learn doc.
On Fri, Jan 11, 2013 at 6:29 PM, John Lilley wrote:
> We are somewhat new to Hadoop and are looking to run some experiments
> with HDFS, Pig, and HBase.
>
> With that in mind, I have a few questions:
>
> What is the easiest (preferably fre
Queues in the capacity scheduler are logical data structures into which
MapReduce jobs are placed to be picked up by the JobTracker / Scheduler
framework, according to some capacity constraints that can be defined for a
queue.
So, given your use case, I don't think Capacity Scheduler is going to
d
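For reference, on MR1 the queues described above are declared in mapred-site.xml (mapred.queue.names), with per-queue capacities in capacity-scheduler.xml. A hypothetical two-queue fragment (names and percentages are made up):

```xml
<!-- mapred-site.xml -->
<property>
  <name>mapred.queue.names</name>
  <value>default,etl</value>
</property>

<!-- capacity-scheduler.xml -->
<property>
  <name>mapred.capacity-scheduler.queue.default.capacity</name>
  <value>70</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.etl.capacity</name>
  <value>30</value>
</property>
```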
We are somewhat new to Hadoop and are looking to run some experiments with
HDFS, Pig, and HBase.
With that in mind, I have a few questions:
What is the easiest (preferably free) Hadoop distro to get started with?
Cloudera?
What host OS distro/release is recommended?
What is the easiest environme
Hmm. Unfortunately, there is another config variable that may be affecting
this: keep.task.files.pattern
This is set to .* in the job.xml file you sent. I suspect this may be
causing a problem. Can you please remove this, assuming you have not set it
intentionally?
Thanks
Hemanth
On Fri, Jan