Hi Saurabh,
It's an interesting topic,
>> So, here is the question: is it possible to make sure that when one of
the mapper tasks is writing to a file, the others wait until the first
one is finished? I read that the mapper tasks don't interact with
each other
A simple way to do this i
Hi Krishna,
To get a better understanding of the problem, could you please share the
following information:
1) Number of nodes and running applications in the cluster
2) What's the version of your Hadoop?
3) Have you set
"yarn.scheduler.capacity.schedule-asynchronously.enable"=true?
4) What's the "yarn.resourcem
By default, vcore = 1 for each resource request. If you don't like this
behavior, you can set yarn.scheduler.minimum-allocation-vcores=0
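For illustration, here is a minimal sketch of how an ApplicationMaster can ask
for more than one vcore explicitly in its container requests (standard
AMRMClient API from hadoop-yarn-client 2.x; the memory/vcore numbers are made
up, and amrmClient is assumed to be an already-started AMRMClient):

  // Sketch: request containers with an explicit vcore count instead of the default of 1.
  import org.apache.hadoop.yarn.api.records.Priority;
  import org.apache.hadoop.yarn.api.records.Resource;
  import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

  Resource capability = Resource.newInstance(2048, 4);    // 2048 MB, 4 vcores (illustrative values)
  Priority priority = Priority.newInstance(0);
  ContainerRequest request = new ContainerRequest(capability, null, null, priority);
  amrmClient.addContainerRequest(request);                // amrmClient set up elsewhere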
Hope this helps,
Wangda Tan
On Thu, Aug 7, 2014 at 7:13 PM, Krishna Kishore Bonagiri <
write2kish...@gmail.com> wrote:
> Hi,
> I am calling getAvailableRes
Hi Experts,
I am very interested in making Hadoop work with Docker and am doing some
trials with the patch from YARN-1964.
I applied the patch yarn-1964-branch-2.2.0-docker.patch from JIRA YARN-1964 on
branch-2.2 and am going to install a Hadoop cluster using the newly generated
tarball that includes the patch.
Then,
Hi,
I ran a job in Hive and it got to this stage: Stage-1 map = 100%, reduce =
29%. It seemed to start cleaning up the containers successfully, and
then I got this series of errors:
2014-08-12 03:58:55,718 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
a
Hi
I see two different approaches for deleting localized resources and
aggregated logs (relevant properties sketched below):
1. Localized resources are deleted based on the size of the localizer cache,
per local directory.
2. Aggregated logs are deleted based on time (if enabled).
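For reference, a quick sketch of where those two policies appear to be
configured (property names as in Hadoop 2.x; worth double-checking against your
version's yarn-default.xml):

  // Sketch: the properties that seem to govern the two deletion policies.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.yarn.conf.YarnConfiguration;

  Configuration conf = new YarnConfiguration();
  long cacheTargetMb = conf.getLong("yarn.nodemanager.localizer.cache.target-size-mb", 10240); // localizer cache per NM
  boolean aggregationOn = conf.getBoolean("yarn.log-aggregation-enable", false);               // log aggregation on/off
  long retainSeconds = conf.getLong("yarn.log-aggregation.retain-seconds", -1);                // time-based deletion of aggregated logs
  System.out.println(cacheTargetMb + " MB cache, aggregation=" + aggregationOn + ", retain=" + retainSeconds + "s");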
Is there any specific thoughts f
Can setting up 2 datanodes on the same machine be considered
pseudo-distributed mode Hadoop?
Thanks,
Sindhu
Yes :)
Pseudo-distributed mode is a configuration in which all of the Hadoop
daemons run on a single computer.
On 12/08/14 18:25, sindhu hosamane wrote:
> Can setting up 2 datanodes on the same machine be considered
> pseudo-distributed mode Hadoop?
>
> Thanks,
> Sindhu
I have read "By default, Hadoop is configured to run in a non-distributed
mode, as a single Java process" .
But if my Hadoop is in pseudo-distributed mode, why does it still run as a
single Java process and utilize only 1 CPU core even when there are many
more?
On Tue, Aug 12, 2014 at 4:32 PM, Se
+ Hive user mailing list
That should be a better place for your questions.
On Mon, Aug 11, 2014 at 3:17 PM, Ana Gillan wrote:
> Hi,
>
> I’ve been reading a lot of posts about needing to set a high ulimit for
> file descriptors in Hadoop and I think it’s probably the cause of a lot of
> the error
Hi Users,
In my cluster setup I am running a test case where I bring down all the
datanodes and keep the namenode running.
In this case my application gets an error with a RemoteException:
could only be replicated to 0 nodes instead of minReplication (=1).
There are 0 datanode(s) running and no node(
Hi Zhijie,
ulimit covers both a hard limit and a soft limit.
The hard limit can only be set by a sysadmin; it is there to guard against
things like a fork-bomb DoS attack.
The sysadmin can set the hard ulimit per user, e.g. for hadoop_user.
A user can add a line to their .profile file setting a soft ulimit up to
the hard limi
Hi all,
I've instantiated a Hadoop 2.4.1 cluster and I've found that running
MapReduce applications will parallelize differently depending on what
kind of filesystem the input data is on.
Using HDFS, a MapReduce job will spawn enough containers to maximize
use of all available memory. For example
Hi,
I am running a single node hadoop cluster 2.4.1.
When I submit a MR job it logs a warning:
2014-08-12 21:38:22,173 WARN [main] org.apache.hadoop.util.NativeCodeLoader:
Unable to load native-hadoop library for your platform... using builtin-java
classes where applicable
The problem doesn’t c
Which distribution are you people using? Cloudera vs Hortonworks vs
Biginsights?
Can someone explain what makes the above variables different? Most of the time
they are set to point to
the same directory.
Thanks
Reyane OUKPEDJO
Hortonworks. Here is my reasoning:
1. Hortonworks is 100% open source.
2. MapR has stuff on their roadmap that Hortonworks has already accomplished
and has moved on to other things.
3. Cloudera has proprietary stuff in their stack. No.
4. Hortonworks makes training super accessible and there is a c
Point 3 seems like a biased and incomplete statement.
Cloudera's distribution CDH is fully open source. The proprietary "stuff" you
refer to is most likely Cloudera Manager, an additional tool that makes
deployment, configuration and monitoring easy.
Nobody is required to use it to run a Hadoop cluster.
Ka
You fell into my trap sir. I was hoping someone would clear that up. :)
Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData
From: Kai Voigt
Sent: Tuesday, August 12, 2014 4:10 PM
To: user@hadoop.apache.org
Subject:
On that note, point 2 is also misleading/incomplete. You might want to explain
which specific features you are referencing so the original poster can
figure out whether those features are relevant. The inverse of 2 is also true:
things like consistent snapshots and full random read/write over NFS are in
Map
Also, consider Apache Bigtop. That is the Apache upstream Hadoop initiative,
and it comes with smoke tests plus Puppet recipes for setting up your own Hadoop
distro from scratch.
IMHO, if you are learning or building your own tooling around Hadoop, Bigtop is
ideal. If interested in purchasing support
Is this up to date?
http://www.mapr.com/products/product-overview/overview
Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData
From: Aaron Eng
Sent: Tuesday, August 12, 2014 4:31 PM
To: user@hadoop.apache.org
Subj
Hey, Arthur:
Could you show me the error message for rm2, please?
Thanks
Xuan Gong
On Mon, Aug 11, 2014 at 10:17 PM, arthur.hk.c...@gmail.com <
arthur.hk.c...@gmail.com> wrote:
> Hi,
>
> Thank you very much!
>
> At the moment if I run ./sbin/start-yarn.sh in rm1, the standby STANDBY
> Res
Hi,
I deployed Hadoop 2.4 on AWS EC2 using the S3 native file system as a
replacement for HDFS. I tried several example apps; all gave me the
following stack trace messages (an older thread on Jul 24 hung there without
being resolved... so I attach the DEBUG info here...):
hadoop jar share/hadoop/mapreduce/
Hi Everyone,
I'm using Hadoop-2.2.0 with the fair scheduler in my YARN cluster, but something is
wrong with the fair scheduler.
Here is what my fair-scheduler.xml looks like:
15360 mb, 5 vcores
0.5
2
5
1
I create a "longrun" queue to ensure that huge MR application can only
Hi Wangda ,
I am not sure that making overwrite=false will solve the problem. As per the
Javadoc, with overwrite=false it will throw an exception if the file
already exists. So, for all the remaining mappers it will throw an
exception.
Also I am very new to ZK and have very basic knowledge of it
Did you try to close the file and reopen it for writing after the datanodes
restart?
I think if you close the file and reopen it, the exception might disappear.
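A rough sketch of that idea, purely illustrative (assumes append is
supported/enabled on your Hadoop version, fs is a FileSystem bound to the
cluster, out is the stream you were writing, and the path is made up):

  // Sketch: close the stream opened before the datanodes went down,
  // then reopen the same file for append once they are back.
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  out.close();                                            // release the old write pipeline
  // ... wait for the datanodes to re-register with the namenode ...
  FSDataOutputStream reopened = fs.append(new Path("/data/app/output.log")); // hypothetical path
  reopened.writeBytes("resuming writes\n");
  reopened.close();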
On Wed, Aug 13, 2014 at 2:21 AM, Satyam Singh
wrote:
> Hi Users,
>
>
>
> In my cluster setup i am doing one test case of making only all
Hi Saurabh,
>> am not sure that making overwrite=false will solve the problem. As per the
Javadoc, with overwrite=false it will throw an exception if the file
already exists. So, for all the remaining mappers it will throw an
exception.
You can catch the exception and wait.
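Something along these lines, as a rough sketch only (the path and sleep
interval are made up, fs is the task's FileSystem instance, the enclosing
method is assumed to declare IOException/InterruptedException, and the exact
exception type you see may differ by version):

  // Sketch: each mapper tries to create the shared file with overwrite=false
  // and backs off while another task holds it.
  import java.io.IOException;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  Path shared = new Path("/tmp/job-output/shared-file");  // hypothetical path
  FSDataOutputStream out = null;
  while (out == null) {
    try {
      out = fs.create(shared, false);                     // overwrite = false
    } catch (IOException alreadyThere) {                  // typically FileAlreadyExistsException
      Thread.sleep(5000L);                                // wait, then retry
    }
  }

As mentioned earlier in the thread, ZooKeeper would give you a cleaner
coordination primitive if you end up needing more than this.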
>> Can you please refe
After applying this patch, I added the following config in yarn-site.xml:
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor</value>
  </property>
Then I can start the NodeManager with DockerContainerExecutor enabled. But it
failed to execute a simple MR job, an
Hi Henry,
Are there any applications (on queues other than the longrun queue)
running at the same time? I think FairScheduler is going to assign
more resources to your "longrun" queue as long as there are no other applications
running in the other queues.
Thanks
Yehia
On 12 August 2014 20
I have also got this message when running 2.4.1.
I found that the native libraries in $HADOOP_HOME/lib/native are 32-bit,
not 64-bit.
Recompile once again and build 64-bit shared objects, but it is a
lengthy exercise.
On 8/13/14, Subroto Sanyal wrote:
> Hi,
>
> I am running a single node hadoo
Hi Yehia,
Oh? I thought that by using maxResources = 15360 mb (3072 mb * 5), vcores = 5,
and maxMaps = 5, I was already restricting the job to use at most 5 maps.
The reason is that my long-running job has 841 maps, and each map will process
data for almost 2 hours.
In the meantime there will be some s
Hi,
I'm trying to implement security for my Hadoop data. I'm using Cloudera Hadoop.
Below are the two specific things I'm looking for:
1. Role-based authorization and authentication
2. Encryption of data residing in HDFS
I have looked into Kerberos but it doesn't provide encryption for data alre