I have been running Hadoop on a cluster set to not check permissions. I
would run a Java client on my local machine and it would run as the local
user on the cluster.
I use:
String connectString = "hdfs://" + host + ":" + port + "/";
Configuration config = new Configuration();
Hi Steve,
A normally-written client program would work normally on both
permissions and no-permissions clusters. There is no concept of a
password for users in Apache Hadoop as of yet, unless you're dealing
with a specific cluster that has custom-implemented it.
Setting a specific user is not
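For reference, a minimal sketch of a client that connects as a chosen user
on a non-secure cluster; the host, port and username here are illustrative,
and with permissions checking off the NN simply trusts whatever name the
client presents:

import java.net.URI;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class HdfsAsUser {
  public static void main(String[] args) throws Exception {
    // Act as a named remote user; no password is involved.
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser("hadoopuser");
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
      public Void run() throws Exception {
        Configuration config = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://namenode:8020/"), config);
        System.out.println(fs.exists(new Path("/")));
        return null;
      }
    });
  }
}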
Just one node not having memory does not mean your cluster is down.
Can you see your HDFS health on the NN UI?
How much memory do you have on the NN? If there are no jobs running on the
cluster, then you can safely restart the DataNode and TaskTracker.
Also run a top command and figure out which processes
I have a job that's getting 600s task timeouts during the copy phase of the
reduce step. I see a lot of copy tasks all moving at about 2.5 MB/sec, and
it's taking longer than 10 min to do that copy.
The process starts copying when the reduce step is 80% complete. This is a
very I/O-bound task as
I can issue the command 'hadoop dfsadmin -report', but it does not return any
result for a long time. Also, I can open the NN UI (http://namenode:50070),
but it always stays in a connecting state and does not return any
cluster statistics.
The memory on the NN:
total used
4 GB of memory on the NN? This will run out of memory in a few days.
You will need to make sure your NN has at least double the RAM of your
DNs if you have a miniature cluster.
On Mon, May 13, 2013 at 11:52 AM, sam liu samliuhad...@gmail.com wrote:
I can issue a command 'hadoop dfsadmin -report',
Hi,
I have a 3-node cluster, with the JobTracker running on one machine and
TaskTrackers on the other two. Instead of using HDFS, I have written my own
FileSystem implementation. As an experiment, I kept 1000 text files (all of
the same size) on both the slave nodes and ran a simple WordCount MR job. It
For the question about controlling the number of mappers: you can use
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html
which is designed to solve similar cases. However, you cannot beat the
speed you get out of a single large file (or a few large files), as
you'll
Shashwat,
Tweaking the split sizes affects a single input split, not how the
splits are packed. It may be used with the CombineFileInputFormat to
control packed split sizes, but would otherwise not be of use in
merging the processing of several blocks across files into the same map
task.
On Mon, May
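As a rough illustration (not tied to the patch discussed in this thread), a
driver using the old mapred API might pair a CombineFileInputFormat subclass
with a cap on the packed split size. MyCombinedInputFormat is hypothetical,
and the property name is the one CombineFileInputFormat consults when no
explicit maximum has been set in code:

import org.apache.hadoop.mapred.JobConf;

public class CombineSetup {
  static JobConf configure() {
    JobConf conf = new JobConf(CombineSetup.class);
    // Hypothetical subclass of org.apache.hadoop.mapred.lib.CombineFileInputFormat.
    conf.setInputFormat(MyCombinedInputFormat.class);
    // Upper bound (in bytes) on how much data gets packed into one combined
    // split - here 128 MB, so many small files merge into bounded map inputs.
    conf.setLong("mapred.max.split.size", 128L * 1024 * 1024);
    return conf;
  }
}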
Hi,
@Harsh: Thanks for the reply. Would the patch work in Hadoop 1.0.4 release?
-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Monday, May 13, 2013 1:03 PM
To: user@hadoop.apache.org
Subject: Re: How to combine input files for a MapReduce job
For the question about controlling the number of
Yes, I believe the branch-1 patch attached there should apply cleanly to 1.0.4.
On Mon, May 13, 2013 at 1:25 PM, Agarwal, Nikhil
nikhil.agar...@netapp.com wrote:
Hi,
@Harsh: Thanks for the reply. Would the patch work in Hadoop 1.0.4 release?
-Original Message-
From: Harsh J
Due to the small amount of memory available to the nodes, they are not able
to send responses in time, hence the socket connection exceptions; there may
be some network issues too.
Please check which program is using the memory, as there may be some other
co-hosted application eating it up:
ps -e
Hi,
I use the distributed cache to implement the semi-joins.
I am using CDH4.1.2.
The map-side setup function is as [1].
It works well in my Eclipse Indigo, however it goes wrong when I run it from
the CLI:
The exception in one of the containers refers to [2].
How could I solve this exception?
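Since [1] is not shown, here is only a generic sketch of what a map-side
semi-join typically looks like (all class and field names hypothetical,
assuming tab-separated rows keyed by the first field): the small table is
loaded from the DistributedCache into memory in setup(), and map() emits
only the rows whose key appears in it.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SemiJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
  private final Set<String> smallTableKeys = new HashSet<String>();

  @Override
  protected void setup(Context context) throws IOException {
    // Read every cached file line by line into the in-memory key set.
    Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    if (cached == null) return;
    for (Path p : cached) {
      BufferedReader reader = new BufferedReader(new FileReader(p.toString()));
      try {
        String line;
        while ((line = reader.readLine()) != null) {
          smallTableKeys.add(line.trim());
        }
      } finally {
        reader.close();
      }
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Emit only rows of the big table whose join key exists in the small table.
    String joinKey = value.toString().split("\t")[0];
    if (smallTableKeys.contains(joinKey)) {
      context.write(new Text(joinKey), value);
    }
  }
}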
Hi Harsh,
I applied the changes of the patch in the Hadoop source code, but can you
please tell me exactly where this log is printed? I checked the log files of
the JobTracker and TaskTracker but it is not there. It is also not printed in
the _logs folder created inside the output directory of the MR job.
Hi,
I got it. The log info is printed in the userlogs folder on the slave nodes,
in the file syslog.
Thanks,
Nikhil
-Original Message-
From: Agarwal, Nikhil
Sent: Monday, May 13, 2013 4:10 PM
To: 'user@hadoop.apache.org'
Subject: RE: How to combine input files for a MapReduce job
Hi Harsh,
Can I use the FairScheduler to limit the number of map/reduce tasks directly
from the job configuration? E.g. I have one job that I know should run a more
limited number of map/reduce tasks than the default; I want to
configure a queue with a limited number of map/reduce tasks, but only apply it to
Hi,
I fixed the problem by just adding the call job.setJar(MyJarName),
and the job went well.
But I have no idea why that call fixes the exception.
Any suggestion will be appreciated.
Regards.
2013/5/13 YouPeng Yang yypvsxf19870...@gmail.com
Hi,
I use the distributed cache to
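For what it's worth, a sketch of the two ways a driver can point the job at
its jar (the jar path is hypothetical); without one of them, tasks on the
cluster cannot load the user classes, which commonly surfaces as a
ClassNotFoundException in the containers:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JarSetup {
  static Job newJob() throws Exception {
    Job job = Job.getInstance(new Configuration(), "semi-join");
    // Usual route: infer the jar from a class it contains.
    job.setJarByClass(JarSetup.class);
    // Or pin the jar path explicitly, as in the fix above.
    job.setJar("/path/to/myjob.jar");
    return job;
  }
}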
Using the fair scheduler or capacity scheduler, you are creating a queue that
is applied to the cluster.
Having said that, you can limit who uses the special queue, as well as specify
the queue at the start of your job as a command-line option.
HTH
Sent from a remote device. Please excuse
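For example, something along these lines at submission time (queue/pool names
illustrative, and assuming the driver goes through ToolRunner so that -D
options are parsed):

# Capacity scheduler (MR1) uses a queue property:
hadoop jar myjob.jar MyDriver -Dmapred.job.queue.name=limited in out
# Fair scheduler (MR1) uses a pool property instead:
hadoop jar myjob.jar MyDriver -Dmapred.fairscheduler.pool=limited in out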
That doesn't make sense...
Try introducing a combiner step.
Sent from a remote device. Please excuse any typos...
Mike Segel
On May 13, 2013, at 3:30 AM, shashwat shriparv dwivedishash...@gmail.com
wrote:
On Mon, May 13, 2013 at 11:35 AM, David Parks davidpark...@yahoo.com wrote:
(I’ve
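A combiner is one extra line in the driver; a minimal sketch, assuming a
word-count-style job whose reducer performs an associative, commutative
aggregation and can therefore double as the combiner (class names
hypothetical):

// Pre-aggregate map output locally so reducers copy far less data.
job.setMapperClass(WordCountMapper.class);
job.setCombinerClass(WordCountReducer.class);
job.setReducerClass(WordCountReducer.class);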
Hi All
How do I use OpenCL (for GPU compute) with Hadoop?
It would be great if someone could share some sample code.
Thanks & Regards
Rohit Sarewar
Hi All
My data set resides in HDFS. I need to compute 5 metrics, among which 2 are
compute-intensive. So I want to compute those 2 metrics on the GPU using OpenCL
and the other 3 metrics using Java MapReduce code on Hadoop.
How can I pass data from HDFS to the GPU? Or how can my OpenCL code access
data
Hadoop just runs as a standard Java process, so you should find something that
bridges between OpenCL and Java. A quick Google search yields:
http://www.jocl.org/
I expect that you'll find everything you need to accomplish the handoff from
your MapReduce code to OpenCL there.
As for HDFS,
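To sketch the handoff (names hypothetical, with GpuKernels.sum() standing in
for the real JOCL-backed call): the mapper can buffer its HDFS input into a
primitive array during map() and hand the whole batch to the OpenCL side in
cleanup():

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class GpuMetricMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
  private final List<Float> buffer = new ArrayList<Float>();

  @Override
  protected void map(LongWritable key, Text value, Context context) {
    // HDFS feeds the mapper plain records; collect one float per input line.
    buffer.add(Float.parseFloat(value.toString().trim()));
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // Copy into a primitive array, the form an OpenCL buffer upload expects.
    float[] batch = new float[buffer.size()];
    for (int i = 0; i < batch.length; i++) batch[i] = buffer.get(i);
    double metric = GpuKernels.sum(batch); // hypothetical OpenCL-backed helper
    context.write(new Text("metric"), new DoubleWritable(metric));
  }
}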
You do not need root if you want to install everything in your home directory,
assuming the Sun JDK is installed.
On May 13, 2013 8:13 PM, Raj Hadoop hadoop...@yahoo.com wrote:
Hi,
I am planning to install Hadoop on Linux in pseudo-distributed mode (one
machine). Do I require 'root' privileges
Hello Raj,
Install in what sense? Are you planning to use Apache's package? If that
is the case, you just have to download and unzip it, and you don't need root
privileges for that. Or something else like CDH? I'm sorry, I didn't quite get
the question.
Warm Regards,
Tariq
cloudfront.blogspot.com
I am thinking of installing both the CDH and Apache versions. So are you
saying that if I install CDH I require root privileges?
From: Mohammad Tariq donta...@gmail.com
To: user@hadoop.apache.org user@hadoop.apache.org; Raj Hadoop
hadoop...@yahoo.com
Sent: Monday,
If you want to install CDH, then you will need root access, as it needs to
install RPMs.
For Apache downloads, it's not needed.
On Mon, May 13, 2013 at 8:25 PM, Raj Hadoop hadoop...@yahoo.com wrote:
I am thinking of installing both the CDH and Apache versions. So are you
saying if I install CDH - do I
Thanks for the suggestions. I ended up switching to jdk 1.7+ just to make
the code more readable. I will take a look at the EWAH implementation as
well.
Jim
On Sun, May 12, 2013 at 3:40 PM, Bertrand Dechoux decho...@gmail.com wrote:
You can disregard my links as they are only valid for Java
So for CDH, while installing, what do I request from my Unix admin? Any tips?
I am requesting a separate user on the Linux box. What kind of privileges
are required for the new user? And does this new user need to have some kind
of temporary root access? How does this work
If you are installing the CDH version of Hadoop, tell your admin that you need
root access, as you need to install RPMs :)
Thanks & Regards
∞
Shashwat Shriparv
Raj,
Apache Hadoop by itself does not require root privileges to run
(assuming a non-secure setup). You can run it out of a tarball from a
home directory you have on the server machines.
However, many prefer using packages, such as those from Apache
Bigtop/etc., to install Hadoop and use it.
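For instance, a root-less tarball setup looks roughly like this (version and
paths illustrative):

tar -xzf hadoop-1.0.4.tar.gz -C $HOME/opt
export HADOOP_HOME=$HOME/opt/hadoop-1.0.4
export PATH=$HADOOP_HOME/bin:$PATH
hadoop version   # runs entirely out of the home directory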
Hello,
What is the best way to get the count of records in an HDFS file generated
by a Pig script?
Thanks
It is a text file.
If we want to use wc, we need to copy the file from HDFS and then use wc, and
this may take time. Is there a way without copying the file from HDFS to a
local directory?
Thanks
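One way that avoids the local copy is to stream the file out of HDFS and
count it on the fly (path illustrative):

hadoop fs -cat /user/etl/output/part-* | wc -l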
On Mon, May 13, 2013 at 11:04 AM, Rahul Bhattacharjee
rahul.rec@gmail.com wrote:
few pointers.
what
How about the second approach: get the application/job ID which Pig
creates and submits to the cluster, and then find the job output counter for
that job from the JT.
Thanks,
Rahul
On Mon, May 13, 2013 at 11:37 PM, Mix Nin pig.mi...@gmail.com wrote:
It is a text file.
If we want to use wc,
Any pointers to my question?
There is another question, kind of dumb, but I just wanted to clarify.
Say in a FIFO scheduler or a capacity scheduler, if there are slots
available and the first job doesn't need all of the available slots, then
the job next in the queue is scheduled for execution
If it is just counting the no. of records in a file, then how about a
short 3-liner:
LOGS = LOAD 'log';
LOGS_GROUP = GROUP LOGS ALL;
LOG_COUNT = FOREACH LOGS_GROUP GENERATE COUNT(LOGS);
It did the trick for me.
Warm Regards,
Tariq
cloudfront.blogspot.com
On Mon, May 13, 2013 at 11:57 PM,
Hi,
On Sat, May 11, 2013 at 8:31 PM, Rahul Bhattacharjee
rahul.rec@gmail.com wrote:
Hi,
I was going through the job schedulers of Hadoop and could not see any major
operational difference between the capacity scheduler and the fair share
scheduler apart from the fact that fair share
Hi,
The final count file should reside in a local directory, not in an HDFS
directory. The above script will store a text file in an HDFS directory.
The count file needs to be sent to another team who do not work on HDFS.
Thanks
On Mon, May 13, 2013 at 11:36 AM, Mohammad Tariq
Agree with Shahab.
Warm Regards,
Tariq
cloudfront.blogspot.com
On Tue, May 14, 2013 at 12:32 AM, Shahab Yunus shahab.yu...@gmail.com wrote:
The count file will be a very small file, right? Once it is generated on
HDFS, you can automate its downloading or movement anywhere you want. This
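For example, a one-liner can merge and pull the output down in a single step
(paths illustrative):

hadoop fs -getmerge /user/etl/counts /tmp/record_count.txt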
Hi,
As the name suggests, the Fair Scheduler does a fair allocation of slots to
jobs.
Let's say you have 10 map slots in your cluster, all occupied by a
job-1 which requires 30 map slots to finish. At the same time, another
job-2 requires only 2 map slots to finish - here slots will be provided
Hi,
Can anyone suggest how to configure Eclipse on a Mac for Hadoop? Hadoop is
running in pseudo-distributed mode. Please provide any reference articles
or other best practices that need to be followed in this case.
Thanks,
Raj
Hello Raj,
I am a Linux addict, but the procedure should be the same for Mac as
well. You need to pull the hadoop-eclipse plugin, build it keeping all the
dependencies in mind, and copy it into the Eclipse plugins directory. Restart
Eclipse and you should be good to go. For detailed info
Thanks a lot for the replies , it was really helpful.
On Tue, May 14, 2013 at 1:02 AM, Alok Kumar alok...@gmail.com wrote:
Hi,
As the name suggests, the Fair Scheduler does a fair allocation of slots to
jobs.
Let's say you have 10 map slots in your cluster, all occupied by a
job-1 which