If your task is running out of memory, you could add the option
-XX:+HeapDumpOnOutOfMemoryError
to mapred.child.java.opts (along with the heap memory settings). However, I am
not sure where it stores the dump. You might need to experiment a little with
it. Will try and send out the info if I get
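(For reference, a minimal sketch of setting this per job from the driver; the
-Xmx value here is only illustrative:)

Configuration conf = new Configuration();
// Ask each task JVM to write a heap dump when it hits an OOM.
conf.set("mapred.child.java.opts",
    "-Xmx512m -XX:+HeapDumpOnOutOfMemoryError");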
MR will partition the inputs by key and sort them with the key comparator, and
then group them together when reading them back via a grouping comparator
(which is usually the same as the key comparator). It will not re-sort
the values, nor look at any of the value's fields during this process.
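As a sketch, a typical secondary-sort setup where the two comparators differ
(the comparator classes here are hypothetical user-written RawComparator
implementations, using the newer mapreduce API):

Job job = Job.getInstance(conf, "secondary-sort");
// Orders the full composite keys during the sort phase.
job.setSortComparatorClass(CompositeKeyComparator.class);
// Decides which (already sorted) keys are fed into a single reduce() call.
job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);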
If you want your
Problem solved. Thank you.
2013/3/26 Harsh J ha...@cloudera.com
YARN does not seem to check for a fully qualified path when you
pass it yours, and it ends up breaking. The problem is easily reproducible
with the two transforming calls from ConverterUtils.
Transform the jarPath to a fully
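A sketch of that qualification step via the FileSystem API (jarPath stands in
for whatever local path you were passing):

FileSystem fs = FileSystem.get(conf);
Path jarPath = new Path("AppMaster.jar"); // hypothetical relative path
// Expands to a full scheme://authority/path URI that ConverterUtils can parse.
Path fullyQualified = fs.makeQualified(jarPath);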
Hi,
I have a Hadoop cluster up and running. I want to submit an MR job to it, but
the input data is kept on an external server (outside the Hadoop cluster). Can
anyone please suggest how to tell my Hadoop cluster to load the input data
from the external server and then run MR on it?
You are looking at a two-step workflow here:
the first unit of your workflow will download the file from the external
server, write it to DFS, and return the file path (a sketch of this unit
follows below);
the second unit of your workflow will read that input path and process the
data according to your business logic in MR.
You can look at
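A rough sketch of that first unit, assuming a plain HTTP source (the URL,
class name, and target path are placeholders):

import java.io.InputStream;
import java.net.URL;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class StageInput {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Target path on DFS; hand this to the MR job as its input path.
    Path target = new Path("/staging/input.dat");
    InputStream in = new URL("http://external-server/data/input.dat").openStream();
    FSDataOutputStream out = fs.create(target);
    // copyBytes streams the remote file into HDFS and closes both streams.
    IOUtils.copyBytes(in, out, conf);
  }
}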
Harsh, thanks.
On Tue, Mar 26, 2013 at 2:28 PM, Harsh J ha...@cloudera.com wrote:
MR will partition the inputs by key and sort them with the key comparator, and
then group them together when reading back via a grouping comparator
(which is usually the same as the key comparator). It will not re-sort
Hi,
I have a YARN application written and running properly against
hadoop-2.0.0-alpha, but when I recently downloaded and started using
hadoop-2.0.3-alpha, it doesn't work. I think the original one I wrote was
based on the Client.java and ApplicationMaster.java in the DistributedShell
example.
Hi,
Thanks for your reply. I do not know about Cascading. Should I Google it as
"Cascading in Hadoop"? Also, what I was thinking is to implement a file system
that overrides the functions provided by the fs.FileSystem interface in
Hadoop. I tried to write some portions of the filesystem (for my
Make sure you have the topology script available on the JobTracker server
as well. This also requires a jobtracker stop/start to take effect.
Also, make sure $HADOOP_CONF resolves properly as the mapred user.
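(For reference, the script is usually wired in via core-site.xml on the
JobTracker; a sketch, with an illustrative path:)

<property>
  <name>topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology.sh</value>
</property>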
On Tue, Mar 26, 2013 at 1:19 AM, preethi ganeshan
preethiganesha...@gmail.com wrote:
Can you use addInputPath(hdfs://……)? Don't change fs.default.name; it cannot
solve your problem.
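That is, something like this sketch (host, port, and path are placeholders):

// Point the job at a fully qualified HDFS URI; fs.default.name stays untouched.
FileInputFormat.addInputPath(job, new Path("hdfs://namenode:8020/data/input"));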
On Mar 26, 2013 7:03 PM, Agarwal, Nikhil nikhil.agar...@netapp.com
wrote:
Hi,
Thanks for your reply. I do not know about Cascading. Should I Google it
as “Cascading in Hadoop”? Also, what I was
and your hadoop version.
On Mar 26, 2013 1:28 PM, Mohammad Tariq donta...@gmail.com wrote:
Hello Sagar,
It would be helpful if you could share your logs with us.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Tue, Mar 26, 2013 at 10:47 AM, Sagar
Hi,
I tried to use -XX:+HeapDumpOnOutOfMemoryError. Unfortunately, as I
suspected, the dump goes to the current working directory of the task attempt
as it executes on the cluster. This directory is cleaned up once the task
is done. There are options to keep failed task files or task files
Create a dump.sh on hdfs.
$ hadoop dfs -cat /user/knoguchi/dump.sh
#!/bin/sh
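# Ship this attempt's heap dump to HDFS, naming it after the task's working
# directory (with '/' replaced by '_') so concurrent dumps don't collide.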
hadoop dfs -put myheapdump.hprof /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
Run your job with
-Dmapred.create.symlink=yes
-Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
Hello Mac users!
We are happy to announce that Pydoop (http://pydoop.sourceforge.net) has
been included in the Homebrew Python Tap. You should now be able to
install it on Mac OS X Mountain Lion as follows:
1. Manually install the Oracle JDK
2. Set JAVA_HOME according to your JDK installation,
Hi JM,
Actually these dirs need to be purged by a script that keeps the last 2 days'
worth of files; otherwise you may run into a "# of open files exceeded" error.
Thanks
On Mar 25, 2013, at 5:16 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org
wrote:
Hi,
Each time my MR job is run, a
You can control the limit on these cache files; the default is 10GB (a value
of 10737418240L). Try changing local.cache.size or
mapreduce.tasktracker.cache.local.size in mapred-site.xml.
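For example, a sketch for mapred-site.xml capping the cache at 5GB (the
property name varies by version; the value is illustrative):

<property>
  <name>local.cache.size</name>
  <value>5368709120</value>
</property>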
Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/
On Mar 25, 2013, at 5:16 PM,
Not all of the files are ever opened at the same time, so you shouldn't see
any "# of open files exceeded" error.
Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/
On Mar 26, 2013, at 12:53 PM, Abdelrhman Shettia wrote:
Hi JM ,
Actually these dirs need to be purged by a
There is no BackupNode in Hadoop 1.
That was a bug in documentation.
Here is the updated link:
http://hadoop.apache.org/docs/r1.1.2/hdfs_user_guide.html
Thanks,
--Konstantin
On Sat, Mar 23, 2013 at 12:04 AM, varun kumar varun@gmail.com wrote:
Hope the below link will be useful.
Let me clarify: if there are lots of files or directories, up to 32K
(depending on the OS's per-user file limits), in
those distributed cache dirs, the OS will not be able to create any more
files/dirs, and thus M-R jobs won't get initiated on those TaskTracker
machines.
Hope this helps.
Thanks
For the situation I faced, it was really a disk space issue, not related
to the number of files. It was writing to a small partition.
I will try local.cache.size or
mapreduce.tasktracker.cache.local.size to see if I can keep the final
total size under 5GB... Else, I will go for a custom
Thanks for the update. I understand now that I'll be installing a secondary
name node, which performs checkpoints on the primary name node and keeps a
working backup copy of the fsimage file.
The primary name node should write its fsimage file to at least 2 different
physical mediums for improved
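(For reference, a sketch of such a setup in hdfs-site.xml, listing one local
disk and one NFS mount; the paths are illustrative:)

<property>
  <name>dfs.name.dir</name>
  <value>/disk1/dfs/name,/mnt/nfs/dfs/name</value>
</property>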
Koji,
Works beautifully. Thanks a lot. I learnt at least 3 different things with
your script today !
Hemanth
On Tue, Mar 26, 2013 at 9:41 PM, Koji Noguchi knogu...@yahoo-inc.comwrote:
Create a dump.sh on hdfs.
$ hadoop dfs -cat /user/knoguchi/dump.sh
#!/bin/sh
hadoop dfs -put
Yes, you got it. Hadoop 1.0.x cannot fail over automatically or manually; you
have to copy the fsimage from the SNN to the primary NN.
On Mar 27, 2013 11:29 AM, David Parks davidpark...@yahoo.com wrote:
Thanks for the update, I understand now that I'll be installing a
secondary
name node which performs checkpoints
The stack trace indicates the job client is trying to submit a job to the
MR cluster and it is failing. Are you certain that, at the time of
submitting the job, the JobTracker is running (on localhost:54312)?
Regarding using a different file system: it depends a lot on what file
system you are
David,
If the good copy on NFS exists post-crash of the NN, use that for
lesser/zero loss, rather than the SNN's copy, which can be an hour old (the
default checkpoint period). That's the whole point of running the NFS disk
mount (make sure it's soft-mounted, by the way; you don't want your NN to
hang if the NFS is hung).