Hi all,
Are the slides or videos of the talks given at Hadoop Summit available
online? I checked the Yahoo! website for the summit but could not find
any links.
Regards,
--
Jaideep
http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html
-- It sounds like Pregel is a computing framework based on dynamic
programming for graph operations. I guess maybe they removed the file
communication/intermediate files during iterations.
Anyway, what
The address of the JobTracker (NameNode) is specified by mapred.job.tracker
(fs.default.name) in the configuration. When the JobTracker (NameNode)
starts, it will listen on the address specified by mapred.job.tracker
(fs.default.name); and when a TaskTracker (DataNode)
starts, it
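(For reference, a minimal sketch of reading those two addresses through the
plain Configuration API; the class name below is only for illustration.)

    import org.apache.hadoop.conf.Configuration;

    public class ShowAddresses {
        public static void main(String[] args) {
            // Loads the default and site configuration files from the classpath.
            Configuration conf = new Configuration();
            // NameNode address, e.g. hdfs://namenode-host:9000
            System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
            // JobTracker address, e.g. jobtracker-host:9001
            System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
        }
    }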
Is there any way we can submit a mapreduce job from another map job? The
requirement is:
I have customers with start date and end date as follows:
CustomerID   Start Date   End Date
XXX          mm/dd/yy     mm/dd/yy
YYY          mm/dd/yy     mm/dd/yy
ZZZ
Is this before 0.20.0? Assuming you have closed these streams, it is
mostly https://issues.apache.org/jira/browse/HADOOP-4346
It is the JDK internal implementation that depends on GC to free up its
cache of selectors. HADOOP-4346 avoids this by using hadoop's own cache.
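(For reference, the "closed these streams" precondition usually looks like the
sketch below; the class name and the use of args[0] are illustrative.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class DfsReadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // args[0] is an HDFS path to read; purely illustrative.
            FSDataInputStream in = fs.open(new Path(args[0]));
            try {
                IOUtils.copyBytes(in, System.out, conf, false); // false: do not auto-close
            } finally {
                in.close(); // release the socket right away instead of waiting for GC
            }
        }
    }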
Raghu.
Stas Oskin
Hi,
Are there any tools that can measure the run-time of MapReduce jobs?
Any help is appreciated.
Thanks in advance.
Hi.
I've started doing just that, and indeed the number of fds held by the DataNode
process has dropped significantly.
My problem is that my own app, which works with DFS, still has dozens of
pipes and epolls open.
The usual level seems to be about 300-400 fd's, but when I access the DFS
for
Andrew Wharton wrote:
https://issues.apache.org/jira/browse/HADOOP-4539
I am curious about the state of this fix. It is listed as
Incompatible, but is resolved and committed (according to the
comments). Is the backup name node going to make it into 0.21? Will it
remove the SPOF for HDFS? And if
jason hadoop wrote:
Yes.
Otherwise the file descriptors will flow away like water.
I also strongly suggest having at least 64k file descriptors as the open
file limit.
On Sun, Jun 21, 2009 at 12:43 PM, Stas Oskin stas.os...@gmail.com wrote:
Hi.
Thanks for the advice. So you advise explicitly
Scott Carey wrote:
Furthermore, if for some reason it is required to dispose of any objects after
others are GC'd, weak references and a weak reference queue will perform
significantly better in throughput and latency - orders of magnitude better -
than finalizers.
Good point.
I would
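(A minimal illustration of the weak-reference pattern Scott describes; every
name below is hypothetical and not taken from Hadoop's code.)

    import java.io.Closeable;
    import java.io.IOException;
    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.WeakReference;

    // A weak reference that remembers the resource to release once its owner is GC'd.
    class ResourceRef extends WeakReference<Object> {
        final Closeable resource;
        ResourceRef(Object owner, Closeable resource, ReferenceQueue<Object> queue) {
            super(owner, queue);
            this.resource = resource;
        }
    }

    public class WeakCleanupSketch {
        public static void drain(ReferenceQueue<Object> queue) throws IOException {
            // Called from a housekeeping thread: owners that were collected show up
            // here, and their resources are closed immediately - no finalizer needed.
            ResourceRef ref;
            while ((ref = (ResourceRef) queue.poll()) != null) {
                ref.resource.close();
            }
        }
    }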
Raghu Angadi wrote:
Is this before 0.20.0? Assuming you have closed these streams, it is
mostly https://issues.apache.org/jira/browse/HADOOP-4346
It is the JDK internal implementation that depends on GC to free up its
cache of selectors. HADOOP-4346 avoids this by using hadoop's own cache.
Hi,
I am very new to Hadoop and have a few basic questions...
How and where do I specify which systems in the cluster should be used as
DataNodes?
Can I change this set dynamically?
Hi Group,
I was having trouble getting through an example Hadoop program. I have
searched the mailing list but could not find anything useful. Below is the
issue:
1) Executed the command below to submit a job to Hadoop:
/hadoop-0.18.3/bin/hadoop jar -libjars AggregateWordCount.jar
It cannot find your job jar file. Make sure you run this command from the
directory that has the AggregateWordCount.jar (and you can lose the -libjars
flag too - you need that only if you need to specify extra jar dependencies
apart from your job jar file).
- Harish
On Mon, Jun 22, 2009 at 3:45
Hi,
Thanks for your quick responses.
I tried to relax this limit to 204800, but it still does not work.
Could this be caused by FS objects?
Anyway, thanks a lot!
2009/6/22 zhuweimin xim-...@tsm.kddilabs.jp
Hi
The max number of open files is limited on a Linux box. Please use ulimit to view and
Thanks for your reply Harish.
I am running this example from within the directory containing the
AggregateWordCount.jar file. But even then, I have this issue. Earlier I had an
issue of java.lang.ClassNotFoundException:
org.apache.hadoop.examples.AggregateWordCount$WordCountPlugInClass, so in
some
Can you attach the jar file you have?
-Ram
-Original Message-
From: Shravan Mahankali [mailto:shravan.mahank...@catalytic.com]
Sent: Monday, June 22, 2009 3:49 AM
To: 'Harish Mallipeddi'; core-user@hadoop.apache.org
Subject: RE: java.io.IOException: Error opening job jar
Thanks for
Hi.
So what would be the recommended approach for the pre-0.20.x series?
To ensure each file is used by only one thread, and that it is then safe to
close the handle in that thread?
Regards.
2009/6/22 Steve Loughran ste...@apache.org
Raghu Angadi wrote:
Is this before 0.20.0? Assuming you have closed
If the BackupNode doesn't promise HA, then how would additional testing on this
feature aid in the HA story? Maybe you could expand on the purpose of
HADOOP-4539 because now I'm confused.
How does the approaching 0.21 cutoff translate into a release date for 0.21?
-brian
-Original
64k might help in the sense that you might hit GC before you hit the limit.
Otherwise, your only options are to use the patch attached to
HADOOP-4346 or run System.gc() occasionally.
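(A crude sketch of that second workaround: a daemon thread that calls
System.gc() on a fixed interval. The class name and the interval are arbitrary
choices.)

    public class GcNudger {
        public static void start(final long intervalMs) {
            Thread t = new Thread(new Runnable() {
                public void run() {
                    while (true) {
                        try {
                            Thread.sleep(intervalMs);
                        } catch (InterruptedException e) {
                            return;
                        }
                        System.gc(); // lets the JDK drop cached selectors and their fds
                    }
                }
            });
            t.setDaemon(true);
            t.start();
        }
    }

    // e.g. GcNudger.start(5 * 60 * 1000L); at application startup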
I think it should be committed to 0.18.4
Raghu.
Stas Oskin wrote:
Hi.
Yes, it happens with 0.18.3.
I'm
Hi all,
How does one handle a mount running out of space for HDFS? We have two
disks mounted on /mnt and /mnt2 respectively on one of the machines that are
used for HDFS, and /mnt is at 99% while /mnt2 is at 30%. Is there a way to
tell the machine to balance itself out? I know for the
There's no file attached Shravan.
Regards
Ram
-Original Message-
From: Shravan Mahankali [mailto:shravan.mahank...@catalytic.com]
Sent: Monday, June 22, 2009 4:43 AM
To: core-user@hadoop.apache.org; 'Harish Mallipeddi'
Subject: RE: java.io.IOException: Error opening job jar
Hi
Hi,
Over at Mahout (http://lucene.apache.org/mahout) we have a Vector
interface with two implementations DenseVector and SparseVector. When
it comes to writing Mapper/Reducer, we have been able to just use
Vector, but when it comes to actually binding real data via a
Configuration, we
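(One common pattern for that kind of binding, sketched below under the
assumption that the implementations have no-arg constructors: record the
concrete class in the job Configuration and instantiate it reflectively. The
property name is made up, and this is not necessarily what Mahout settled on.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ReflectionUtils;

    // Vector, DenseVector, SparseVector are the Mahout types mentioned above.
    public class VectorBinding {

        // At job setup time: record which implementation the tasks should build.
        // "mahout.vector.class" is a made-up property name for illustration.
        public static void setVectorClass(Configuration conf,
                                          Class<? extends Vector> impl) {
            conf.setClass("mahout.vector.class", impl, Vector.class);
        }

        // Inside a Mapper/Reducer: recover and instantiate it.
        public static Vector newVector(Configuration conf) {
            Class<? extends Vector> cls =
                conf.getClass("mahout.vector.class", DenseVector.class, Vector.class);
            return ReflectionUtils.newInstance(cls, conf);
        }
    }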
On Thu, Jun 18, 2009 at 01:36:14PM -0700, Owen O'Malley wrote:
On Jun 18, 2009, at 10:56 AM, pmg wrote:
Each line from FileA gets compared with every line from FileB1,
FileB2 etc.
etc. FileB1, FileB2 etc. are in a different input directory
In the general case, I'd define an InputFormat
Hi Raghu.
A question - this issue does not influence Hadoop itself (DataNodes,
etc...), but rather influences any application using DFS, correct?
If so, without patching, should I either increase the fd limit (which might
fill up as well???) or periodically trigger the GC?
Regards.
2009/6/22
Yes, if your job completes successfully. Possibly it is removed after
completion of both the map and reduce tasks.
Pankil
On Mon, Jun 22, 2009 at 3:15 PM, Qin Gao q...@cs.cmu.edu wrote:
Hi All,
Do you know if the tmp directory on every map/reduce task will be deleted
automatically after the
Thanks!
But what if the jobs get killed or fail? Does Hadoop try to clean it up? We
are considering bad situations - if a job gets killed, will the tmp dirs sit
on local disks forever and eat up all the disk space?
I guess this should be considered in the distributed cache, but those files are
The Cloudera talks are here:
http://www.cloudera.com/blog/2009/06/22/a-great-week-for-hadoop-summit-west-roundup/
As for the rest, I'm not sure.
Alex
On Sun, Jun 21, 2009 at 11:46 PM, jaideep dhok jdd...@gmail.com wrote:
Hi all,
Are the slides or videos of the talks given at Hadoop Summit
What specific information are you interested in?
The job history logs show all sorts of great information (look in the
history subdirectory of the JobTracker node's log directory).
Alex
On Mon, Jun 22, 2009 at 1:23 AM, bharath vissapragada
bhara...@students.iiit.ac.in wrote:
Hi ,
Are
I am not sure, but sometimes you might see that the DataNodes appear to be
working from the command prompt.
But when you actually look at the logs, you find some kind of error in
them. Check the DataNode logs.
Pankil
On Wed, Jun 17, 2009 at 1:42 AM, ashish pareek pareek...@gmail.com wrote:
Hi,
When I run
Hi Stu,
Which block conversion are you talking about? If you are talking about the
block size of the data, then it remains the same in an upgrade unless you change it.
Pankil
On Tue, Jun 16, 2009 at 5:16 PM, Stu Hood stuart.h...@rackspace.com wrote:
Hey gang,
We're preparing to upgrade our cluster
Are you seeing any exceptions because of the disk being at 99% capacity?
Hadoop should do something sane here and write new data to the disk with
more capacity. That said, it is ideal to be balanced. As far as I know,
there is no way to balance an individual DataNode's hard drives (Hadoop does
Hey Alex,
Will Hadoop balancer utility work in this case?
Pankil
On Mon, Jun 22, 2009 at 4:30 PM, Alex Loddengaard a...@cloudera.com wrote:
Are you seeing any exceptions because of the disk being at 99% capacity?
Hadoop should do something sane here and write new data to the disk with
more
No. If your job gets killed or fails, temp won't be cleaned up, and in that case
you will have to carefully clean it up on your own. If you don't clean it up
yourself, it will eat up your disk space.
Pankil
On Mon, Jun 22, 2009 at 4:24 PM, Qin Gao q...@cs.cmu.edu wrote:
Thanks!
But what if the jobs
Pankil-
I'd be interested to know the size of the /mnt and /mnt2 partitions.
Are they the same? Can you run the following and report the output...
% df -h /mnt /mnt2
Thanks.
-Matt
On Jun 22, 2009, at 1:32 PM, Pankil Doshi wrote:
Hey Alex,
Will Hadoop balancer utility work in this
On 6/22/09 10:12 AM, Kris Jirapinyo kjirapi...@biz360.com wrote:
Hi all,
How does one handle a mount running out of space for HDFS? We have two
disks mounted on /mnt and /mnt2 respectively on one of the machines that are
used for HDFS, and /mnt is at 99% while /mnt2 is at 30%. Is
Stas Oskin wrote:
Hi.
So what would be the recommended approach for the pre-0.20.x series?
To ensure each file is used by only one thread, and that it is then safe to
close the handle in that thread?
Regards.
Good question - I'm not sure. For anything you get with
FileSystem.get(), it's now dangerous to
The initial overhead is fairly small (extra hard link for each file).
After that, the overhead grows as you delete files (and thus their blocks)
that existed before the upgrade, since the physical files for the blocks
are deleted only after you finalize.
So the overhead == (the blocks that got
I have used the balancer to balance the data in the cluster with the
-threshold option. The transfer bandwidth was set to 1 MB/sec (I think
that's the default setting) in one of the config files, and it had to move
500 GB of data around. It did take some time but eventually the data got
spread
On 6/22/09 12:15 PM, Qin Gao q...@cs.cmu.edu wrote:
Do you know if the tmp directory on every map/reduce task will be deleted
automatically after the map task finishes, or do I have to delete them?
I mean the tmp directory that is automatically created in the current directory.
Past
Thanks, then I will try to keep a log of the files and clean them out, thanks.
--Q
On Mon, Jun 22, 2009 at 4:34 PM, Pankil Doshi forpan...@gmail.com wrote:
No. If your job gets killed or fails, temp won't be cleaned up, and in that case
you will have to carefully clean it up on your own. If you don't
Matt,
Kris can give that info.
I am one of the users on the mailing list.
Pankil
On Mon, Jun 22, 2009 at 4:37 PM, Matt Massie m...@cloudera.com wrote:
Pankil-
I'd be interested to know the size of the /mnt and /mnt2 partitions. Are
they the same? Can you run the following and report the
OK, it seems this issue is already patched in the Hadoop distro I'm using
(Cloudera).
Any idea whether I should still call GC manually/periodically to clean out all
the stale pipes/epolls?
2009/6/22 Steve Loughran ste...@apache.org
Stas Oskin wrote:
Hi.
So what would be the recommended approach
It's a typical Amazon EC2 Large instance, so 414G each.
-- Kris.
On Mon, Jun 22, 2009 at 1:37 PM, Matt Massie m...@cloudera.com wrote:
Pankil-
I'd be interested to know the size of the /mnt and /mnt2 partitions. Are
they the same? Can you run the following and report the output...
% df
Hello Pratik, -joblog should also be a specific job history file path, not a
directory. Usually, I copy the job conf XML file and the job history log file to
a local file system and then use the file:// protocol (although hdfs:// should
also work), e.g.,
Sh
Hey all, just a friendly reminder that this is Wednesday! I hope to see
everyone there again. Please let me know if there's something interesting
you'd like to talk about -- I'll help however I can. You don't even need a
Powerpoint presentation -- there are many whiteboards. I'll try to have a
video
Hi All!
I have been running Hadoop jobs through my user account on a cluster, for a
while now. But now I am getting this strange exception when I try to execute
a job. If anyone knows, please let me know why this is happening.
[akhil1...@altocumulus WordCount]$ hadoop jar
Hi All:
Is there any way using Hadoop Streaming to determine the directory from which
an input record is being read? This is straightforward in Hadoop using
InputFormats, but I am curious if the same concept can be applied to streaming.
The goal here is to read in data from 2 directories, say
The directory specified by the configuration parameter mapred.system.dir,
defaulting to /tmp/hadoop/mapred/system, doesn't exist.
Most likely your tmp cleaner task has removed it, and I am guessing it is
only created at cluster start time.
On Mon, Jun 22, 2009 at 6:19 PM, akhil1988
configure and close are run for each task, mapper and reducer. The configure
and close are NOT run on the combiner class.
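(A sketch in the old 0.18-era API showing where the two hooks fire; the class
name is illustrative.)

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class LifecycleMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        public void configure(JobConf job) {
            // Runs once per task attempt, before any map() call.
        }

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            // Runs once per input record.
        }

        @Override
        public void close() throws IOException {
            // Runs once per task attempt, after the last map() call.
        }
    }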
On Mon, Jun 22, 2009 at 9:23 AM, Saptarshi Guha saptarshi.g...@gmail.comwrote:
Hello,
In a mapreduce job, a given map JVM will run N map tasks. Are the
configure and close
Check the process environment for your streaming tasks; generally the
configuration variables are exported into the process environment.
The Mapper input file is normally stored as some variant of
mapred.input.file. The reducer's input is the mapper output for that reduce,
so the input file is
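(For a Java mapper the same value can be read in configure(); a hedged sketch -
the exact property name varies by version, "map.input.file" being the usual one
in this era.)

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    public class InputAwareMapperBase extends MapReduceBase {
        protected String inputFile;

        @Override
        public void configure(JobConf job) {
            // Streaming tasks see the same value as an environment variable
            // with the dots replaced by underscores (map_input_file).
            inputFile = job.get("map.input.file");
        }
    }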
Hi Ramakishore,
I am unable to attach files to the mailing list! I hope Harish received the
attached docs at his Gmail account.
Please find them attached here.
Any help would be appreciated.
Thank You,
Shravan Kumar. M
Catalytic Software Ltd. [SEI-CMMI Level 5 Company]
Thanks for the link.
-
JD
On Tue, Jun 23, 2009 at 1:55 AM, Alex Loddengaard a...@cloudera.com wrote:
The Cloudera talks are here:
http://www.cloudera.com/blog/2009/06/22/a-great-week-for-hadoop-summit-west-roundup/
As for the rest, I'm not sure.
Alex
On Sun, Jun 21, 2009 at 11:46 PM,