release (CDH2) we're now also running jdiff between the stock Apache release and our own to verify the above guarantee.
-Todd
2009/9/7 Todd Lipcon t...@cloudera.com
Hi,
The EC2 scripts will boot Cloudera's distribution for Hadoop. Currently they boot our distribution
Hi Zheng,
The DistributedCache.addArchiveToClasspath call is the one that makes it get
unarchived into the temp directory. By contrast, addFileToClasspath doesn't.
I don't remember the old-style command line flag to trigger this call...
perhaps -archives or something?
Worth noting that -libjars
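For reference, a minimal sketch of the two calls in question (the paths are made up; this assumes the 0.20-era org.apache.hadoop.filecache.DistributedCache API, where the methods are spelled addArchiveToClassPath and addFileToClassPath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class CacheClasspathDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Unpacked into the task's local temp directory, then added to the classpath:
    DistributedCache.addArchiveToClassPath(new Path("/cache/mylibs.zip"), conf);
    // Added to the classpath as-is; no unarchiving step:
    DistributedCache.addFileToClassPath(new Path("/cache/single.jar"), conf);
    // ...then configure and submit the job with this conf as usual.
  }
}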
Hi Arvind,
Check the source code in DFSAdmin, which handles dfsadmin -report. It uses
the same API that the namenode web UI does - I think it's called
getClusterStatus or something if my memory serves me correctly. Here's
example output on my pseudodistributed cluster:
Datanodes available: 1 (1 total, 0 dead)
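If you want the same numbers programmatically, something like this sketch should work -- hedged, since I'm going from memory of the 0.20 API; treat DFSAdmin.report() as the authoritative reference:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class MiniReport {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumes fs.default.name points at an HDFS namenode:
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
    DatanodeInfo[] nodes = dfs.getDataNodeStats(); // same data the web UI and -report show
    System.out.println("Datanodes available: " + nodes.length);
    for (DatanodeInfo dn : nodes) {
      System.out.println(dn.getName() + ": " + dn.getRemaining() + " bytes remaining");
    }
  }
}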
On Thu, Aug 13, 2009 at 12:04 AM, Manhee Jo j...@nttdocomo.com wrote:
Hi all,
I've succeeded in sharing HDFS files from Windows XP through fuse-dfs and then a Samba mount.
When I tried to copy (read and write) a 1GB text file from fuse-dfs over Samba, it took around 50 secs.
Then, I tried dfs get
On Thu, Aug 13, 2009 at 10:37 AM, Konstantin Shvachko s...@yahoo-inc.com wrote:
Steve,
There are other groups that have claimed they are working on an HA solution.
We had discussions about it not so long ago on this list.
Is it possible that your colleagues present their design?
As you point out the issue gets
Hi Mithila,
I assume you're referring to fair scheduler preemption. In the preemption
scenario, tasks are completely killed, not paused. It's not like a
preemptive scheduler in your OS where things are context switched. This is
why the preemption is not enabled by default and has tuning.
Intermediate data from the big job will be on the local disk like it always
is - this isn't anything special about the fair scheduler. Map outputs
remain in mapred.local.dir until the job is complete.
-Todd
On Thu, Aug 13, 2009 at 10:52 AM, Todd Lipcon t...@cloudera.com wrote:
Hi Mithila,
I
On Thu, Aug 13, 2009 at 8:58 PM, Bogdan M. Maryniuk
bogdan.maryn...@gmail.com wrote:
Also make sure you tuned the TCP/IP stack, which is by default too conservative.
Any pointers on this? Would be interesting to see before/after tuning
benchmarks as well. Assuming this is a runtime tunable
Hi Mayuran,
Do you do all of your uploads of data into your Hadoop cluster from node001
and node002?
If so, keep in mind that one of your replicas will always be written to the local node when it is part of the cluster.
You should consider running the rebalancer to even up your space
things.
Thanks,
--Konstantin
Todd Lipcon wrote:
On Wed, Aug 12, 2009 at 3:42 AM, Stas Oskin stas.os...@gmail.com wrote:
Hi.
You can also use a utility like Linux-HA (aka heartbeat) to handle IP address failover. It will even send gratuitous ARPs to make sure to get the new MAC
Hey Stas,
You can also use a utility like Linux-HA (aka heartbeat) to handle IP
address failover. It will even send gratuitous ARPs to make sure to get the
new MAC address registered after a failover. Check out this blog for info
about a setup like this:
BytesWritable serializes itself by first outputting the array length, and
then outputting the array itself. The 4 bytes at the top of the file are the
length of the value itself.
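A minimal, self-contained sketch showing the 4-byte length prefix:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import org.apache.hadoop.io.BytesWritable;

public class PrefixDemo {
  public static void main(String[] args) throws Exception {
    BytesWritable value = new BytesWritable("hello".getBytes("UTF-8"));
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    value.write(new DataOutputStream(bos)); // writes the 4-byte length, then the bytes
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
    System.out.println("length prefix = " + in.readInt()); // prints 5
  }
}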
Hope that helps
-Todd
On Tue, Aug 11, 2009 at 6:33 PM, Kris Jirapinyo kjirapi...@biz360.com wrote:
Hi all,
I was
Hi Ryan,
Yes, you can do this -- the technique is called interface bonding and isn't too
hard to set up in Linux as long as your switch supports it. However, it is
pretty rare that it provides an appreciable performance benefit on typical
hardware and workloads -- probably not worth the doubled switch
On Thu, Jul 30, 2009 at 11:39 AM, Scott Carey sc...@richrelevance.com wrote:
Use the deadline scheduler:
# echo 'deadline' > /sys/block/sda/queue/scheduler   (for each device)
Have you found the deadline scheduler to be significantly better than the
default cfq? I've used deadline for RDBMS
On Wed, Jul 29, 2009 at 8:51 AM, bhushan_mahale
bhushan_mah...@persistent.co.in wrote:
Hi,
What are the possible ways to retrieve the data if a node goes down in a
Hadoop cluster?
Assuming a replication factor of 3, if 3 nodes go down in a 10-node cluster, how do we retrieve the data?
On Thu, Jul 23, 2009 at 11:56 AM, Ryan Smith ryan.justin.sm...@gmail.com wrote:
I was wondering if someone could give me some answers, or maybe some pointers to where to look in the code. All of these questions concern hard drive failure.
Question 1: If a master (system
Hi Andraz,
First, thanks for the contribution. Could you create a JIRA ticket and
upload the code there? Due to ASF restrictions, all contributions must be
attached to a JIRA so you can officially grant permission to include the
code. The JIRA will also allow others to review and comment on the
Hi Akhil,
Your mapred.local.dir is pointing to a directory that either lacks permissions for the user running the daemon or has been removed. Check that
configuration variable and make sure it's pointing to a directory that's
writable by the hadoop user.
-Todd
On Sat, Jul 18, 2009 at
-Todd
On Fri, Jul 17, 2009 at 12:54 AM, Todd Lipcon t...@cloudera.com wrote:
Hi,
Your understanding of the merge sort process seems correct, but I'm not
quite sure what your question is.
The merge process here is on the output side of the map task, so input
splits don't factor
Hi Akhil,
That's the default configuration, but it's not meant for actual use in a
cluster. You should be manually setting dfs.data.dir, dfs.name.dir, and
mapred.local.dir on your cluster to point to the disks you want Hadoop to
use. The use of /tmp as a default is because it's a convenient
Hi Seunghwa,
It's important to note that changing the dfs.replication config variable
does not change the current files in HDFS. You have to use fs -setrep on
those files to change their replication count. The replication count is set when a file is created and is not modified thereafter unless
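FileSystem.setReplication is the API underneath fs -setrep; a minimal sketch (the path is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetRepDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Equivalent of: bin/hadoop fs -setrep 3 input/part-0
    fs.setReplication(new Path("input/part-0"), (short) 3);
  }
}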
On Fri, Jul 17, 2009 at 4:16 PM, Seunghwa Kang s.k...@gatech.edu wrote:
I checked with
bin/hadoop fs -stat "%n %r" input/*
part-0 4
part-1 4
part-2 4
part-3 4
part-4 4
part-5 4
part-6 4
part-7 4
and see replication factor is 4.
Also, I set replication
Hi Ryan,
To fix this you can simply chmod 755 that configure script referenced in the
error.
There is a JIRA for this (I think it got committed) that adds another chmod task to build.xml, but the fix may not be in 0.20.0.
Thanks
-Todd
On Tue, Jul 14, 2009 at 11:36 AM, Ryan Smith
/home/rsmith/hadoop-0.20.0/build.xml:1405: exec returned: 2
Total time: 5 seconds
--
On Tue, Jul 14, 2009 at 3:32 PM, Todd Lipcon t...@cloudera.com wrote:
Hi Ryan,
I've never seen that issue. It sounds to me like your C
PM, Todd Lipcon t...@cloudera.com wrote:
Hi Ryan,
Sounds like HADOOP-5611:
https://issues.apache.org/jira/browse/HADOOP-5611
-Todd
On Tue, Jul 14, 2009 at 12:49 PM, Ryan Smith
ryan.justin.sm...@gmail.com
wrote:
Hello,
My problem was I didn't have g++ installed. :) So
Hi Mu,
Small job overhead is something that has been worked on a bit in recent
versions, but here's the gist of it (as best as I know, though I don't work
much in this area of the code):
- The JobTracker doesn't push tasks to TaskTrackers. Instead,
the TaskTrackers send heartbeats
Hi Stuart,
Hadoop itself doesn't have any nice way of dealing with this that I know of.
I think your best bet is to do something like:
String dataModel = System.getProperty("sun.arch.data.model");
if ("32".equals(dataModel)) {
  System.loadLibrary("mylib_32bit");
} else if ("64".equals(dataModel)) {
  System.loadLibrary("mylib_64bit");
}
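One thing to keep in mind: System.loadLibrary resolves those names against java.library.path, so both the 32-bit and 64-bit variants of the library need to be somewhere on that path (e.g. set via -Djava.library.path).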
Hi Pankil,
Basically there are two steps here - the first is to sort the two files.
This can be done using a MapReduce job where the mapper extracts the join
column as a key.
If you make sure you have the same number of reducers (and partition by the
equijoin column) for both sorts, then you'll end
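A bare-bones sketch of the sort step's mapper, using the old org.apache.hadoop.mapred API and assuming tab-separated input with the join column first (the class name and field layout are illustrative):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class JoinKeyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
    // Emit the join column as the key; with the same partitioner and the same
    // number of reducers for both datasets, matching keys land in matching partitions.
    String[] fields = line.toString().split("\t", 2);
    out.collect(new Text(fields[0]), line);
  }
}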
I got the concept, but I have no idea about side inputs in the mapper class.
Can you guide me more on that?
Pankil
On Thu, Jul 9, 2009 at 1:39 PM, Todd Lipcon t...@cloudera.com wrote:
Hi Pankil,
Basically there are two steps here - the first is to sort the two files.
This can be done
Hi David,
I'm unaware of any issue that would cause memory leaks when a file is open
for read for a long time.
There are some issues currently with write pipeline recovery when a file is
open for writing for a long time and the datanodes to which it's writing
fail. So, I would not recommend