If you add these nodes, data will be placed on them as you add data to the
cluster.
Soon after adding the nodes you should rebalance the storage, so that file
placement does not end up skewed toward the older nodes.
Other than that, the addition should cause few surprises.
On T
Dear all
I'm sorry to disturb you.
Our cluster has 200 nodes now. To improve its capacity, we hope to add 60
nodes to the current cluster. However, we don't know what will happen if we
add so many nodes at the same time. Could you give me some tips and notes?
During the proces
Hi
I am using Pig 2.0 and Nutch 1.0, but they don't share a common Hadoop version.
What is a common Hadoop version for both Pig and Nutch?
Please give the Pig version, Nutch version and Hadoo
Can anyone please help with this?
thanks
ramanaiah
Hi,
Please check out the links below:
http://mail-archives.apache.org/mod_mbox/hadoop-pig-user/200904.mbox/%3c004f01c9c6f2$b1222500$13666f...@com%3e
http://wiki.apache.org/nutch/Upgrading_Hadoop
If you find any issues upgrading the Hadoop version with Nutch, getting in
touch with the Nutch m
There's an interview with one of the GFS engineers on ACM Queue that
might be of interest to you. It's related to GFS, but I think the
underlying issues are the same in HDFS. There is a lot of discussion on
dealing with large numbers of files. Here's the link:
http://queue.acm.org/detail.cfm?id=1594206
Hello Jakob
Yes, I have gone through the job submission strategy in Hadoop; that is
helpful. But I was looking at interdependent jobs: I was trying to switch the
state of a running job to waiting, and I was looking at JobControl for that
reason.
I have gone through the document you pointed out, was wo
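For reference, here is a minimal sketch of what I had in mind with JobControl:
chaining two dependent jobs with the old mapred API. The job configs here are
hypothetical placeholders; the dependent job stays in the waiting state until
its prerequisite completes.

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    public class ChainedJobs {
        public static void main(String[] args) throws Exception {
            JobConf firstConf = new JobConf();   // configure mapper/reducer/paths here
            JobConf secondConf = new JobConf();  // configure mapper/reducer/paths here

            Job first = new Job(firstConf);
            Job second = new Job(secondConf);
            second.addDependingJob(first);       // 'second' waits until 'first' succeeds

            JobControl control = new JobControl("chained-jobs");
            control.addJob(first);
            control.addJob(second);

            Thread runner = new Thread(control); // JobControl implements Runnable
            runner.start();
            while (!control.allFinished()) {
                Thread.sleep(1000);
            }
            control.stop();
        }
    }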
Hi Raghu,
The file doesn't appear in the cluster when I look for it in the Namenode UI. I
also have a monitor on the cluster side which checks whether the file is created
and throws an exception when it is not. It threw an exception saying "File
not found".
Thanks
Pallavi
- Original Message
Hi Everybody,
I am already tracing through the source code and trying to
figure things out. Anyway, thanks for all your suggestions.
Regards,
Ashish
On Tue, Aug 11, 2009 at 11:32 PM, Jakob Homan wrote:
> Hey Ashish-
> In terms of the overall design architecture of HDFS, I woul
If you know you'll only have one object in the file, you could write your
own Writable implementation which doesn't write its length. The problem is
that you'll never be able to *read* it, since writables only get an input
stream and thus don't know the file size.
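A minimal sketch of such a Writable (the class name is made up): it writes the
raw bytes with no length prefix, and readFields cannot really be implemented,
since a DataInput gives no way to know how many bytes belong to the record.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    public class RawBytesWritable implements Writable {
        private byte[] bytes = new byte[0];

        public void set(byte[] bytes) { this.bytes = bytes; }

        public void write(DataOutput out) throws IOException {
            out.write(bytes);  // no length prefix, unlike BytesWritable
        }

        public void readFields(DataInput in) throws IOException {
            // A DataInput gives no way to discover how many bytes belong to
            // this record, so the value cannot be read back in general.
            throw new UnsupportedOperationException("length is not stored");
        }
    }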
If you choose to do this, just mo
Ah that explains it, thanks Todd. Is there a way to serialize an object
without using BytesWritable, or some way I can have a "perfect" serialized
file so I won't have to keep discarding the first 4 bytes of the files?
-- Kris.
On Tue, Aug 11, 2009 at 7:03 PM, Todd Lipcon wrote:
> BytesWritabl
BytesWritable serializes itself by first outputting the array length, and
then outputting the array itself. The 4 bytes at the top of the file are the
length of the value itself.
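If it helps, here is a rough sketch of reading such a file back by consuming
that 4-byte length prefix first; the path handling and method name are just
illustrative.

    import java.io.DataInputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadBytesWritableFile {
        public static byte[] readValue(String file) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            DataInputStream in = fs.open(new Path(file));
            try {
                int length = in.readInt();   // the 4 bytes at the top of the file
                byte[] value = new byte[length];
                in.readFully(value);         // the serialized byte array itself
                return value;
            } finally {
                in.close();
            }
        }
    }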
Hope that helps
-Todd
On Tue, Aug 11, 2009 at 6:33 PM, Kris Jirapinyo wrote:
> Hi all,
> I was wondering if anyone'
Hi all,
I was wondering if anyone's encountered 4 extra bytes at the beginning of
the serialized object file using MultipleOutputFormat. Basically, I am
using BytesWritable to write the serialized byte arrays in the reducer
phase. My writer is a generic one:
public class GenericOutputFormat e
Hey Mithila-
I would point you to the WordCount example
(http://hadoop.apache.org/common/docs/current/mapred_tutorial.html) for
a basic example of how jobs are created by supplying a JobConf to the
JobClient. This will submit your conf to the cluster which will create
and run the job.
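Roughly, old-API submission looks like the sketch below; the job name, the
commented-out mapper/reducer classes, and the input/output paths are
placeholders you would replace with your own.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SubmitExample {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SubmitExample.class);
            conf.setJobName("wordcount");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            // conf.setMapperClass(...) and conf.setReducerClass(...) go here
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);  // submits the conf and waits for completion
        }
    }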
Th
Wasim,
RecordReader implementations should never assume that records are not
spread across multiple blocks. The start and end offsets into a file in an
InputSplit are taken as soft limits, not hard ones. The RecordReader
implementations that come with Hadoop behave this way, and any that you
aut
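To make the soft-limit idea concrete, here is a tiny self-contained toy (not
the actual LineRecordReader code): each split skips the partial line at its
start unless it begins at offset 0, and reads past its end offset to finish
its last line, so a line that straddles a split boundary is still read exactly
once.

    public class SoftSplitDemo {
        public static void main(String[] args) {
            byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes();
            int splitSize = 10;  // deliberately cuts lines in half
            for (int start = 0; start < data.length; start += splitSize) {
                int end = Math.min(start + splitSize, data.length);
                int pos = start;
                if (pos != 0) {
                    // skip the partial line; the previous split's reader owns it
                    while (pos < data.length && data[pos - 1] != '\n') pos++;
                }
                while (pos < end) {  // 'end' is a soft limit
                    int lineEnd = pos;
                    while (lineEnd < data.length && data[lineEnd] != '\n') lineEnd++;
                    System.out.println("split@" + start + ": "
                            + new String(data, pos, lineEnd - pos));
                    pos = lineEnd + 1;
                }
            }
        }
    }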
Hello All
How do I create a job in Hadoop using the Job class? And how do I run it?
Generally JobClient.runJob(conf) is used, but the parameter is not of type
Job.
Also, how do I use the JobControl class? Can I create threads in Hadoop
(similar to multithreading in Java), where different threads
Hi.
Can anybody point me to the Apache documentation page for
"mapred.input.format.skew" ?
I cannot find the documentation for this parameter.
What does it mean?
Thanks
Your assumption is correct. When you close the file, others can read the
data. There is no expected delay before the data becomes visible. If there is
an error, either write() or close() would throw it.
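As a sanity check, the write path is essentially the sketch below (the file
name is made up); once close() returns without throwing, a reader should be
able to see the data.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteAndClose {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FSDataOutputStream out = fs.create(new Path("/logs/2009-08-11.log"));
            try {
                out.write("a log line\n".getBytes("UTF-8"));
            } finally {
                // once close() returns without an exception, the data is visible
                out.close();
            }
        }
    }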
When you say the data is not visible, do you mean readers cannot see the
file or cannot see
Hello Everyone
I was trying to figure out how the JobStatus class can be used in Hadoop.
Can someone guide me to an example? I want to put the method setRunState()
to use.
Thanks!
Mithila
I have a 6-node cluster running Hadoop 0.18.3. I'm trying to figure out
why the data ended up spread out like this:
node001 94.15%
node002 94.16%
node003 48.22%
node004 47.85%
node005 48.12%
node006 43.18%
Node 001 (NN) and node 002 (secondary NN) bo
Hey Ashish-
In terms of the overall design architecture of HDFS, I would point
you to the project documentation:
http://hadoop.apache.org/common/docs/current/hdfs_design.html
For specific data structures, your first stop should be the INode class
and its extending classes, located in
src/ja
Does anyone have any experience with using FTP with HDFS? I have all the
config files set up correctly and have started the service.
But when I connect from a remote (Windows) machine, I get: "Connection closed by
remote host."
And on the local (Ubuntu) machine: "412 Service not available... Permission
d
Hey Stas,
You can also use a utility like Linux-HA (aka heartbeat) to handle IP
address failover. It will even send gratuitous ARPs to make sure the new MAC
address gets registered after a failover. Check out this blog for info
about a setup like this:
http://www.cloudera.com/blog/2009/07/22/ha
Hi Jason,
Apologies for the missing version information in my previous mail. I am
using hadoop-0.18.3. I am getting an FSDataOutputStream object using
fs.create(new Path(some_file_name)), where fs is a FileSystem object, and
I am closing the file using close().
Thanks
Pallavi
-Original Message-
Note that there are multiple log files (one for each day). Make sure you
searched all the relevant days. You can also check the datanode log for this
block.
HDFS writes to all three datanodes at the time you write the data. It is
possible that the other two datanodes also encountered errors.
This
Sorry to bug you guys again, but I found the problem:
an old hadoop-site file in the class path had a limit on
"mapred.child.ulimit" set to 50
Thanks
-Yair
-Original Message-
From: Yair Even-Zohar [mailto:ya...@audiencescience.com]
Sent: Tuesday, August 11, 2009 4:11 PM
To: comm
I'm running a MapReduce job using an HBase table as input, with some
distributed cache files, and all works well.
However, when I set
c.set("mapred.child.java.opts", "-Xmx512m")
in the Java code, using the exact same input and exact same distributed
cache, I'm getting the following:
on the maste
Hi
I am using Pig 2.0 and Nutch 1.0, but they don't share a common Hadoop version.
What is the common version?
Which version of Pig and which version of Nutch do I need to use (so that
they bring in a common Hadoop version)?
Can anyone please help with this?
thanks
ramana
Please provide information on what version of hadoop you are using and the
method of opening and closing the file.
On Tue, Aug 11, 2009 at 12:48 AM, Pallavi Palleti <
pallavi.pall...@corp.aol.com> wrote:
> Hi all,
>
> We have an application where we pull logs from an external server (far apart
>
On Tue, Aug 11, 2009 at 4:45 AM, Mayuran Yogarajah <
mayuran.yogara...@casalemedia.com> wrote:
> Hello,
>
> If you are interested, you could try to trace one of these block ids in the
>> NameNode log to see what happened to it. We are always eager to hear about
>> irrecoverable errors. Please mention ha
Stas Oskin wrote:
Hi.
What is the recommended utility for this?
Thanks.
For those of us whose hosts are virtual and who have control over the
infrastructure, it's fairly simple: bring up a new VM on a different blade
with the same base image and hostname.
If you have a non-virtual cluster
John Clarke wrote:
Thanks for the reply. I considered that, but I have a lot of threads in my
application and it's very handy to have log4j output the thread name with the
log message.
It seems the log4j.properties file in the conf/ directory is not being used,
as any changes I make seem to have no
Hi all,
We have an application where we pull logs from an external server (far apart
from the Hadoop cluster) to the Hadoop cluster. Sometimes we see a huge delay
(of 1 hour or more) before the data actually appears in HDFS, even though the
file has been closed and the variable is set to null from the externa