You can copy data from any node, so copying from multiple nodes will give
you better performance (although be sure the files don't overlap). The
master node is updated once a block has been copied its replication
number of times. So if the default replication is 3, then the 3
replicas must be
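(To illustrate the loading point above: a minimal sketch using the FileSystem
API, with made-up paths; "bin/hadoop dfs -put" does the same from the shell.
Run a copy like this on each loading node, over a disjoint set of files:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLoad {
    public static void main(String[] args) throws Exception {
        // Sketch only: copy one local file into HDFS. Parallelise the load
        // by running this on several nodes, each over its own files.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        fs.copyFromLocalFile(new Path("/local/data/part-001"),           // hypothetical source
                             new Path("/user/venkates/input/part-001")); // hypothetical destination
    }
}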
I am not sure that you are following the right techniques. We have the same
issue concerning master/slave loading; we are still trying to find more
details on how to do it better, so I cannot advise you right now.
Keep posting; somebody can probably give you the correct answer. Good
questions, actually.
th
Am I missing something very fundamental? Can someone comment on these
queries?
Thanks,
Venkates P B
On 8/1/07, Venkates .P.B. <[EMAIL PROTECTED]> wrote:
>
>
> Few queries regarding the way data is loaded into HDFS.
>
> - Is it a common practice to load the data into HDFS only through the
> master
"HADOOP-1374" refers to the JIRA (Hadoop's issue-tracking software of
choice) ticket that details a bug that may be related to yours. Check it
out here: http://issues.apache.org/jira/browse/HADOOP-1374.
Regards,
Jeff
On 8/2/07, Samuel LEMOINE <[EMAIL PROTECTED]> wrote:
>
> Thanks for your answer
Hi,
Page 3 of the "Hacking Pig" documentation suggests it is possible to
LOAD a file, do stuff with it and then STORE it out... Is this correct
or does it have to be done in Java?
Regards,
Shane
PS. Apologies if this is the wrong list.
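(For what it's worth: as I read that same document, yes, the round trip is
plain Pig Latin, no Java required. A minimal sketch with made-up paths:)

-- Sketch only: load a tab-delimited file, keep its first column, store it.
A = LOAD 'input/data.txt' USING PigStorage('\t');
B = FOREACH A GENERATE $0;
STORE B INTO 'output/firstcol';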
Samuel LEMOINE wrote:
Thanks for your answer, you give me hope :)
To answer your questions: I use the Hadoop script "start-all.sh"
to launch Hadoop, and each machine runs a Java 1.5.0 JDK on
Ubuntu 7.04.
Thank you, this was important.
Your few suggestions raise as many questions fo
Hi Michael,
Thanks for the reply. I've tried to write some code to do this now, but
it's not working. I was wondering if there's anything obviously wrong?
After my runJob() I put (just as a test):
JobClient aJC = new JobClient();
String jobid = jobConf.get("mapred.job.id");
aJC.setConf(jobConf);
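For what it's worth, a sketch of how I'd expect that test to continue,
assuming the jobConf from above and my reading of the current JobClient
API (check the method names against your build; here the client is built
directly from the configuration instead of calling setConf):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.TaskReport;

// Sketch only: build the client from the job's own configuration, then
// ask it for per-map-task reports by job id (IOException handling omitted).
JobClient client = new JobClient(jobConf);
String jobid = jobConf.get("mapred.job.id");
TaskReport[] mapReports = client.getMapTaskReports(jobid);
for (TaskReport r : mapReports) {
    System.out.println(r.getTaskId() + " " + r.getState() + " " + r.getProgress());
}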
Hello,
There really was a problem with SSH, but it's OK now. When I issue the
command start-all.sh from the master node (after formatting the namenode), it
properly connects to the slave. The problem is that neither the datanode nor
the tasktracker is initialized.
SSH is working (at least, it
On 8/2/07 5:20 AM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> I've found the
> getMapTaskReports method in the JobClient class, but can't work out
> how to access it other than by creating a new instance of JobClient -
> but then that JobClient would be a different one to the one that was
I have some purely subjective experience. I invite anyone with empirical
evidence to pipe up if possible.
It can be used, but there are a couple of important caveats at the moment:
1] If your maps have a tremendous amount of output, the TaskTrackers will
start producing OutOfMemory exceptions (and depe
I notice that the reduce > copy phase is very slow.
I would like to configure Hadoop to compress the map output:
mapred.compress.map.output = true
map.output.compression.type = RECORD
I'm wondering if someone has already used this, or if you have some
statistics about the improvement.
A
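(For reference, in hadoop-site.xml those two settings would look roughly
like this; the values are the ones quoted above:)

<!-- Sketch only: compress intermediate map output, record by record. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>map.output.compression.type</name>
  <value>RECORD</value>
</property>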
Your response was very insightful.
The problem was with the SSH server. It wasn't working. Now everything seems
to be running ok.
Thanks Ollie,
Lucas
On 7/31/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> Hi Lucas,
>
> Sounds strange, it should work. As long as you have all the names of
> the
Hi Doug,
Thanks for the reply. Could you possibly explain how my program would
get access to the task reports from each tracker? I've found the
getMapTaskReports method in the JobClient class, but can't work out
how to access it other than by creating a new instance of JobClient -
but the
Thanks for your answer, you give me hope :)
To answer your questions: I use the Hadoop script "start-all.sh" to
launch Hadoop, and each machine runs a Java 1.5.0 JDK on Ubuntu
7.04.
Your few suggestions raise as many questions for me: you mention a
discussion named "HADOOP-1374", w