No. But if that is the intent, then there are 2 ways of doing it.
1. Just extend the input format of your choice and override the
isSplitable() method to return false.
2. Compress your text file using a compression format supported by
Hadoop (e.g. gzip). Since gzip is not splittable, this will ensure that one
map task processes one entire file.
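For option 1, a minimal sketch against the old org.apache.hadoop.mapred API (the class name here is hypothetical):

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Hypothetical name; extends the stock TextInputFormat so that each
// input file is handed whole to a single map task.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false; // never split: one file == one split == one map task
    }
}
```

Set this class as the job's input format (e.g. conf.setInputFormat(WholeFileTextInputFormat.class)) and each file will go to exactly one mapper.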
Mori Bellamy wrote:
I discovered that some of my code was causing out-of-bounds
exceptions. I cleaned up that code and the map tasks seemed to work.
That confuses me -- I'm pretty sure Hadoop is resilient to a few map
tasks failing (5 out of 13k). Before this fix, my remaining 2% of
tasks
jerrro wrote:
Hello,
I was wondering - could someone tell me the reasons I might
get failures with certain map tasks on a node?
Well, that depends on the kind of errors you are seeing. Could you please
post the logs/error messages?
Amar
Any idea that comes to mind
would work (it
This is strange. If you don't mind, please send the script to me.
-----Original Message-----
From: Yunhong Gu1 [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 03, 2008 9:49 AM
To: core-user@hadoop.apache.org
Subject: topology.script.file.name
Hello,
I have been trying to figure out
To my surprise, only one output value from the mapper is not reaching the
combiner, and it is consistent when I repeat the experiment. The same record
directly reaches the reducer without going through the combiner. I am
surprised -- how can this happen?
novice user wrote:
Regarding the conclusion,
I am
Hello all,
After recent talk about joins, I have a (possibly) stupid question:
What is the difference between the join operations in
o.a.h.mapred.join and the standard merge step in a MapReduce job?
I understand that doing a join in the Mapper would be much more
efficient if you're lucky enough
I am actually more interested, _theoretically_, in what could happen to a map
task to make it fail or take longer...
I don't have a specific case. Thanks, Jerr
Amar Kamat wrote:
jerrro wrote:
Hello,
I was wondering - could someone tell me the reasons I might
get failures with
I have installed Cygwin and hadoop-0.17.0 and have done 3 steps:
1) add JAVA_HOME in hadoop-env.sh
2) create hadoop-site.xml
3)execute commands:
cd /cygdrive/c/hadoop-0.17.0
bin/start-all.sh
bin/hadoop dfs -rmr input
bin/hadoop dfs -put conf input NOT WORKING
bin/hadoop dfs -ls
bin/stop-all.sh
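For reference, a minimal hadoop-site.xml for a single-node 0.17 setup looks roughly like this (the hostnames and ports are typical examples, not requirements):

```
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

If the -put step fails, the namenode/datanode logs under the logs/ directory usually show why (e.g. the namenode not yet out of safe mode, or no datanodes registered).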
Hi Stuart,
A join is a higher-level logical operation, while map/reduce is a technique
that could be used to implement it. Specifically, in relational algebra, the
join construct specifies how to form a single output row from 2 rows arising
from two input streams. There are very many ways of
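As a rough illustration of that relational-algebra view (a hypothetical in-memory sketch, not Hadoop code), forming one output row per pair of matching rows from two streams might look like:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class JoinSketch {
    // Hypothetical hash join over two in-memory "streams" of (key, value)
    // rows; each pair of rows with equal keys yields one joined output row.
    static List<String> join(Map<String, String> left, Map<String, String> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> l : left.entrySet()) {
            String r = right.get(l.getKey());
            if (r != null) {
                out.add(l.getKey() + "\t" + l.getValue() + "\t" + r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> left = Map.of("k1", "a", "k2", "b");
        Map<String, String> right = Map.of("k2", "c", "k3", "d");
        // Only k2 appears in both streams -> exactly 1 joined row.
        System.out.println(join(left, right).size()); // 1
    }
}
```

A MapReduce implementation distributes exactly this matching step: the shuffle brings rows with equal keys together, and the reducer forms the output rows.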
Yes this is a known bug.
http://issues.apache.org/jira/browse/HADOOP-1212
You should manually remove the 'current' directory from every data-node
after reformatting the name-node, and then start the cluster again.
I do not believe there is any other way.
Thanks,
--Konstantin
Taeho Kang wrote:
No, I don't
On Tuesday 01 July 2008 09:36:18 Ashok Varma wrote:
Hi,
I'm trying to install Fedora 8 as a guest OS in Xen on CentOS 5.2 64-bit.
I always get a "failed to mount directory" error. I configured an NFS share,
but the installation still fails in the middle.
Slightly off-topic on a Hadoop mailing
Forgive me if you already know this, but the correctness of the map-
side join is very sensitive to partitioning; if your input is sorted
but equal keys go to different partitions, your results may be
incorrect. Is your input such that the default partitioning is
sufficient? Have you
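For context, Hadoop's default HashPartitioner assigns a key to a partition roughly as sketched below (simplified standalone version, not the actual Hadoop class):

```java
public class PartitionSketch {
    // Simplified version of Hadoop's default HashPartitioner logic:
    // equal keys always map to the same partition for a fixed partition
    // count, but two inputs partitioned with different reducer counts
    // will not line up for a map-side join.
    static int partition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always lands in the same partition for a fixed
        // partition count...
        System.out.println(partition("user42", 4) == partition("user42", 4)); // true
        // ...but changing the partition count can move it, which is why
        // both join inputs must be partitioned identically.
        System.out.println(partition("user42", 4));
        System.out.println(partition("user42", 5));
    }
}
```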
Ashish ably outlined the differences between a join and a merge, but
might be confusing the o.a.h.mapred.join package and the contrib/
data_join framework. The former is used for map-side joins and has
nothing to do with either the shuffle or the reduce; the latter
effects joins in the
We are using the default partitioner. I am just about to start verifying
my results, as it took quite a while to work my way through the non-obvious
issues of hand-writing MapFiles -- things like the key and value classes being
extracted from the jobconf, output key/value.
Question: I looked at the
Nathan Marz wrote:
Is there a way to get stats of the currently running job
programatically?
This should probably be an FAQ. In your Mapper or Reducer's configure
implementation, you can get a handle on the running job with something like
(exact API depends on the Hadoop version):
RunningJob running =
    new JobClient(job).getJob(job.get("mapred.job.id"));
This is my script, which is actually a C++ program:

#include <iostream>
#include <string>
using namespace std;

int main(int argc, char** argv)
{
    for (int i = 1; i < argc; i++)
    {
        string dn = argv[i];
        if (dn.substr(0, 5) == "rack1")
            cout << "/rack1" << endl;
        else if (dn.substr(0, 5) == "rack2")
            cout << "/rack2" << endl;
        else
            cout << "/default-rack" << endl; // assumed fallback for other hosts
    }
    return 0;
}
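For context, a script like this is wired in through the topology.script.file.name property in hadoop-site.xml; the path below is hypothetical:

```
<property>
  <name>topology.script.file.name</name>
  <value>/path/to/rack-script</value>
</property>
```

Hadoop invokes the script with one or more hostnames/IPs as arguments and expects one rack path per argument on standard output.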
Hi guys:
I am running Hadoop on an 8-node cluster. I use start-all.sh to boot
Hadoop and it shows that all 8 data nodes are started. However, when I use
bin/hadoop dfsadmin -report to check the status of the data nodes, it
shows only one data node (the one on the same host as the name node) is
Hi,
Using Hadoop 0.16.2, I am seeing the following in the NN log:
2008-07-03 19:46:26,715 ERROR dfs.NameNode - java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
at
Hi,
I submitted a patch using JIRA, and the Hudson system reported
-1 on contrib tests -- the patch failed contrib unit tests.
Looking at the console output, I noticed that it says build successful
for the contrib tests, so I am confused about which failed contrib
tests the Hudson output refers to.
This
Hello,
I am new to Hadoop and am trying to run
HadoopDfsReadWriteExample (http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample)
from eclipse on Windows XP.
I have added following files in the build path for the project:
A bug was introduced by HADOOP-3480. HADOOP-3653 will fix it.
Nige
On Jul 3, 2008, at 5:24 PM, Abdul Qadeer wrote:
Hi,
I submitted a patch using JIRA, and the Hudson system reported
-1 on contrib tests -- the patch failed contrib unit tests.
Looking at the console output, I noticed that it says
Thanks for all interest.
BTW, I can't handle too many people via private email, so please join this group.
http://groups.google.com/group/hrdfstore
Thanks, Edward
On Wed, Jul 2, 2008 at 3:06 PM, Edward J. Yoon [EMAIL PROTECTED] wrote:
Hello all,
The HRdfStore team is looking for a couple more
Hi, Zhang:
Once you start Hadoop with the start-all.sh script, a Hadoop status page can
be accessed at http://namenode-ip:port/dfshealth. The port is specified by
<name>dfs.http.address</name>
in your hadoop-default.xml.
If the datanodes' status is not as expected, you need to check the log files.
I'm a newbie, so feel free to say RTFM if this is old hat: what's the best way
to do a nested for loop in Hadoop? Specifically, let's say I've got a list of
elements, and I want to do an all-against-all comparison. The standard nested
for loop would be:

for i in 1..10:
    for j in i..10:
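The triangular enumeration itself is straightforward; the MapReduce question is how to get both elements of each candidate pair to the same reducer. As a minimal standalone sketch of just the pairing logic (plain Java, helper name is hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AllPairs {
    // Emit every unordered pair (j >= i) from a list, mirroring the
    // nested for loop above. In a MapReduce setting, one common trick
    // is for each mapper to emit its element under every "pair group"
    // key so that a single reducer sees both members of each pair and
    // runs exactly this loop over its group.
    static List<int[]> allPairs(List<Integer> xs) {
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i < xs.size(); i++)
            for (int j = i; j < xs.size(); j++)
                out.add(new int[]{xs.get(i), xs.get(j)});
        return out;
    }

    public static void main(String[] args) {
        // n = 3 elements -> 3 + 2 + 1 = 6 unordered pairs (i == j allowed).
        System.out.println(allPairs(Arrays.asList(1, 2, 3)).size()); // 6
    }
}
```

The trade-off to watch is data replication: emitting each element to every group multiplies the shuffle volume, so for large n people usually block the input into chunks and compare chunk pairs instead.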