This is also how I fixed this problem.
On 6/21/08, Sayali Kulkarni <[EMAIL PROTECTED]> wrote:
>
> Hi!
>
> My problem of "Too many fetch failures" as well as "shuffle error" was
> resolved when I added the list of all the slave machines in the /etc/hosts
> file.
>
> Earlier on every slave I just ha
Mr. Taeho Kang,
I need to analyze text in different character encodings too,
and I have suggested supporting encoding configuration in TextInputFormat:
https://issues.apache.org/jira/browse/HADOOP-3481
But for now, I think you should convert the text files to UTF-8.
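For example, a minimal conversion sketch in plain Java (the source charset EUC-KR here is only an assumption; substitute whatever encoding your files actually use):

    import java.io.*;

    public class ConvertToUtf8 {
        public static void main(String[] args) throws IOException {
            // args[0] = input file in the source encoding, args[1] = UTF-8 output file
            BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(args[0]), "EUC-KR"));
            Writer out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(args[1]), "UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.write('\n');
            }
            out.close();
            in.close();
        }
    }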
Regards,
Taeho Kang:
Dea
I've checked the code in DataNode.java, exactly where you get the error:
    ...
    DataInputStream in = null;
    in = new DataInputStream(
        new BufferedInputStream(s.getInputStream(), BUFFER_SIZE));
    short version = in.readShort();
    if ( version != DATA_TRANFER_VERSION ) {
        throw new IOExceptio
You should chmod the .ssh directory and authorized_keys on the
datanode/tasktracker instead of the jobtracker.
On 7/11/08, Jim Lowell <[EMAIL PROTECTED]> wrote:
>
> I'm trying to get a 2-node Hadoop cluster up and running on Ubuntu. I've
> already gotten both nodes to run Hadoop as single-node following
Is this data-local dispatching still just a design, or is it already implemented?
And if it is implemented, in which version? I didn't find its
implementation in 0.16.0.
Thanks
On 7/11/08, Joman Chu <[EMAIL PROTECTED]> wrote:
>
> Hadoop will try to split the file according to how it is split up in
> the
Okay, I've found some similar discussions in the archive, but I'm still not
clear on this. I'm new to Hadoop, so 'scuse my ignorance...
I'm writing a Hadoop tool to read in an event log, and I want to produce two
separate outputs as a result -- one for statistics, and one for budgeting.
Because
I'm using hadoop-0.17.0. Should I be using a more recent version?
Please tell me which version you used.
On Fri, Jul 11, 2008 at 2:35 AM, Sandy <[EMAIL PROTECTED]> wrote:
> One last thing:
>
> If that doesn't work, try following the instructions on the ubuntu setting
> up hadoop tutorial. Even
It's not released yet. There are 2 options:
1. download the unreleased 0.18 branch from here
http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18
svn co http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18
branch-0.18
2. get the NLineInputFormat.java from
http://svn.apach
At Thu, 10 Jul 2008 15:50:31 -0500,
"Jim Lowell" <[EMAIL PROTECTED]> wrote:
>
> I'm trying to get a 2-node Hadoop cluster up and running on Ubuntu.
> I've already gotten both nodes to run Hadoop as single-node
> following the excellent instructions at
> http://www.michael-noll.com/wiki/Running_Had
Thanks for the responses.
Lohit and Mahadev: this sounds fantastic; however, where can I get Hadoop
0.18? I went to http://hadoop.apache.org/core/releases.html
but did not see a link for Hadoop 0.18. After a brief search on
Google, it did not seem that Hadoop 0.18 has been officially released ye
Hello Sandy,
If you are using Hadoop 0.18, you can use the NLineInputFormat input format to get
your job done. What it does is give exactly one line to each mapper.
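A rough sketch of the driver under the 0.18 API (the class name and the linespermap property name are my reading of the branch, so treat them as assumptions):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;

    public class OneLinePerMapDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(OneLinePerMapDriver.class);
            conf.setJobName("one-line-per-map");
            conf.setInputFormat(NLineInputFormat.class);
            // One input line per map task; the property name is my reading of the 0.18 branch.
            conf.setInt("mapred.line.input.format.linespermap", 1);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
        }
    }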
In your mapper you might have to encode your keys, so that the output from
your mapper would be a key/value pair of the form <encoded key>, 1.
Reducer w
I think
src/mapred/org/apache/hadoop/mapred/lib/NLineInputFormat.java
is what you want.
Mahadev
> -Original Message-
> From: Michael Bieniosek [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 10, 2008 3:09 PM
> To: core-user@hadoop.apache.org; Sandy
> Subject: Re: Is Hadoop Really the ri
Thanks for the replies! If I use a single reducer, however, would it be
possible for there to be only one object (FinalSet) to which the Reduce
function merges? If not, I could redo the structure of the program, but I
was hoping to maintain it as much as possible.
Yes, I am aware of Nutch, and I'v
My understanding is that Hadoop doesn't know where the line breaks are when it
divides up your file, so each mapper will get some equally-sized chunk of file
containing some number of lines. It then does some patching so that you get
only whole lines for each mapper, but this does mean that 1)
Kylie McCormick wrote:
Hello!
My name is Kylie McCormick, and I'm currently working on creating a
distributed information retrieval package with Hadoop based on my previous
work with other middlewares like OGSA-DAI. I've been developing a design
that works with the structures of the other systems
Hello,
I have been posting on the forums for a couple of weeks now, and I really
appreciate all the help that I've been receiving. I am fairly new to Java,
and even newer to the Hadoop framework. While I am sufficiently impressed
with Hadoop, quite a bit of the underlying functionality is mask
Hadoop will try to split the file according to how it is split up in
the HDFS. For example, if an input file has three blocks with a
replication factor of two, there are six total blocks. Say there are
six machines, each with a single block. Block 1 is on machines 1 and
2, block 2 is on 3 and 4, an
Hi, I'm trying to access the HDFS of my Hadoop cluster from a non-Hadoop
application. Hadoop 0.17.1 is running on the standard ports.
This is the code I use:
FileSystem fileSystem = null;
String hdfsurl = "hdfs://localhost:50010";
fileSystem = new DistributedFileSystem();
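For what it is worth, 50010 is the default datanode data-transfer port rather than the namenode's client port, so I would expect client code to point at the namenode address from fs.default.name instead. A hedged sketch of how I would open the filesystem; the host and port below are only assumptions, adjust them to your configuration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Use the namenode URI from your cluster's fs.default.name, not the datanode port.
            conf.set("fs.default.name", "hdfs://localhost:9000");
            FileSystem fs = FileSystem.get(conf);
            // Hypothetical path, just to show the filesystem handle is usable.
            System.out.println(fs.exists(new Path("/user/hadoop/input")));
        }
    }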
Hi, I use ToolRunner.run() for multiple MapReduce jobs. It seems to work well.
I've run sequences involving hundreds of MapReduce jobs in a for loop and it
hasn't died on me yet.
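A minimal sketch of that loop (MyJobTool and the paths are placeholders for your own Tool implementation and data):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;

    public class RunManyJobs {
        public static void main(String[] args) throws Exception {
            for (int i = 0; i < 100; i++) {
                // Each iteration submits and waits for one complete MapReduce job.
                String[] jobArgs = { "/input/batch-" + i, "/output/batch-" + i };
                int rc = ToolRunner.run(new Configuration(), new MyJobTool(), jobArgs);
                if (rc != 0) {
                    System.err.println("Job " + i + " failed with code " + rc + "; stopping.");
                    break;
                }
            }
        }
    }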
On Wed, July 9, 2008 4:28 pm, Mori Bellamy said:
> Hey all, I'm trying to chain multiple mapreduce jobs together to
>
Thank you, Tom.
Forgive me for being dense, but I don't understand your reply:
> If you make the default filesystem S3 then you can't run HDFS daemons.
> If you want to run HDFS and use an S3 filesystem, you need to make the
> default filesystem a hdfs URI, and use s3 URIs to reference S3
> files
One last thing:
If that doesn't work, try following the instructions on the ubuntu setting
up hadoop tutorial. Even if you aren't running ubuntu, I think it may be
possible to use those instructions to set up things properly. That's what I
eventually did.
Link is here:
http://wiki.apache.org/hado
So, I had run into a similar issue. What version of Hadoop are you using?
Make sure you are using the latest version of hadoop. That actually fixed it
for me. There was something wrong with the build.xml file in earlier
versions that prevented me from being able to get it to work properly. Once
I
> I get (where the all-caps portions are the actual values...):
>
> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
> java.lang.NumberFormatException: For input string:
> "[EMAIL PROTECTED]"
>at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>
If you tell Hadoop to use a single reducer, it should produce a single file
of output.
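In the old mapred API that is just one line on the job configuration, something like this (MyJob is a placeholder for your driver class):

    JobConf conf = new JobConf(MyJob.class);
    conf.setNumReduceTasks(1);   // a single reducer yields a single output file (part-00000)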
btw, you do know about Nutch I presume?
http://lucene.apache.org/nutch/
This is a distributed IR system built using Hadoop.
Miles
2008/7/10 Kylie McCormick <[EMAIL PROTECTED]>:
> Hello!
> My name is Kylie Mc
I'm trying to get a 2-node Hadoop cluster up and running on Ubuntu. I've
already gotten both nodes to run Hadoop as single-node following the excellent
instructions at
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster).
Now I'm trying to convert them to a 2-no
Hi,
I'm a newbie to streaming in Hadoop. I want to know how to execute a single
C++ executable.
Should it be a mapper-only job? The executable clusters a set of points
present in a file,
so it cannot really be said to be a mapper or a reducer. Also, there is no code
present, except for the e
Hi,
I faced a similar problem to Sandy's. But this time I even had the JDK set
properly.
When I executed:
ant -Dcompile.c++=yes examples
the following was displayed:
Buildfile: build.xml
clover.setup:
clover.info:
[echo]
[echo] Clover not found. Code coverage reports disabled
All this is because you were using streaming.
Streaming treats each line in the stream as one "record" and then breaks it
into a key/value pair (using '\t' as the separator by default).
If you write your mapper class in Java, the values passed to the calls to
your map function should be the whole te
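To illustrate, a bare-bones Java mapper against the old mapred API; each call receives the byte offset of the line as the key and the entire line (minus the trailing newline) as the value, with no tab splitting involved:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class WholeLineMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, LongWritable, Text> {
        public void map(LongWritable offset, Text line,
                        OutputCollector<LongWritable, Text> output,
                        Reporter reporter) throws IOException {
            // 'line' holds the full text of one input line; nothing was split on '\t'.
            output.collect(offset, line);
        }
    }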
There are a number of companies using Hadoop in production, listed here:
http://wiki.apache.org/hadoop/PoweredBy
Bill Boas wrote:
Please?
Bill Boas
VP, Business Development
System Fabric Works
510-375-8840
[EMAIL PROTECTED]
www.systemfabricworks.com
The user group meeting is usually a good place to network with people
using Hadoop in production. The next one is on July 22nd.
-Original Message-
From: Bill Boas [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 10, 2008 9:52 AM
To: core-user@hadoop.apache.org
Subject: can you refer me to
Thanks, really,
but I still cannot understand why lines after the first one become a
key. Why does that happen? Shouldn't they still be part of the value?
I implemented a custom OutputFormat that writes out only the values, and I got:
first_line_in_text_block
EOF
I tried outputting Key only and I got:
secon
Please?
Bill Boas
VP, Business Development
System Fabric Works
510-375-8840
[EMAIL PROTECTED]
www.systemfabricworks.com
I think I see now. Just to recap... you are right that TextOutputFormat
outputs Key\tValue\n, which in your case gives:
File_position\tText_block\n.
But as your Text_block contains '\n' your output actually comes out as:
Key Value
---
OK, I don't want to annoy you, but I think I'm missing something.
I have to:
- extract relevant text blocks from a really big document (TEXTBLOCK)
- apply some Python/C/C++ functions as mappers to the text blocks (called
via a shell script)
- output the processed text back to a text file
In order to d
Hi
Haijun Cao's reply follows below.
Suppose we have set 8 map tasks. How does each map know which part of the
input file it should process?
On 2008-7-10, at 2:33 AM, Haijun Cao wrote:
Set number of map slots per tasktracker to 8 in order to run 8 map
tasks
on one machine (assuming one tasktracker per machin
I think I need to understand what you are trying to achieve better, so
apologies if these two options don't answer your question fully!
1) If you want to operate on the text in the reducer, then you won't
need to make any changes as the data between mapper and reducer is
stored as SequenceFiles so
Thank you so much.
The problem is that I need to operate on text as is, without
modification, and I don't want the filepos to be included in the output.
There's no way in hadoop to map and to output a block of text containing
newline characters?
Thank you again,
Francesco
Jingkei Ly wrote:
I think yo
Have you considered http://www.cascading.org?
On Thu, Jul 10, 2008 at 10:44 AM, Amar Kamat <[EMAIL PROTECTED]> wrote:
> Deyaa Adranale wrote:
>
>> I have checked the code JobControl, it submits a set of jobs asyncronously
>> and provide methods for checking their status, suspending them, and so o
Does Hadoop support chaining multiple jobs with the Hadoop streaming mechanism? If
so, how can I do that? Thanks.
--
Best Wishes
Meng Xinfan(蒙新泛)
Institute of Computational Linguistics
Department of Computer Science & Technology
School of Electronic Engineering & Computer Science
Peking University
Beij
Stuart Sierra wrote:
I have Hadoop 0.17.1 and an AWS Secret Key that contains a slash ('/').
With distcp, I found that using the URL format s3://ID:[EMAIL PROTECTED]/
did not work, even if I encoded the slash as "%2F". I got
"org.jets3t.service.S3ServiceException: S3 HEAD request failed.
Respon
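One workaround commonly suggested for secret keys containing a slash is to keep the credentials out of the URI and supply them through configuration properties instead, either in hadoop-site.xml or programmatically. The sketch below uses the property names as I understand them for the s3:// block filesystem (s3n:// uses the fs.s3n.* variants), with placeholder values:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class S3Creds {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.s3.awsAccessKeyId", "YOUR_ACCESS_KEY_ID");    // placeholder
            conf.set("fs.s3.awsSecretAccessKey", "YOUR/SECRET_KEY");   // placeholder; a slash is fine here
            FileSystem s3fs = FileSystem.get(URI.create("s3://your-bucket/"), conf);
            System.out.println(s3fs.getUri());
        }
    }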
I think you need to strip out the newline characters in the value you
return, as the TextOutputFormat will treat each newline character as the
start of a new record.
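Concretely, something along these lines wherever the value is built before it is collected; replacing the newlines with spaces is just one possible choice, and 'textBlock', 'key' and 'output' are placeholders for your own variables:

    // Flatten embedded newlines so TextOutputFormat sees exactly one record per value.
    String flattened = textBlock.toString().replace('\n', ' ');
    output.collect(key, new Text(flattened));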
-Original Message-
From: Francesco Tamberi [mailto:[EMAIL PROTECTED]
Sent: 09 July 2008 11:27
To: core-user@hadoop.apache.o
Hello Hadoopers:
I am trying to run the same MapReduce job on HDFS and on the local file
system. That is, one time I run the MapReduce job on HDFS, and another time
I run the same MapReduce job with the same input data on a local ext3 file
system without using HDFS. I found that the number of maps g
Hi all,
Can no one give me a hint?
Please forgive me, but I cannot tell whether there is something wrong
with my question.
Thank you,
Francesco
Deyaa Adranale wrote:
I have checked the JobControl code; it submits a set of jobs
asynchronously and provides methods for checking their status,
suspending them, and so on.
It also supports job dependencies. A particular job can depend on other
jobs, and hence it supports chaining. JobControl a
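For reference, a rough sketch of how that dependency support looks in code, as I read the org.apache.hadoop.mapred.jobcontrol classes (the method names are my understanding of that API, and confA/confB stand for two job configurations you would set up yourself):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    public class ChainWithJobControl {
        public static void main(String[] args) throws Exception {
            JobConf confA = new JobConf();   // configure the first job here
            JobConf confB = new JobConf();   // configure the second job here

            Job jobA = new Job(confA);
            Job jobB = new Job(confB);
            jobB.addDependingJob(jobA);      // jobB will not start until jobA succeeds

            JobControl control = new JobControl("chained-jobs");
            control.addJob(jobA);
            control.addJob(jobB);

            // JobControl is a Runnable; drive it from its own thread and poll for completion.
            Thread runner = new Thread(control);
            runner.start();
            while (!control.allFinished()) {
                Thread.sleep(5000);
            }
            control.stop();
        }
    }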
I have checked the JobControl code; it submits a set of jobs
asynchronously and provides methods for checking their status, suspending
them, and so on.
I think what Mori means by chaining jobs is executing them one after
another, so this class might not help him.
I have run chained jobs like Mor
Thanks Ashish, I am happy to build and run from svn/cvs and just try
loading in data, querying, etc., whenever you have something.
Cheers
Tim
On Wed, Jul 9, 2008 at 8:46 PM, Ashish Thusoo <[EMAIL PROTECTED]> wrote:
> Hi Tim,
>
> Point well taken. We are trying to get this out as soon as po