I think you need to strip out the newline characters in the value you
return, as the TextOutputFormat will treat each newline character as the
start of a new record.
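One way to do that is to escape the newlines before emitting the value and reverse the mapping when reading the output back. A rough sketch — the escaping scheme here is just an illustration, not a Hadoop API:

```java
public class NewlineEscaper {
    // TextOutputFormat writes one record per line, so embedded newlines
    // in a value would be read back as separate records. Escape them
    // before emitting, and reverse the mapping afterwards.
    public static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\n", "\\n");
    }

    public static String unescape(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' && i + 1 < value.length()) {
                char next = value.charAt(++i);
                sb.append(next == 'n' ? '\n' : next);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String block = "line one\nline two";
        String escaped = escape(block);
        System.out.println(escaped);                          // prints: line one\nline two
        System.out.println(unescape(escaped).equals(block));  // prints: true
    }
}
```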
-Original Message-
From: Francesco Tamberi [mailto:[EMAIL PROTECTED]
Sent: 09 July 2008 11:27
To:
Thank you so much.
The problem is that I need to operate on the text as is, without
modification, and I don't want the filepos to be output.
Is there really no way in Hadoop to map and output a block of text
containing newline characters?
Thank you again,
Francesco
Jingkei Ly wrote:
I think I need to understand better what you are trying to achieve, so
apologies if these two options don't answer your question fully!
1) If you want to operate on the text in the reducer, then you won't
need to make any changes, as the data between mapper and reducer is
stored as SequenceFiles.
Hi,
Cao Haijun's reply follows:
Suppose we have set 8 map tasks. How does each map know which part of
input file it should process?
On 2008-7-10, at 2:33 AM, Haijun Cao wrote:
Set the number of map slots per tasktracker to 8 in order to run 8 map
tasks on one machine (assuming one tasktracker per machine).
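To the question of which part of the input each map processes: the job client divides each input file into splits (roughly one per HDFS block), and every map task is handed exactly one split, identified by an offset and length. A simplified sketch of that arithmetic — the real FileInputFormat also honors a configurable minimum split size, multiple files, and block locations:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Each split is an (offset, length) pair into the input file; one map
    // task gets one split, and the record reader resynchronizes on the
    // first record boundary (e.g. a newline) after the offset.
    static final long SPLIT_SIZE = 64L * 1024 * 1024; // e.g. one HDFS block

    public static List<long[]> splits(long fileLength) {
        List<long[]> result = new ArrayList<long[]>();
        for (long offset = 0; offset < fileLength; offset += SPLIT_SIZE) {
            long length = Math.min(SPLIT_SIZE, fileLength - offset);
            result.add(new long[] {offset, length});
        }
        return result;
    }

    public static void main(String[] args) {
        // A 200 MB file yields four splits: 64 + 64 + 64 + 8 MB.
        for (long[] s : splits(200L * 1024 * 1024)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

Note that the number of map tasks therefore follows from the input size, independently of how many map slots each tasktracker offers.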
Please?
Bill Boas
VP, Business Development
System Fabric Works
510-375-8840
[EMAIL PROTECTED]
www.systemfabricworks.com
Hi,
I'm a newbie to streaming in Hadoop. I want to know how to execute a single
C++ executable.
Should it be a mapper-only job? The executable clusters a set of points
present in a file,
so it cannot really be said to be a mapper or a reducer. Also, there is no
code present, except for the
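Assuming the goal is just to run the binary over the input, a map-only streaming job is the usual shape — a sketch in which the jar path, HDFS paths, and binary name are all placeholders:

```shell
# Map-only streaming job: each mapper runs the clustering binary on its
# input split; no reducer runs because -numReduceTasks is 0.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
    -input /user/me/points \
    -output /user/me/clusters \
    -mapper ./cluster_points \
    -file cluster_points \
    -numReduceTasks 0
```

One caveat: streaming feeds each mapper only its own input split on stdin, so if the program needs to see all the points at once, you would have to keep the input in a single split (or run the binary outside MapReduce).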
I get (where the all-caps portions are the actual values...):
2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode: java.lang.NumberFormatException: For input string: [EMAIL PROTECTED]
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
So, I had run into a similar issue. What version of Hadoop are you using?
Make sure you are using the latest version of Hadoop; that actually fixed it
for me. There was something wrong with the build.xml file in earlier
versions that prevented me from being able to get it to work properly. Once
I
One last thing:
If that doesn't work, try following the instructions in the Ubuntu Hadoop
setup tutorial. Even if you aren't running Ubuntu, I think it may be
possible to use those instructions to set things up properly. That's what I
eventually did.
Link is here:
Thank you, Tom.
Forgive me for being dense, but I don't understand your reply:
If you make the default filesystem S3 then you can't run HDFS daemons.
If you want to run HDFS and use an S3 filesystem, you need to make the
default filesystem a hdfs URI, and use s3 URIs to reference S3
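In hadoop-site.xml terms, that combination looks something like the following sketch (host, port, and bucket name are placeholders):

```xml
<!-- hadoop-site.xml: keep HDFS as the default filesystem -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode-host:9000</value>
</property>
```

Jobs can then reference S3 data explicitly with URIs such as s3://your-bucket/path, while hdfs:// URIs and plain relative paths resolve against HDFS.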
Hi, I'm trying to access the HDFS of my Hadoop cluster from a non-Hadoop
application. Hadoop 0.17.1 is running on standard ports.
This is the code I use:
FileSystem fileSystem = null;
String hdfsurl = "hdfs://localhost:50010";
fileSystem = new DistributedFileSystem();
Hello,
I have been posting on the forums for a couple of weeks now, and I really
appreciate all the help that I've been receiving. I am fairly new to Java,
and even newer to the Hadoop framework. While I am sufficiently impressed
with Hadoop, quite a bit of the underlying functionality is
Kylie McCormick wrote:
Hello!
My name is Kylie McCormick, and I'm currently working on creating a
distributed information retrieval package with Hadoop based on my previous
work with other middlewares like OGSA-DAI. I've been developing a design
that works with the structures of the other
Hello Sandy,
If you are using Hadoop 0.18, you can use the NLineInputFormat input format to
get your job done: it gives exactly one line to each mapper.
In your mapper you might have to encode your keys as something like
word:linenumber
So the output from your mapper would be a key/value pair
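Composing and splitting such keys is plain string work — a sketch of the word:linenumber scheme (the class name and helpers here are illustrative, not part of Hadoop):

```java
public class CompositeKey {
    // Encode as "word:linenumber" so records group by word first.
    // Using the last ':' as the separator keeps parsing unambiguous
    // even if the word itself contains colons.
    public static String encode(String word, long lineNumber) {
        return word + ":" + lineNumber;
    }

    public static String word(String key) {
        return key.substring(0, key.lastIndexOf(':'));
    }

    public static long lineNumber(String key) {
        return Long.parseLong(key.substring(key.lastIndexOf(':') + 1));
    }

    public static void main(String[] args) {
        String key = encode("hadoop", 42);
        System.out.println(key);              // prints: hadoop:42
        System.out.println(word(key));        // prints: hadoop
        System.out.println(lineNumber(key));  // prints: 42
    }
}
```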
It's not released yet. There are two options:
1. Download the unreleased 0.18 branch from here:
http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18
svn co http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 branch-0.18
2. Get NLineInputFormat.java from
You should chmod the .ssh directory and authorized_keys of the *
datanode/tasktracker* instead of the jobtracker.
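Concretely, on the datanode/tasktracker account, something like the following — sshd typically refuses key logins unless these permissions are at least this strict:

```shell
# .ssh must be 700 and authorized_keys 600 for sshd to accept the key.
SSH_DIR="$HOME/.ssh"
mkdir -p "$SSH_DIR"
touch "$SSH_DIR/authorized_keys"
chmod 700 "$SSH_DIR"
chmod 600 "$SSH_DIR/authorized_keys"
```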
On 7/11/08, Jim Lowell [EMAIL PROTECTED] wrote:
I'm trying to get a 2-node Hadoop cluster up and running on Ubuntu. I've
already gotten both nodes to run Hadoop as single-node following the
I've checked the code in DataNode.java, exactly where you get the error:
...
DataInputStream in = null;
in = new DataInputStream(
    new BufferedInputStream(s.getInputStream(), BUFFER_SIZE));
short version = in.readShort();
if (version != DATA_TRANSFER_VERSION) {
  throw new
Mr. Taeho Kang,
I need to analyze text in different character encodings too,
and I suggested supporting an encoding configuration in TextInputFormat:
https://issues.apache.org/jira/browse/HADOOP-3481
But for now, I think you should convert the text file encoding to UTF-8.
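Converting beforehand is a few lines of standard Java — a sketch using ISO-8859-1 as the example source encoding (substitute whatever encoding your files actually use):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

public class Transcode {
    // Re-encode a text stream into UTF-8 so TextInputFormat (which
    // assumes UTF-8) reads it correctly.
    public static void transcode(InputStream in, String fromCharset,
                                 OutputStream out) throws IOException {
        Reader reader = new InputStreamReader(in, fromCharset);
        Writer writer = new OutputStreamWriter(out, "UTF-8");
        char[] buf = new char[8192];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // Round-trip demo: ISO-8859-1 bytes in, UTF-8 bytes out.
        byte[] latin1 = "caf\u00e9".getBytes("ISO-8859-1");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(latin1), "ISO-8859-1", out);
        System.out.println(out.toString("UTF-8")); // prints: café
    }
}
```

For real files, wrap a FileInputStream/FileOutputStream around the same transcode call.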
Regards,
Taeho Kang: