RE: Custom InputFormat/OutputFormat

2008-07-10 Thread Jingkei Ly
I think you need to strip out the newline characters in the value you return, as the TextOutputFormat will treat each newline character as the start of a new record. -Original Message- From: Francesco Tamberi [mailto:[EMAIL PROTECTED] Sent: 09 July 2008 11:27 To:
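Jingkei's suggestion can be sketched with plain Java string handling: collapse embedded CR/LF runs before emitting the value, so TextOutputFormat (which ends a record at every newline) writes one record per value. The class and method names below are illustrative, not Hadoop API; in a real mapper you would apply this to the value's text before calling `output.collect`.

```java
// Sketch of the newline-stripping fix. Illustrative names only.
public class NewlineSanitizer {
    // Replace any run of CR/LF characters with a single space so the
    // value occupies exactly one line in the TextOutputFormat output.
    public static String sanitize(String value) {
        return value.replaceAll("[\r\n]+", " ");
    }

    public static void main(String[] args) {
        String block = "first line\nsecond line\r\nthird line";
        // In a mapper: output.collect(key, new Text(sanitize(block)));
        System.out.println(sanitize(block));
    }
}
```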

Re: Custom InputFormat/OutputFormat

2008-07-10 Thread Francesco Tamberi
Thank you so much. The problem is that I need to operate on the text as-is, without modification, and I don't want the filepos to be output. Is there no way in Hadoop to map and output a block of text containing newline characters? Thank you again, Francesco Jingkei Ly wrote: I think

RE: Custom InputFormat/OutputFormat

2008-07-10 Thread Jingkei Ly
I think I need to understand what you are trying to achieve better, so apologies if these two options don't answer your question fully! 1) If you want to operate on the text in the reducer, then you won't need to make any changes as the data between mapper and reducer is stored as SequenceFiles

Re: parallel mapping on single server

2008-07-10 Thread hong
Hi, following Haijun Cao's reply: suppose we have set 8 map tasks. How does each map know which part of the input file it should process? On 2008-7-10, at 2:33 AM, Haijun Cao wrote: Set number of map slots per tasktracker to 8 in order to run 8 map tasks on one machine (assuming one tasktracker per
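Haijun's suggestion corresponds to a hadoop-site.xml entry along these lines (property name from the 0.1x mapred configuration; treat the value as illustrative):

```xml
<!-- hadoop-site.xml: allow up to 8 concurrent map tasks per tasktracker -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
```

As for which part of the file each map processes: the InputFormat's getSplits() divides the input into InputSplits (for FileInputFormat, byte ranges of the file), and the framework hands each map task exactly one split, so no map has to work that out itself.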

can you refer me to a User with Hadoop in production

2008-07-10 Thread Bill Boas
Please? Bill Boas VP, Business Development System Fabric Works 510-375-8840 [EMAIL PROTECTED] www.systemfabricworks.com

newbie in streaming: How to execute a single executable

2008-07-10 Thread Charan Thota
Hi, I'm a newbie to streaming in Hadoop. I want to know how to execute a single C++ executable. Should it be a mapper-only job? The executable clusters a set of points present in a file, so it cannot really be said to be a mapper or a reducer. Also, there is no code present, except for the

Re: Namenode Exceptions with S3

2008-07-10 Thread Tom White
I get (where the all-caps portions are the actual values...): 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode: java.lang.NumberFormatException: For input string: [EMAIL PROTECTED] at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)

Re: Compiling Word Count in C++ : Hadoop Pipes

2008-07-10 Thread Sandy
So, I had run into a similar issue. What version of Hadoop are you using? Make sure you are using the latest version of Hadoop; that actually fixed it for me. There was something wrong with the build.xml file in earlier versions that prevented me from getting it to work properly. Once I

Re: Compiling Word Count in C++ : Hadoop Pipes

2008-07-10 Thread Sandy
One last thing: if that doesn't work, try following the instructions in the Ubuntu setting-up-Hadoop tutorial. Even if you aren't running Ubuntu, I think it may be possible to use those instructions to set things up properly. That's what I eventually did. Link is here:

Re: Namenode Exceptions with S3

2008-07-10 Thread Lincoln Ritter
Thank you, Tom. Forgive me for being dense, but I don't understand your reply: If you make the default filesystem S3 then you can't run HDFS daemons. If you want to run HDFS and use an S3 filesystem, you need to make the default filesystem an hdfs URI, and use s3 URIs to reference S3
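Tom's advice translates to a hadoop-site.xml fragment along these lines (host, port, and bucket name are placeholders, not values from this thread):

```xml
<!-- hadoop-site.xml: HDFS as the default filesystem -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode-host:9000</value>
</property>
```

With HDFS as the default, an unqualified path like /user/foo resolves to HDFS, while a fully-qualified URI such as s3://your-bucket/path still reaches S3, because the URI scheme selects which FileSystem implementation is used.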

Version Mismatch when accessing hdfs through a nonhadoop java application?

2008-07-10 Thread Thibaut_
Hi, I'm trying to access the HDFS of my Hadoop cluster from a non-Hadoop application. Hadoop 0.17.1 is running on standard ports. This is the code I use: FileSystem fileSystem = null; String hdfsurl = "hdfs://localhost:50010"; fileSystem = new DistributedFileSystem();

Is Hadoop Really the right framework for me?

2008-07-10 Thread Sandy
Hello, I have been posting on the forums for a couple of weeks now, and I really appreciate all the help that I've been receiving. I am fairly new to Java, and even newer to the Hadoop framework. While I am sufficiently impressed with Hadoop, quite a bit of the underlying functionality is

Re: Hadoop Architecture Question: Distributed Information Retrieval

2008-07-10 Thread Steve Loughran
Kylie McCormick wrote: Hello! My name is Kylie McCormick, and I'm currently working on creating a distributed information retrieval package with Hadoop based on my previous work with other middlewares like OGSA-DAI. I've been developing a design that works with the structures of the other

Re: Is Hadoop Really the right framework for me?

2008-07-10 Thread lohit
Hello Sandy, If you are using Hadoop 0.18, you can use the NLineInputFormat input format to get your job done. What it does is give exactly one line to each mapper. In your mapper you might have to encode your keys as something like word:linenumber, so the output from your mapper would be key/value pair
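lohit's key-encoding idea can be sketched in plain Java: pack the word and its line number into one string key on the map side and split it back apart on the reduce side. The class and method names are illustrative, not Hadoop API.

```java
// Sketch of the "word:linenumber" composite key. Illustrative names only.
public class CompositeKey {
    // Map side: build the composite key string.
    public static String encode(String word, long lineNumber) {
        return word + ":" + lineNumber;
    }

    // Reduce side: split on the last ':' so words that themselves
    // contain ':' still round-trip correctly.
    public static String[] decode(String key) {
        int i = key.lastIndexOf(':');
        return new String[] { key.substring(0, i), key.substring(i + 1) };
    }

    public static void main(String[] args) {
        String k = encode("hadoop", 42);
        String[] parts = decode(k);
        System.out.println(parts[0] + " @ line " + parts[1]);
    }
}
```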

Re: Is Hadoop Really the right framework for me?

2008-07-10 Thread lohit
It's not released yet. There are 2 options: 1. Download the unreleased 0.18 branch from here: http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 (svn co http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 branch-0.18) 2. Get the NLineInputFormat.java from

Re: Cannot get passwordless ssh to work right

2008-07-10 Thread Shengkai Zhu
You should chmod the .ssh directory and authorized_keys of the datanode/tasktracker instead of the jobtracker. On 7/11/08, Jim Lowell [EMAIL PROTECTED] wrote: I'm trying to get a 2-node Hadoop cluster up and running on Ubuntu. I've already gotten both nodes to run Hadoop as single-node following the

Re: Version Mismatch when accessing hdfs through a nonhadoop java application?

2008-07-10 Thread Shengkai Zhu
I've checked the code in DataNode.java, exactly where you get the error: DataInputStream in = null; in = new DataInputStream(new BufferedInputStream(s.getInputStream(), BUFFER_SIZE)); short version = in.readShort(); if (version != DATA_TRANSFER_VERSION) { throw new
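The check Shengkai quotes can be reproduced as a stand-alone sketch using only java.io: the server reads a short version header from the stream and rejects the connection when it differs from its own expected value. This is what makes Thibaut's non-Hadoop client fail if its protocol version (or the port it targets) doesn't match the datanode's. The constant value below is illustrative, not Hadoop's actual DATA_TRANSFER_VERSION.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Stand-alone sketch of the version handshake in DataNode.java.
public class VersionCheck {
    static final short DATA_TRANSFER_VERSION = 11; // illustrative value

    // Server side: read the short header and compare it to our version.
    public static boolean accepts(byte[] wire) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
            return in.readShort() == DATA_TRANSFER_VERSION;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Client side: serialize the version header a client would send.
    public static byte[] header(short version) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeShort(version);
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(header((short) 11))); // matching client
        System.out.println(accepts(header((short) 9)));  // mismatched client
    }
}
```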

Re: MapReduce with multi-languages

2008-07-10 Thread NOMURA Yoshihide
Mr. Taeho Kang, I need to analyze text in different character encodings too, and I suggested supporting encoding configuration in TextInputFormat: https://issues.apache.org/jira/browse/HADOOP-3481 But I think you should convert the text file encoding to UTF-8 at present. Regards, Taeho Kang:
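Nomura's interim advice (re-encode the input to UTF-8 before TextInputFormat sees it, since TextInputFormat assumes UTF-8) can be sketched with java.nio.charset alone. The source charset and method names here are just examples.

```java
import java.nio.charset.Charset;

// Sketch: decode raw bytes with their real charset, re-encode as UTF-8.
public class ToUtf8 {
    public static byte[] recode(byte[] raw, String sourceCharset) {
        String text = new String(raw, Charset.forName(sourceCharset));
        return text.getBytes(Charset.forName("UTF-8"));
    }

    public static void main(String[] args) {
        // Example with ISO-8859-1 input; Shift_JIS etc. work the same way.
        byte[] latin1 = "caf\u00e9".getBytes(Charset.forName("ISO-8859-1"));
        byte[] utf8 = recode(latin1, "ISO-8859-1");
        System.out.println(new String(utf8, Charset.forName("UTF-8")));
    }
}
```

A preprocessing pass like this (run once over the input files before the job) sidesteps the encoding question inside the mappers entirely, at the cost of an extra copy of the data.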