Re: Too many fetch failures AND Shuffle error

2008-07-10 Thread Shengkai Zhu
This is also how I fixed this problem. On 6/21/08, Sayali Kulkarni <[EMAIL PROTECTED]> wrote: > > Hi! > > My problem of "Too many fetch failures" as well as "shuffle error" was > resolved when I added the list of all the slave machines in the /etc/hosts > file. > > Earlier on every slave I just ha

Re: MapReduce with multi-languages

2008-07-10 Thread NOMURA Yoshihide
Mr. Taeho Kang, I need to analyze text in different character encodings too, and I suggested supporting encoding configuration in TextInputFormat: https://issues.apache.org/jira/browse/HADOOP-3481 But I think you should convert the text file encoding to UTF-8 at present. Regards, Taeho Kang: Dea

Re: Version Mismatch when accessing hdfs through a nonhadoop java application?

2008-07-10 Thread Shengkai Zhu
I've checked the code in DataNode.java, exactly where you get the error: ... DataInputStream in = null; in = new DataInputStream(new BufferedInputStream(s.getInputStream(), BUFFER_SIZE)); short version = in.readShort(); if (version != DATA_TRANSFER_VERSION) { throw new IOExceptio
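The check quoted above can be sketched as self-contained plain Java (the class name and the constant's value here are illustrative, not the actual Hadoop source, where the version number changes between releases):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class VersionCheck {
    // Hypothetical protocol version; the real value lives in DataNode.java
    // and differs between Hadoop releases.
    static final short DATA_TRANSFER_VERSION = 11;

    // Mirrors the check quoted above: read the version short sent by the
    // client and reject the connection if it does not match.
    static void checkVersion(DataInputStream in) throws IOException {
        short version = in.readShort();
        if (version != DATA_TRANSFER_VERSION) {
            throw new IOException("Version Mismatch");
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        // Simulate a client built against an older release.
        new DataOutputStream(buf).writeShort(10);
        try {
            checkVersion(new DataInputStream(
                    new ByteArrayInputStream(buf.toByteArray())));
            System.out.println("accepted");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is why mixing client and cluster jars from different Hadoop releases fails immediately on connect: the very first short read from the stream is the version handshake.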

Re: Cannot get passwordless ssh to work right

2008-07-10 Thread Shengkai Zhu
You should chmod the .ssh directory and authorized_keys on the datanode/tasktracker instead of the jobtracker. On 7/11/08, Jim Lowell <[EMAIL PROTECTED]> wrote: > > I'm trying to get a 2-node Hadoop cluster up and running on Ubuntu. I've > already gotten both nodes to run Hadoop as single-node following

Re: parallel mapping on single server

2008-07-10 Thread Shengkai Zhu
Is this data-local dispatching still a design or already implemented? And if implemented, in which version, since I didn't find its implementation in 0.16.0. Thanks On 7/11/08, Joman Chu <[EMAIL PROTECTED]> wrote: > > Hadoop will try to split the file according to how it is split up in > the

Outputting to different paths from the same input file

2008-07-10 Thread schnitzi
Okay, I've found some similar discussions in the archive, but I'm still not clear on this. I'm new to Hadoop, so 'scuse my ignorance... I'm writing a Hadoop tool to read in an event log, and I want to produce two separate outputs as a result -- one for statistics, and one for budgeting. Because

Re: Compiling Word Count in C++ : Hadoop Pipes

2008-07-10 Thread chaitanya krishna
I'm using hadoop-0.17.0. Should I be using a later version? Please tell me which version you used. On Fri, Jul 11, 2008 at 2:35 AM, Sandy <[EMAIL PROTECTED]> wrote: > One last thing: > > If that doesn't work, try following the instructions on the ubuntu setting > up hadoop tutorial. Even

Re: Is Hadoop Really the right framework for me?

2008-07-10 Thread lohit
It's not released yet. There are 2 options: 1. download the unreleased 0.18 branch from here http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 svn co http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 branch-0.18 2. get the NLineInputFormat.java from http://svn.apach

Re: Cannot get passwordless ssh to work right

2008-07-10 Thread Erik Hetzner
At Thu, 10 Jul 2008 15:50:31 -0500, "Jim Lowell" <[EMAIL PROTECTED]> wrote: > > I'm trying to get a 2-node Hadoop cluster up and running on Ubuntu. > I've already gotten both nodes to run Hadoop as single-node > following the excellent instructions at > http://www.michael-noll.com/wiki/Running_Had

Re: Is Hadoop Really the right framework for me?

2008-07-10 Thread Sandy
Thanks for the responses. Lohit and Mahadev: this sounds fantastic; however, where may I get hadoop 0.18? I went to http://hadoop.apache.org/core/releases.html but did not see a link for hadoop 0.18. After a brief search on Google, it did not seem that Hadoop has been officially released ye

Re: Is Hadoop Really the right framework for me?

2008-07-10 Thread lohit
Hello Sandy, If you are using hadoop 0.18, you can use the NLineInputFormat input format to get your job done. What this does is give exactly one line to each mapper. In your mapper you might have to encode your keys something like So output from your mapper would be key/value pair as ,1 Reducer w
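As a configuration sketch (untested, assuming the old 0.18 `mapred` API and the `mapred.line.input.format.linespermap` property; verify both against your release), wiring NLineInputFormat into a job would look roughly like:

```java
// Sketch: feed each mapper exactly one line of input.
JobConf conf = new JobConf(MyJob.class);           // MyJob is a placeholder
conf.setInputFormat(NLineInputFormat.class);
// N lines per split; 1 means exactly one line per map task.
conf.setInt("mapred.line.input.format.linespermap", 1);
```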

RE: Is Hadoop Really the right framework for me?

2008-07-10 Thread Mahadev Konar
I think src/mapred/org/apache/hadoop/mapred/lib/NLineInputFormat.java is what you want. Mahadev > -Original Message- > From: Michael Bieniosek [mailto:[EMAIL PROTECTED] > Sent: Thursday, July 10, 2008 3:09 PM > To: core-user@hadoop.apache.org; Sandy > Subject: Re: Is Hadoop Really the ri

Re: Hadoop Architecture Question: Distributed Information Retrieval

2008-07-10 Thread Kylie McCormick
Thanks for the replies! If I use a single reducer, however, would it be possible for there to be only one object (FinalSet) to which the Reduce function merges? If not, I could redo the structure of the program, but I was hoping to maintain it as much as possible. Yes, I am aware of Nutch, and I'v

Re: Is Hadoop Really the right framework for me?

2008-07-10 Thread Michael Bieniosek
My understanding is that Hadoop doesn't know where the line breaks are when it divides up your file, so each mapper will get some equally-sized chunk of file containing some number of lines. It then does some patching so that you get only whole lines for each mapper, but this does mean that 1)
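The "patching" described above can be modeled in a few lines of plain Java (a toy sketch, not the actual record-reader code): a mapper owns a byte range [start, end), but since records are whole lines, the reader skips a partial first line (unless it starts at offset 0, because the previous split owns that line) and reads past `end` to finish the line it lands in the middle of.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitLines {
    // Toy model of assigning whole lines to the byte range [start, end).
    static List<String> linesForSplit(String file, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Skip the partial first line; checking from start-1 means a
            // split that begins right after '\n' keeps its first line.
            int nl = file.indexOf('\n', start - 1);
            pos = (nl == -1) ? file.length() : nl + 1;
        }
        List<String> lines = new ArrayList<>();
        while (pos < end && pos < file.length()) {
            int nl = file.indexOf('\n', pos);  // may run past `end`
            int stop = (nl == -1) ? file.length() : nl;
            lines.add(file.substring(pos, stop));
            pos = stop + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        String file = "aaa\nbbbbbb\ncc\n";
        // Split the 14-byte file into two equal chunks: [0,7) and [7,14).
        System.out.println(linesForSplit(file, 0, 7));   // [aaa, bbbbbb]
        System.out.println(linesForSplit(file, 7, 14));  // [cc]
    }
}
```

Note how the first split gets two whole lines even though its byte range ends mid-line, and the second split gets only the line that starts inside its range; this is exactly why splits rarely contain equal numbers of lines.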

Re: Hadoop Architecture Question: Distributed Information Retrieval

2008-07-10 Thread Steve Loughran
Kylie McCormick wrote: Hello! My name is Kylie McCormick, and I'm currently working on creating a distributed information retrieval package with Hadoop based on my previous work with other middlewares like OGSA-DAI. I've been developing a design that works with the structures of the other systems

Is Hadoop Really the right framework for me?

2008-07-10 Thread Sandy
Hello, I have been posting on the forums for a couple of weeks now, and I really appreciate all the help that I've been receiving. I am fairly new to Java, and even newer to the Hadoop framework. While I am sufficiently impressed with Hadoop, quite a bit of the underlying functionality is mask

Re: parallel mapping on single server

2008-07-10 Thread Joman Chu
Hadoop will try to split the file according to how it is split up in the HDFS. For example, if an input file has three blocks with a replication factor of two, there are six total blocks. Say there are six machines, each with a single block. Block 1 is on machines 1 and 2, block 2 is on 3 and 4, an

Version Mismatch when accessing hdfs through a nonhadoop java application?

2008-07-10 Thread Thibaut_
Hi, I'm trying to access the HDFS of my Hadoop cluster from a non-Hadoop application. Hadoop 0.17.1 is running on standard ports. This is the code I use: FileSystem fileSystem = null; String hdfsurl = "hdfs://localhost:50010"; fileSystem = new DistributedFileSystem();

Re: How to chain multiple hadoop jobs?

2008-07-10 Thread Joman Chu
Hi, I use ToolRunner.run() for multiple MapReduce jobs. It seems to work well. I've run sequences involving hundreds of MapReduce jobs in a for loop and it hasn't died on me yet. On Wed, July 9, 2008 4:28 pm, Mori Bellamy said: > Hey all, I'm trying to chain multiple mapreduce jobs together to >
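The pattern described here can be sketched without the Hadoop API as plain Java (an analogue, not ToolRunner itself): run the jobs one after another, block until each finishes, and stop the chain on the first non-zero status, just as one would call ToolRunner.run() or JobClient.runJob() in a loop.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;

public class JobChain {
    // Run "jobs" sequentially; abort the chain on the first failure.
    static int runChain(List<Callable<Integer>> jobs) throws Exception {
        for (Callable<Integer> job : jobs) {
            int status = job.call();   // blocks until this job finishes
            if (status != 0) {
                return status;         // do not start the next job
            }
        }
        return 0;
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Integer>> jobs = Arrays.asList(
                () -> { System.out.println("job 1"); return 0; },
                () -> { System.out.println("job 2"); return 1; },   // fails
                () -> { System.out.println("job 3"); return 0; });  // never runs
        System.out.println("chain exit: " + runChain(jobs));
    }
}
```

In a real chain, each step's output directory would typically be the next step's input directory, which is why running them strictly in sequence matters.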

Re: Namenode Exceptions with S3

2008-07-10 Thread Lincoln Ritter
Thank you, Tom. Forgive me for being dense, but I don't understand your reply: > If you make the default filesystem S3 then you can't run HDFS daemons. > If you want to run HDFS and use an S3 filesystem, you need to make the > default filesystem a hdfs URI, and use s3 URIs to reference S3 > files

Re: Compiling Word Count in C++ : Hadoop Pipes

2008-07-10 Thread Sandy
One last thing: If that doesn't work, try following the instructions on the ubuntu setting up hadoop tutorial. Even if you aren't running ubuntu, I think it may be possible to use those instructions to set up things properly. That's what I eventually did. Link is here: http://wiki.apache.org/hado

Re: Compiling Word Count in C++ : Hadoop Pipes

2008-07-10 Thread Sandy
So, I had run into a similar issue. What version of Hadoop are you using? Make sure you are using the latest version of hadoop. That actually fixed it for me. There was something wrong with the build.xml file in earlier versions that prevented me from being able to get it to work properly. Once I

Re: Namenode Exceptions with S3

2008-07-10 Thread Tom White
> I get (where the all-caps portions are the actual values...): > > 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode: > java.lang.NumberFormatException: For input string: > "[EMAIL PROTECTED]" >at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) >

Re: Hadoop Architecture Question: Distributed Information Retrieval

2008-07-10 Thread Miles Osborne
If you tell Hadoop to use a single reducer, it should produce a single file of output. btw, you do know about Nutch I presume? http://lucene.apache.org/nutch/ This is a distributed IR system built using Hadoop. Miles 2008/7/10 Kylie McCormick <[EMAIL PROTECTED]>: > Hello! > My name is Kylie Mc
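Telling Hadoop to use a single reducer, as suggested above, is one configuration call (a sketch against the old `mapred` API; `MyJob` is a placeholder):

```java
// Force a single reduce task so all map output is merged by one
// reducer into a single output file (part-00000).
JobConf conf = new JobConf(MyJob.class);
conf.setNumReduceTasks(1);
```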

Cannot get passwordless ssh to work right

2008-07-10 Thread Jim Lowell
I'm trying to get a 2-node Hadoop cluster up and running on Ubuntu. I've already gotten both nodes to run Hadoop as single-node following the excellent instructions at http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster). Now I'm trying to convert them to a 2-no

newbie in streaming: How to execute a single executable

2008-07-10 Thread Charan Thota
Hi, I'm a newbie to streaming in hadoop. I want to know how to execute a single C++ executable. Should it be a mapper-only job? The executable is to cluster a set of points present in a file, so it cannot really be said to be a mapper or reducer. Also, there is no code present, except for the e

Re: Compiling Word Count in C++ : Hadoop Pipes

2008-07-10 Thread chaitanya krishna
Hi, I faced a similar problem as Sandy, but this time I even had the JDK set properly. When I executed: ant -Dcompile.c++=yes examples the following was displayed: Buildfile: build.xml clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled

Re: Custom InputFormat/OutputFormat

2008-07-10 Thread Runping Qi
All this is because you were using streaming. Streaming treats each line in the stream as one "record" and then breaks it into a key/value pair (using '\t' as the separator by default). If you write your mapper class in Java, the values passed to the calls to your map function should be the whole te
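The default streaming line-to-record rule can be illustrated in a few lines of plain Java (a sketch of the behavior, not the streaming source): everything up to the first tab is the key, the rest is the value, and a line with no tab becomes a key with an empty value.

```java
public class StreamingSplit {
    // Models streaming's default record parsing: split at the FIRST '\t'.
    static String[] splitLine(String line) {
        int tab = line.indexOf('\t');
        if (tab == -1) {
            return new String[] { line, "" };  // no tab: whole line is the key
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        String[] kv = splitLine("key1\tsome value\twith another tab");
        System.out.println("key   = " + kv[0]);  // key1
        System.out.println("value = " + kv[1]);  // keeps the later tab
    }
}
```

This is also why a multi-line text block cannot survive streaming unmodified: each embedded '\n' starts a new record before the separator logic even runs.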

Re: can you refer me to a User with Hadoop in production

2008-07-10 Thread Johan Oskarsson
There's a number of companies using hadoop in production, listed here: http://wiki.apache.org/hadoop/PoweredBy Bill Boas wrote: Please? Bill Boas VP, Business Development System Fabric Works 510-375-8840 [EMAIL PROTECTED] www.systemfabricworks.com

RE: can you refer me to a User with Hadoop in production

2008-07-10 Thread Ajay Anand
The user group meeting is usually a good place to network with people using Hadoop in production. The next one is on July 22nd. -Original Message- From: Bill Boas [mailto:[EMAIL PROTECTED] Sent: Thursday, July 10, 2008 9:52 AM To: core-user@hadoop.apache.org Subject: can you refer me to

Re: Custom InputFormat/OutputFormat

2008-07-10 Thread Francesco Tamberi
Thanks a lot, but I still cannot understand why lines after the first one become a key. Why does this happen? Shouldn't they still be part of the value? I implemented a CustomOutputFormat that writes only values out and I got: first_line_in_text_block EOF I tried outputting the key only and I got: secon

can you refer me to a User with Hadoop in production

2008-07-10 Thread Bill Boas
Please? Bill Boas VP, Business Development System Fabric Works 510-375-8840 [EMAIL PROTECTED] www.systemfabricworks.com

RE: Custom InputFormat/OutputFormat

2008-07-10 Thread Jingkei Ly
I think I see now. Just to recap... you are right that TextOutputFormat outputs Key\tValue\n, which in your case gives: File_position\tText_block\n. But as your Text_block contains '\n' your output actually comes out as: Key Value ---

Re: Custom InputFormat/OutputFormat

2008-07-10 Thread Francesco Tamberi
Ok, I don't mean to annoy you, but I think I'm missing something. I have to: - extract relevant text blocks from a really big document ( TEXTBLOCK ) - apply some python/c/c++ functions as mappers to text blocks (called via shell script) - output processed text back to a text file In order to d

Re: parallel mapping on single server

2008-07-10 Thread hong
Hi, following Cao Haijun's reply: suppose we have set 8 map tasks. How does each map know which part of the input file it should process? On 2008-7-10, at 2:33 AM, Haijun Cao wrote: Set number of map slots per tasktracker to 8 in order to run 8 map tasks on one machine (assuming one tasktracker per machin

RE: Custom InputFormat/OutputFormat

2008-07-10 Thread Jingkei Ly
I think I need to understand what you are trying to achieve better, so apologies if these two options don't answer your question fully! 1) If you want to operate on the text in the reducer, then you won't need to make any changes as the data between mapper and reducer is stored as SequenceFiles so

Re: Custom InputFormat/OutputFormat

2008-07-10 Thread Francesco Tamberi
Thank you so much. The problem is that I need to operate on the text as is, without modification, and I don't want the filepos to be output. Is there no way in Hadoop to map and output a block of text containing newline characters? Thank you again, Francesco Jingkei Ly wrote: I think yo

Re: How to chain multiple hadoop jobs?

2008-07-10 Thread tim robertson
Have you considered http://www.cascading.org? On Thu, Jul 10, 2008 at 10:44 AM, Amar Kamat <[EMAIL PROTECTED]> wrote: > Deyaa Adranale wrote: > >> I have checked the code JobControl, it submits a set of jobs asyncronously >> and provide methods for checking their status, suspending them, and so o

how to chain multiple jobs in hadoop streaming

2008-07-10 Thread xinfan meng
Does Hadoop support chaining multiple jobs with the hadoop streaming mechanism? If so, how can I do that? Thanks. -- Best Wishes Meng Xinfan(蒙新泛) Institute of Computational Linguistics Department of Computer Science & Technology School of Electronic Engineering & Computer Science Peking University Beij

Re: Namenode Exceptions with S3

2008-07-10 Thread Steve Loughran
Stuart Sierra wrote: I have Hadoop 0.17.1 and an AWS Secret Key that contains a slash ('/'). With distcp, I found that using the URL format s3://ID:[EMAIL PROTECTED]/ did not work, even if I encoded the slash as "%2F". I got "org.jets3t.service.S3ServiceException: S3 HEAD request failed. Respon

RE: Custom InputFormat/OutputFormat

2008-07-10 Thread Jingkei Ly
I think you need to strip out the newline characters in the value you return, as the TextOutputFormat will treat each newline character as the start of a new record. -Original Message- From: Francesco Tamberi [mailto:[EMAIL PROTECTED] Sent: 09 July 2008 11:27 To: core-user@hadoop.apache.o
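Stripping the newlines before the value reaches TextOutputFormat can be as simple as the sketch below (plain Java; the "\\n" escape is one hypothetical choice of replacement, any marker that cannot occur in the text works, and a reader can reverse it afterwards):

```java
public class NewlineStrip {
    // Replace embedded line breaks so the multi-line text block stays a
    // single record when written out as key\tvalue\n.
    static String escape(String textBlock) {
        return textBlock.replace("\r", "").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String block = "first line\nsecond line\nthird line";
        // One record on one physical line, as TextOutputFormat expects.
        System.out.println("42\t" + escape(block));
    }
}
```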

!!!Help: Strange difference on the number of maps in HDFS and local file system

2008-07-10 Thread Richard Zhang
Hello Hadoopers: I am trying to run the same map reduce job on HDFS and on the local file system. That is, one time I run the map reduce job on HDFS, and another time I run the same map reduce job with the same input data on the local ext3 file system without using HDFS. I found that the number of maps g

Re: Custom InputFormat/OutputFormat

2008-07-10 Thread Francesco Tamberi
Hi all, can no one give me a hint? Please forgive me, but I cannot understand if there's something wrong with my question. Thank you, Francesco

Re: How to chain multiple hadoop jobs?

2008-07-10 Thread Amar Kamat
Deyaa Adranale wrote: I have checked the code of JobControl; it submits a set of jobs asynchronously and provides methods for checking their status, suspending them, and so on. It also supports job dependencies. A particular job can depend on other jobs and hence it supports chaining. *JobControl* a

Re: How to chain multiple hadoop jobs?

2008-07-10 Thread Deyaa Adranale
I have checked the code of JobControl; it submits a set of jobs asynchronously and provides methods for checking their status, suspending them, and so on. I think what Mori means by chaining jobs is to execute them one after another, so this class might not help him. I have run chained jobs like Mor

Re: FW: [jira] Updated: (HADOOP-3601) Hive as a contrib project

2008-07-10 Thread tim robertson
Thanks Ashish, I am happy to build and try and run from svn/cvs and just try loading in data, querying etc whenever you have something. Cheers Tim On Wed, Jul 9, 2008 at 8:46 PM, Ashish Thusoo <[EMAIL PROTECTED]> wrote: > Hi Tim, > > Point well taken. We are trying to get this out as soon as po