RE: streaming + binary input/output data?

2008-04-15 Thread John Menzer
Ah, I understand... that might be a problem! Well, in that case I would need to parse each base64-encoded line for the '\n' sequence before making any use of it and before adding my own '\n'. I am quite sure that this could become quite expensive, which in turn would reduce the
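
A purely illustrative sketch (Python) of the point under discussion: if each record is encoded with single-line base64 before being written to the stream, the encoded text can never contain '\n', so no scanning of the payload is needed before appending the record delimiter.

```python
import base64

def encode_record(payload: bytes) -> str:
    # Standard single-line base64 never emits '\n', so the encoded record
    # is safe to ship as one line in a line-oriented stream.
    return base64.b64encode(payload).decode("ascii")

def decode_record(line: str) -> bytes:
    # Strip the delimiter we added, then decode back to the raw bytes.
    return base64.b64decode(line.strip())

record = b"binary\ndata\x00with newlines"
line = encode_record(record)
assert "\n" not in line                       # no delimiter collision in the payload
assert decode_record(line + "\n") == record   # round-trips after adding our own '\n'
```

So the per-line scan the poster worries about is unnecessary as long as the encoder is the single-line variant (the older MIME-style encoders that wrap at 76 columns would reintroduce the problem).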

Re: [HADOOP-users] HowTo filter files for a Map/Reduce task over the same input folder

2008-04-15 Thread Alfonso Olias Sanz
It's addInputPath, which adds a Path object to the list of inputs. So do the filtering first, then add the paths in a loop. But I need an InputFormat anyway because I have my own RecordReader. In the end I have to put the same logic in a different place. From my point of view it is better for me
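
The "filter first, then add the paths in a loop" idea can be sketched like this (Python, illustrative only; the file list and pattern are hypothetical, and in the real job each surviving path would be passed to FileInputFormat.addInputPath):

```python
from fnmatch import fnmatch

def select_inputs(all_files, pattern):
    # Filter first; the caller then adds each surviving path in a loop,
    # the equivalent of one addInputPath call per Path object.
    return [f for f in all_files if fnmatch(f, pattern)]

files = ["/data/part-00000.log", "/data/part-00001.log", "/data/_SUCCESS"]
selected = select_inputs(files, "*.log")
assert selected == ["/data/part-00000.log", "/data/part-00001.log"]
```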

NameNode failed to start:port out of range:-1

2008-04-15 Thread 徐强
2008-04-15 18:38:41,756 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
2008-04-15 18:38:41,756 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
2008-04-15 18:38:41,756 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
2008-04-15

Fwd: Getting a DataNode files list

2008-04-15 Thread Shimi K
Is there a way to get a list of files from a specific DataNode in a programmatic way?

_temporary doesn't exist

2008-04-15 Thread Grant Ingersoll
Hi, I am seeing: 08/04/15 08:21:13 INFO mapred.JobClient: Task Id : task_200804150637_0003_m_00_0, Status : FAILED java.io.IOException: The directory hdfs://localhost:9000/user/gsi/20newsOutput/_temporary doesnt exist at org.apache.hadoop.mapred.TaskTracker

RE: Reduce Output

2008-04-15 Thread Natarajan, Senthil
Thanks Ted, that worked. I have one more question. Now the Reduce output looks something like this: K1 v1 v1 v1 K2 v2 v3 v3 v2 v2. I would like to have it this way: K1 v1(3) K2 v2(3) v3(2). Example: 8.14.0.2_12904 371 371 371 1.7.0.1_50098468 468 468 468 371

RE: _temporary doesn't exist

2008-04-15 Thread Devaraj Das
Hi Grant, could you please copy and paste the exact command you used to run the program? The associated config files, etc. will also help.

jobtracker can be started but NameNode failed to startup.java.lang.IllegalArgumentException: port out of range:-1

2008-04-15 Thread Skater
Hello guys: I followed the tutorial but finally get the following error. Could you help me? Is http://hadoop.apache.org/core/docs/current/quickstart.html#SingleNodeSetup out of date? I tried many times but still get the following problem.

Query

2008-04-15 Thread Prerna Manaktala
I tried to set up Hadoop with Cygwin according to this paper: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873. But I had problems working with DynDNS. I created a new host there, prerna.dyndns.org, and gave its IP address in hadoop-ec2-env.sh as the value of MASTER_HOST. But

Archive

2008-04-15 Thread Chaman Singh Verma
Hello, how can I browse through the archive of Hadoop users? Every time I try, I get the following message: Not Found - The requested URL /mail/core-user/ was not found on this server. - Apache/2.2.8 (Unix) Server at hadoop.apache.org Port 80

Re: Archive

2008-04-15 Thread Adrian Woodhead
Yes, it's been like this for around a week or so; nobody has given an ETA on when it will be fixed, I'm afraid. I was told to use the nabble archive instead: http://www.nabble.com/Hadoop-core-user-f30590.html Chaman Singh Verma wrote: Hello How can I see browse through the Archive of Hadoop

Large Weblink Graph

2008-04-15 Thread Chaman Singh Verma
Hello, does anyone have a large weblink graph? I want to experiment and benchmark MapReduce with some real dataset. Thanks. With regards, Chaman Singh Verma, Poona, India

Re: Large Weblink Graph

2008-04-15 Thread Ted Dunning
Please include the Mahout sub-project when you report what you find. This kind of dataset would be very helpful for that project as well. And you might find something helpful there as well. The goal is to support machine learning on hadoop. On 4/15/08 8:29 AM, Chaman Singh Verma [EMAIL

Re: Reduce Output

2008-04-15 Thread Ted Dunning
Just count the items in your reducer. On 4/15/08 6:18 AM, Natarajan, Senthil [EMAIL PROTECTED] wrote: Thanks Ted that worked. I have one more question. Now I have the Reduce output is something like this. K1 v1 v1 v1 K2 v2 v3 v3 v2 v2 I would like to have it in this way
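
Ted's suggestion can be sketched like this (Python, illustrative only; in a real job this logic would sit inside the reduce method, with each line emitted rather than returned):

```python
from collections import Counter

def reduce_counts(key, values):
    # Count occurrences of each value for this key and emit "value(count)"
    # pairs, in first-seen order, matching the requested output format.
    counts = Counter(values)
    return key + " " + " ".join(f"{v}({n})" for v, n in counts.items())

assert reduce_counts("K1", ["v1", "v1", "v1"]) == "K1 v1(3)"
assert reduce_counts("K2", ["v2", "v3", "v3", "v2", "v2"]) == "K2 v2(3) v3(2)"
```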

Re: Large Weblink Graph

2008-04-15 Thread Paco NATHAN
Another site which has data sets available for study is UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/ On Tue, Apr 15, 2008 at 8:29 AM, Chaman Singh Verma [EMAIL PROTECTED] wrote: Does anyone have large Weblink graph ? I want to experiment and benchmark MapReduce with

Re: Large Weblink Graph

2008-04-15 Thread Chaman Singh Verma
Thanks a lot Andrzej. csv Andrzej Bialecki [EMAIL PROTECTED] wrote: Ted Dunning wrote: Please include the Mahout sub-project when you report what you find. This kind of dataset would be very helpful for that project as well. And you might find something helpful there as well. The goal is

Page Ranking, Hadoop And MPI.

2008-04-15 Thread Chaman Singh Verma
Hello, after googling for many days I couldn't find an answer in any of the published reports on Google's ranking algorithm. Since Google uses GFS for fault-tolerance purposes, what communication libraries might they be using to solve such a large matrix? I presume that standard

How can I use counters in Hadoop

2008-04-15 Thread CloudyEye
Hi, I am a newbie to Hadoop and would be thankful for your help. I've read that I can use the Reporter class to increase counters, this way: reporter.incrCounter(Enum args, long arg1); How can I get the values of those counters? My aim is to count the total inputs to the mappers, then i
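
For a Java job the accumulated values can be read back after the job via RunningJob.getCounters(), and they also show up in the JobTracker web UI. As an illustrative sketch only (Python, in the Hadoop Streaming style, where a task increments a counter by writing a specially formatted line to stderr; the group and counter names here are made up):

```python
import sys

def incr_counter(group, counter, amount=1, stream=sys.stderr):
    # Hadoop Streaming scans the task's stderr for lines of the form
    # "reporter:counter:<group>,<counter>,<amount>" and adds them to the
    # job's counters.
    stream.write(f"reporter:counter:{group},{counter},{amount}\n")

for line in ["a", "b", "c"]:
    incr_counter("MyApp", "MAP_INPUT_RECORDS_SEEN")
```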

Question about reporting progress in mapper tasks. 0.15.3

2008-04-15 Thread Jason Venner
I have a mapper that does extensive computation for each task. In the computation, I increment a counter once per major operation (about once every 5 seconds). I can see this happening from the log messages that appear around the reporter.incrCounter call. Still, my mapper is getting killed

Re: How can I use counters in Hadoop

2008-04-15 Thread stack
https://issues.apache.org/jira/browse/HBASE-559 has an example. Ignore the HBase stuff. What's important is the enum at the head of the MR job class, the calls to Reporter inside the tasks, and the properties file -- both how it's named and that it ends up in the generated job jar. St.Ack

MapReduce: Two Reduce Tasks

2008-04-15 Thread Chaman
Hello, I am developing some applications in which I want to send the output of Map to 3-4 different Reduce tasks. What is the best way to accomplish this? Thanks. With regards, csv

Re: MapReduce: Two Reduce Tasks

2008-04-15 Thread Theodore Van Rooy
I think you just want to set your reduce tasks parameter in Hadoop Streaming to 3 or 4, and make sure that all the other settings won't push it over 3 or 4. Why do you want just 3 or 4? Have you determined that to be the optimal number of reduces? On Tue, Apr 15, 2008 at 11:49 AM, Chaman
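
How map output is routed to however many reduce tasks you configure can be sketched as follows (Python, illustrative; this mirrors the default HashPartitioner idea of hashing the key modulo numReduceTasks, so every occurrence of a key lands in the same reduce task):

```python
def partition(key: str, num_reduces: int) -> int:
    # Mask to a non-negative value, then take the remainder, so the
    # partition index is always in [0, num_reduces).
    return (hash(key) & 0x7FFFFFFF) % num_reduces

num_reduces = 4
buckets = {}
for key in ["apple", "banana", "apple", "cherry"]:
    buckets.setdefault(partition(key, num_reduces), []).append(key)

# The same key always maps to the same partition within a run.
assert partition("apple", num_reduces) == partition("apple", num_reduces)
assert all(0 <= p < num_reduces for p in buckets)
```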

Urgent

2008-04-15 Thread Prerna Manaktala
I tried to set up Hadoop with Cygwin according to this paper: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873. But I had problems working with DynDNS. I created a new host there, prerna.dyndns.org, and gave its IP address in hadoop-ec2-env.sh as the value of MASTER_HOST.

EigenValue Calculations, Hadoop and MPI.

2008-04-15 Thread Chaman Singh Verma
Hello, after googling for many days I couldn't find an answer in any of the published reports on Google's ranking algorithm. Since Google uses GFS for fault-tolerance purposes, what communication libraries might they be using to solve such a large matrix? I presume that standard

Re: Page Ranking, Hadoop And MPI.

2008-04-15 Thread Ted Dunning
Power law algorithms are ideal for this kind of parallelized problem. The basic idea is that hub and authority style algorithms are intimately related to eigenvector or singular value decompositions (depending on whether the links are symmetrical). This also means that there is a close
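
A minimal sketch of the eigenvector idea (Python, toy-sized, not Google's actual implementation; each power-iteration pass corresponds to one MapReduce round: map distributes a page's rank along its out-links, reduce sums the contributions per page):

```python
def pagerank(links, damping=0.85, iters=50):
    # links: page -> list of pages it links to (every page has out-links here).
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        # Teleport term, then add each page's rank split over its out-links.
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outs in links.items():
            for target in outs:
                new[target] += damping * rank[page] / len(outs)
        rank = new
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
r = pagerank(links)
assert abs(sum(r.values()) - 1.0) < 1e-6   # total rank is conserved
assert r["c"] > r["b"]                     # c is linked from both a and b
```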

Re: Page Ranking, Hadoop And MPI.

2008-04-15 Thread Chaman Singh Verma
Hello, that was a wonderful explanation of the role and beauty of eigenvalues in ranking. But I am still far from the real answer/hint. How does Google handle such a large matrix and solve it? Do they use the MapReduce framework for this, or standard and reliable Message Passing

Re: Page Ranking, Hadoop And MPI.

2008-04-15 Thread Ted Dunning
On 4/15/08 11:59 AM, Chaman Singh Verma [EMAIL PROTECTED] wrote: How Google handle such a large matrix and solve it ? Do they use MapReduce framework for these process or adopt standard and reliable Message Passing Interface/RPC etc for this task ? They use map-reduce. What about the

Re: multiple datanodes in the same machine

2008-04-15 Thread Ted Dunning
Why do you want to do this perverse thing? How does it help to have more than one datanode per machine? And what in the world is better when you have 10? On 4/15/08 12:53 PM, Cagdas Gerede [EMAIL PROTECTED] wrote: I have a follow-up question, Is there a way to programatically configure

Re: multiple datanodes in the same machine

2008-04-15 Thread cagdas . gerede
Testing when I do not have 10 machines. On 4/15/08, Ted Dunning [EMAIL PROTECTED] wrote: Why do you want to do this perverse thing? How does it help to have more than one datanode per machine? And what in the world is better when you have 10? On 4/15/08 12:53 PM, Cagdas Gerede [EMAIL

Re: Urgent

2008-04-15 Thread Norbert Burger
You need ssh working properly to continue. It sounds like the ssh server isn't listening on port 22. Have you configured it using ssh-host-config? (this is Cygwin-specific) See the 'Windows Users' section on http://wiki.apache.org/hadoop/QuickStart. On Tue, Apr 15, 2008 at 3:28 PM, Prerna

Re: multiple datanodes in the same machine

2008-04-15 Thread Cagdas Gerede
I am working on Distributed File System part. I do not use MR part, and I need to run multiple processes to test some scenarios on the file system. On Tue, Apr 15, 2008 at 1:37 PM, Ted Dunning [EMAIL PROTECTED] wrote: I have had no issues in scaling the number of datanodes. The location of

Re: multiple datanodes in the same machine

2008-04-15 Thread Theodore Van Rooy
Why do you want to do this perverse thing? - agreed. It sounds like even in your testing you'll not really get the full effect of what you're wanting to test. When you have two installations on the same machine it's likely that the network latency and other issues that occur when

Re: multiple datanodes in the same machine

2008-04-15 Thread Ted Dunning
And the two instances will affect each other significantly so that they will tend to serialize. On 4/15/08 3:24 PM, Theodore Van Rooy [EMAIL PROTECTED] wrote: Why do you want to do this perverse thing? -agreed. It sounds like even in your testing that you'll not really get the full

Re: Urgent

2008-04-15 Thread Prerna Manaktala
Hey, I am working with the EC2 environment. I registered and am being billed for EC2 and S3. Right now I have two Cygwin windows open: one as an administrator/server (on which sshd is running), in which I have a separate folder for Hadoop files and am able to run bin/hadoop; one as a normal user/client.

Re: EigenValue Calculations, Hadoop and MPI.

2008-04-15 Thread Edward J. Yoon
Have you seen the book Google's PageRank and Beyond? :) They might be using MapReduce ... I don't think Map/Reduce is an advanced parallel computing model, but I agree with you. Have you seen the Hama proposal? (http://wiki.apache.org/incubator/HamaProposal) I'll present ideas about Hama

Re: Question about reporting progress in mapper tasks. 0.15.3 - solved

2008-04-15 Thread Jason Venner
Well, on deeper reading of the code and the documentation, reporter.progress() is the required call. Jason Venner wrote: I have a mapper that for each task does extensive computation. In the computation, I increment a counter once per major operation (about once every 5 seconds). I can see
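
In a streaming job the equivalent keepalive is a status line written to stderr. A hedged sketch (Python, with a stand-in for the expensive computation; the status text is arbitrary):

```python
import sys

def long_computation(chunks, stream=sys.stderr):
    results = []
    for i, chunk in enumerate(chunks):
        results.append(chunk * 2)  # stand-in for the expensive per-chunk step
        # A "reporter:status:..." line on stderr updates the task status and,
        # like reporter.progress() in the Java API, signals liveness so the
        # TaskTracker does not kill a slow but still-working mapper.
        stream.write(f"reporter:status:processed {i + 1} chunks\n")
    return results
```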

adding nodes to an EC2 cluster

2008-04-15 Thread Stephen J. Barr
Hello, Does anyone have any experience adding nodes to a cluster running on EC2? If so, is there some documentation on how to do this? Thanks, -stephen

Re: adding nodes to an EC2 cluster

2008-04-15 Thread Chris K Wensel
Stephen, check out the patch in HADOOP-2410 to the contrib/ec2 scripts: https://issues.apache.org/jira/browse/HADOOP-2410 (just grab the ec2.tgz attachment). These scripts allow you to dynamically grow your cluster, plus some extra goodies. You will need to use them to build your own AMI; they

Re: adding nodes to an EC2 cluster

2008-04-15 Thread Stephen J. Barr
Thank you. I will check that out. I haven't built an AMI before. Hopefully it isn't too complicated, as it is easy to use the pre-built AMIs. -stephen Chris K Wensel wrote: Stephen Check out the patch in Hadoop-2410 to the contrib/ec2 scripts

Re: Reading Configuration File

2008-04-15 Thread Shimi K
Just put it in the classpath On Tue, Apr 15, 2008 at 11:50 PM, Natarajan, Senthil [EMAIL PROTECTED] wrote: Hi, How to read configuration file in Hadoop. I tried by copying the file in HDFS and also placing within the jar file. I tried like this in Map constructor Configuration conf = new

Re: Reading Configuration File

2008-04-15 Thread Amar Kamat
Natarajan, Senthil wrote: Hi, How to read configuration file in Hadoop. I tried by copying the file in HDFS and also placing within the jar file. Do you intend to read the job's config file or a separate file? In the case of accessing the job-specific config, override the configure(JobConf)

Re: adding nodes to an EC2 cluster

2008-04-15 Thread Chris K Wensel
I'm unsure of your particular problem, but the scripts/patch I referenced previously remove any dependency on DynDNS. The recipe would be something like: make an S3 bucket and update hadoop-ec2-env.sh; make an image: hadoop-ec2 create-image; make a 2-node (3-machine) cluster: hadoop-ec2