Ah... I understand... that might be a problem!
Well, in that case I would need to parse each base64-encoded line for the
'\n' sequence before making any use of it and before adding my own '\n'. I
am quite sure that this could become quite performance-consuming, which in
turn would reduce the
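For what it's worth, a scan of the decoded bytes is linear in the payload size, so it may be cheaper than feared. A minimal sketch (class and method names are mine, not from this thread):

```java
import java.util.Base64;

public class NewlineCheck {
    // Returns true if the decoded payload already contains a '\n',
    // so the caller can avoid appending a second record separator.
    static boolean decodedContainsNewline(String base64Line) {
        byte[] decoded = Base64.getDecoder().decode(base64Line);
        for (byte b : decoded) {
            if (b == '\n') return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(decodedContainsNewline("aGVsbG8K"));  // "hello\n" -> true
        System.out.println(decodedContainsNewline("aGVsbG8="));  // "hello"   -> false
    }
}
```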
It's addInputPath; it adds a Path object to the list of inputs.
So do the filtering first, then add the paths in a loop.
But I need an InputFormat anyway because I have my own RecordReader.
In the end I would have to put the same logic in a different place. From my
point of view it is better for me
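The filter-then-add loop might look like this, with plain strings standing in for Hadoop Path objects and a made-up suffix rule as the filter:

```java
import java.util.ArrayList;
import java.util.List;

public class InputSelection {
    // Stand-in for the FileInputFormat.addInputPath pattern: filter the
    // candidates first, then add each surviving path in a loop.
    static List<String> selectInputs(List<String> candidates) {
        List<String> inputs = new ArrayList<>();
        for (String path : candidates) {
            if (path.endsWith(".log")) {   // hypothetical filter rule
                inputs.add(path);          // would be addInputPath(conf, path)
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        List<String> picked = selectInputs(List.of("a.log", "b.tmp", "c.log"));
        System.out.println(picked);  // [a.log, c.log]
    }
}
```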
2008-04-15 18:38:41,756 INFO org.mortbay.util.Container: Started
WebApplicationContext[/,/]
2008-04-15 18:38:41,756 INFO org.mortbay.util.Container: Started
HttpContext[/logs,/logs]
2008-04-15 18:38:41,756 INFO org.mortbay.util.Container: Started
HttpContext[/static,/static]
2008-04-15
Is there a way to get a list of files from a specific DataNode in a
programmatic way?
Hi,
I am seeing
08/04/15 08:21:13 INFO mapred.JobClient: Task Id :
task_200804150637_0003_m_00_0, Status : FAILED
java.io.IOException: The directory hdfs://localhost:9000/user/gsi/
20newsOutput/_temporary doesnt exist
at org.apache.hadoop.mapred.TaskTracker
Thanks Ted that worked.
I have one more question.
Now the Reduce output looks something like this:
K1 v1 v1 v1
K2 v2 v3 v3 v2 v2
I would like to have it in this way
K1 v1(3)
K2 v2(3) v3(2)
Example:
8.14.0.2_12904 371 371 371
1.7.0.1_50098468 468 468 468 371
Hi Grant, could you please copy-paste the exact command you used to run the
program? The associated config files, etc., will also help.
-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 15, 2008 6:03 PM
To: core-user@hadoop.apache.org
Subject:
Hello guys:
I followed the tutorial, but in the end I get the following error:
Could you help me?
Is this
http://hadoop.apache.org/core/docs/current/quickstart.html#SingleNodeSetup
out of date?
I tried a lot of times, but still get the following problem.
I tried to set up Hadoop with Cygwin according to the
paper: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873
But I had problems working with DynDNS. I created a new host
there: prerna.dyndns.org
and gave its IP address in hadoop-ec2-env.sh as the value of MASTER_HOST.
But
Hello
How can I browse through the archive of Hadoop users? Every time I try, I
get the following message:
Not Found The requested URL /mail/core-user/ was not found on this server.
-
Apache/2.2.8 (Unix) Server at hadoop.apache.org Port 80
Yes, it's been like this for around a week or so; nobody has given an
ETA on when it will be fixed, I'm afraid. I was told to use the Nabble
archive instead:
http://www.nabble.com/Hadoop-core-user-f30590.html
Chaman Singh Verma wrote:
Hello
How can I browse through the archive of Hadoop
Hello,
Does anyone have a large weblink graph? I want to experiment and benchmark
MapReduce with some real dataset.
Thanks,
With regards,
Chaman Singh Verma,
Poona, India
Please include the Mahout sub-project when you report what you find. This
kind of dataset would be very helpful for that project as well.
And you might find something helpful there as well. The goal is to support
machine learning on hadoop.
On 4/15/08 8:29 AM, Chaman Singh Verma [EMAIL
Just count the items in your reducer.
On 4/15/08 6:18 AM, Natarajan, Senthil [EMAIL PROTECTED] wrote:
Thanks Ted that worked.
I have one more question.
Now the Reduce output looks something like this:
K1 v1 v1 v1
K2 v2 v3 v3 v2 v2
I would like to have it in this way
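The counting Ted suggests amounts to a small map inside reduce(); a standalone sketch of just that logic, with no Hadoop types and names of my own choosing:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ValueCounter {
    // Collapses repeated values for one key into "value(count)" form,
    // as a reduce(...) body would before writing the output value.
    static String countValues(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (out.length() > 0) out.append(' ');
            out.append(e.getKey()).append('(').append(e.getValue()).append(')');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(countValues(List.of("v2", "v3", "v3", "v2", "v2")));
        // prints: v2(3) v3(2)
    }
}
```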
Another site which has data sets available for study is UCI Machine
Learning Repository:
http://archive.ics.uci.edu/ml/
On Tue, Apr 15, 2008 at 8:29 AM, Chaman Singh Verma [EMAIL PROTECTED] wrote:
Does anyone have a large weblink graph? I want to experiment and benchmark
MapReduce with
Thanks a lot Andrzej.
csv
Andrzej Bialecki [EMAIL PROTECTED] wrote: Ted Dunning wrote:
Please include the Mahout sub-project when you report what you find. This
kind of dataset would be very helpful for that project as well.
And you might find something helpful there as well. The goal is
Hello,
After googling for many days, I couldn't get an answer from any of the
published reports on the ranking algorithm done by Google. Since Google uses
GFS for fault-tolerance purposes, what communication libraries might they be
using to solve such a large matrix? I presume that standard
Hi, I am a newbie to Hadoop. I would be thankful if you could help me.
I've read that I can use the Reporter class to increment counters, this way:
reporter.incrCounter(Enum key, long amount);
How can I get the values of those counters?
My aim is to count the total inputs to the mappers, then I
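One way to read the values back, sketched against the 0.16-era mapred API (MyCounters.TOTAL_INPUTS is a hypothetical enum constant; check the Counters javadoc for the exact methods in your version):

```java
// runJob() returns a RunningJob whose counters can be read on the
// client side after the job finishes.
RunningJob running = JobClient.runJob(conf);
Counters counters = running.getCounters();
long totalInputs = counters.getCounter(MyCounters.TOTAL_INPUTS);
System.out.println("map inputs counted: " + totalInputs);
```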
I have a mapper that for each task does extensive computation. In the
computation, I increment a counter once per major operation (about once
every 5 seconds). I can see this happening by the log messages, that
happen around the reporter.incrCounter call.
Still my mapper is getting killed
https://issues.apache.org/jira/browse/HBASE-559 has an example. Ignore
the HBase stuff. What's important is the enum at the head of the MR job
class, the calls to Reporter inside the tasks, and the properties file --
both how it's named and that it ends up in the generated job jar.
St.Ack
Hello,
I am developing some applications in which I want to send the output of Map
to 3-4 different Reduce tasks.
What is the best way to accomplish such a task?
Thanks.
With regards,
csv
--
View this message in context:
I think you just want to set your reduce-tasks parameter in Hadoop
streaming to 3 or 4, and make sure that all the other settings won't push it
over 3 or 4.
Why do you want just 3 or 4... have you determined that to be the optimal
number of reducers?
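For streaming that would be something like the following (a sketch; the -jobconf option and the property name are from the 0.16-era streaming contrib and may differ in your version):

```shell
hadoop jar contrib/streaming/hadoop-streaming.jar \
  -input in/ -output out/ \
  -mapper /bin/cat -reducer /usr/bin/wc \
  -jobconf mapred.reduce.tasks=4
```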
On Tue, Apr 15, 2008 at 11:49 AM, Chaman
I tried to set up hadoop with cygwin according to the
paper:http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873
But I had problems working with DynDNS. I created a new host
there: prerna.dyndns.org
and gave its IP address in hadoop-ec2-env.sh as the value of MASTER_HOST.
Hello,
After googling for many days, I couldn't get an answer from any of the
published reports on
the ranking algorithm done by Google. Since Google uses GFS for fault-tolerance
purposes, what
communication libraries might they be using to solve such a large matrix? I
presume that standard
Power law algorithms are ideal for this kind of parallelized problem.
The basic idea is that hub and authority style algorithms are intimately
related to eigenvector or singular value decompositions (depending on
whether the links are symmetrical). This also means that there is a close
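To make the eigenvector connection concrete, here is a toy power iteration on a three-page link cycle -- a classroom sketch, not anyone's production ranking code; the 0.85 damping factor is the value usually quoted for PageRank:

```java
import java.util.Arrays;

public class PowerIteration {
    // Repeatedly multiply the rank vector by the column-stochastic
    // transition matrix; the vector converges to the dominant eigenvector.
    static double[] pageRank(double[][] transition, int iterations) {
        int n = transition.length;
        double d = 0.85;  // damping factor
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                double sum = 0;
                for (int j = 0; j < n; j++) sum += transition[i][j] * rank[j];
                next[i] = (1 - d) / n + d * sum;
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // 3 pages in a cycle: 0 -> 1, 1 -> 2, 2 -> 0; every rank stays 1/3.
        double[][] m = { {0, 0, 1}, {1, 0, 0}, {0, 1, 0} };
        double[] r = pageRank(m, 50);
        System.out.printf("%.3f %.3f %.3f%n", r[0], r[1], r[2]);
        // prints: 0.333 0.333 0.333
    }
}
```

The MapReduce formulation distributes exactly this matrix-vector multiply: each map emits rank/outdegree along the outlinks, and each reduce sums the contributions for one page.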
Hello,
It was a wonderful explanation about the role and beauty of eigenvalues in
ranking. But I am still far from the real answer/hint. How does Google handle
such a large matrix and solve it? Do they use the MapReduce framework for this
process or adopt standard and reliable Message Passing
On 4/15/08 11:59 AM, Chaman Singh Verma [EMAIL PROTECTED] wrote:
How does Google handle such a large matrix and solve it? Do they use the
MapReduce framework for this process or adopt standard and reliable Message
Passing Interface/RPC etc. for this
task?
They use map-reduce.
What about the
Why do you want to do this perverse thing?
How does it help to have more than one datanode per machine? And what in
the world is better when you have 10?
On 4/15/08 12:53 PM, Cagdas Gerede [EMAIL PROTECTED] wrote:
I have a follow-up question,
Is there a way to programmatically configure
Testing when I do not have 10 machines.
On 4/15/08, Ted Dunning [EMAIL PROTECTED] wrote:
Why do you want to do this perverse thing?
How does it help to have more than one datanode per machine? And what in
the world is better when you have 10?
On 4/15/08 12:53 PM, Cagdas Gerede [EMAIL
You need ssh working properly to continue. It sounds like the ssh server
isn't listening on port 22. Have you configured it using ssh-host-config?
(this is Cygwin-specific) See the 'Windows Users' section on
http://wiki.apache.org/hadoop/QuickStart.
On Tue, Apr 15, 2008 at 3:28 PM, Prerna
I am working on the Distributed File System part. I do not use the MR part,
and I need to run multiple processes to test some scenarios on the file
system.
On Tue, Apr 15, 2008 at 1:37 PM, Ted Dunning [EMAIL PROTECTED] wrote:
I have had no issues in scaling the number of datanodes. The location of
Why do you want to do this perverse thing?
-agreed.
It sounds like even in your testing you'll not really get the full
effect of what you're wanting to test. When you have two installations on
the same machine, it's likely that the network latency and other issues that
occur when
And the two instances will affect each other significantly so that they will
tend to serialize.
On 4/15/08 3:24 PM, Theodore Van Rooy [EMAIL PROTECTED] wrote:
Why do you want to do this perverse thing?
-agreed.
It sounds like even in your testing that you'll not really get the full
Hey,
I am working with the EC2 environment.
I registered and am being billed for EC2 and S3.
Right now I have two Cygwin windows open:
one as an administrator/server (on which sshd is running), in which I have
a separate folder for the Hadoop files
and am able to run bin/hadoop,
and one as a normal user/client.
Have you seen the book Google's PageRank and Beyond? :)
they might be using MapReduce ...
I don't think Map/Reduce is an advanced parallel-computing model, but
I agree with you.
Have you seen the Hama proposal? (http://wiki.apache.org/incubator/HamaProposal)
I'll present ideas about Hama
Well, on deeper reading of the code and the documentation,
reporter.progress() is the required call.
Jason Venner wrote:
I have a mapper that for each task does extensive computation. In the
computation, I increment a counter once per major operation (about
once every 5 seconds). I can see
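So the fix, sketched against the 0.16-era mapred API, is to ping the Reporter during the long computation (doExpensiveOperation, majorOps, and the Ops enum are hypothetical stand-ins for Jason's code):

```java
public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
    for (int step = 0; step < majorOps; step++) {
        doExpensiveOperation(step);          // ~5 s each (hypothetical)
        reporter.incrCounter(Ops.MAJOR, 1);  // the counter alone was not enough
        reporter.progress();                 // this resets the task timeout
    }
}
```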
Hello,
Does anyone have any experience adding nodes to a cluster running on
EC2? If so, is there some documentation on how to do this?
Thanks,
-stephen
Stephen
Check out the patch in Hadoop-2410 to the contrib/ec2 scripts
https://issues.apache.org/jira/browse/HADOOP-2410
(just grab the ec2.tgz attachment)
these scripts allow you to dynamically grow your cluster plus some
extra goodies. You will need to use them to build your own AMI; they
Thank you. I will check that out. I haven't built an AMI before.
Hopefully it isn't too complicated, as it is easy to use the pre-built
AMIs.
-stephen
Chris K Wensel wrote:
Stephen
Check out the patch in Hadoop-2410 to the contrib/ec2 scripts
Just put it in the classpath
On Tue, Apr 15, 2008 at 11:50 PM, Natarajan, Senthil [EMAIL PROTECTED]
wrote:
Hi,
How do I read a configuration file in Hadoop?
I tried copying the file into HDFS and also placing it within the jar file.
I tried like this in Map constructor
Configuration conf = new
Natarajan, Senthil wrote:
Hi,
How do I read a configuration file in Hadoop?
I tried copying the file into HDFS and also placing it within the jar file.
Do you intend to read the job's config file or a separate file? To access
the job-specific config, override configure(JobConf)
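For a separate file shipped inside the job jar, the classpath route boils down to getResourceAsStream plus java.util.Properties; a sketch (the resource name is a made-up example):

```java
import java.io.InputStream;
import java.io.StringReader;
import java.util.Properties;

public class ConfigReader {
    // Loads a properties file packed into the job jar: the jar is on the
    // task's classpath, so getResourceAsStream can find it by name.
    static Properties fromClasspath(String resource) throws Exception {
        Properties props = new Properties();
        try (InputStream in = ConfigReader.class.getResourceAsStream(resource)) {
            if (in == null)
                throw new IllegalStateException(resource + " not found on classpath");
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        // Parsing works the same from any source; shown with an inline string.
        Properties p = new Properties();
        p.load(new StringReader("threshold=10\npattern=foo.*"));
        System.out.println(p.getProperty("threshold"));  // prints: 10
    }
}
```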
I'm unsure of your particular problem,
but the scripts/patch I referenced previously remove any dependency on
DynDNS.
The recipe would be something like...
make an S3 bucket and update hadoop-ec2-env.sh
make an image:
hadoop-ec2 create-image
make a 2 node (3 machine) cluster:
hadoop-ec2