Hadoop streaming is a utility that allows you to create and run Map/Reduce jobs
with any executable or script as the mapper and/or the reducer. I'm not
familiar with it, but I think you can find something useful here
http://hadoop.apache.org/common/docs/current/streaming.html
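For illustration, a streaming run is invoked from the command line along these
lines (the jar location and paths vary by install; /bin/cat and /usr/bin/wc are
just stand-ins for your own mapper and reducer scripts):

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
    -input /user/me/input \
    -output /user/me/output \
    -mapper /bin/cat \
    -reducer /usr/bin/wc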
2009/8/19 Poole, Samuel
On Aug 18, 2009, at 2:40 PM, Poole, Samuel [USA] wrote:
I am new to Hadoop (I have not yet installed/configured it), and I want
to make sure that I have the correct tool for the job. I do not
currently have a need for the Map/Reduce functionality, but I am
interested in using Hadoop for
Hi!
I'm currently evaluating different Hadoop versions for a new project.
I'm tempted by the Cloudera distribution, since it's neatly packaged
into .deb files and is the stable release with some patches
applied, for example bzip2 support.
I understand that I can get a support
On Wed, Aug 19, 2009 at 4:26 PM, Erik Forsberg forsb...@opera.com wrote:
I understand that I can get a support agreement from Cloudera to match
this distribution, but if that's not an option, will running the
Cloudera distribution put me in a position where I won't get any help
from the
I'm sorry, the version is 0.19.1
2009/8/19 Ted Dunning ted.dunn...@gmail.com
Which version of hadoop are you running?
On Tue, Aug 18, 2009 at 10:23 PM, yang song hadoop.ini...@gmail.com
wrote:
Hello, all
I have run into the problem "too many fetch failures" when I submit a big
job (e.g.
Generally if I have an issue I will bring it up on the forums and just
reference the Hadoop major version (0.18.3), since you're likely to get the
same level of help.
On 8/19/09, Erik Forsberg forsb...@opera.com wrote:
Hi!
I'm currently evaluating different Hadoop versions for a new project.
I'm tempted by the
Cloudera submits their patches back to the projects, and people are free to
pick them up.
It is becoming a normal thing to run a patched distribution, particularly
since Yahoo made their version of 0.20 available.
On Wed, Aug 19, 2009 at 5:46 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
I'm importing a bunch of data into HDFS. It involves running a bunch
of small jobs that don't put much load on my cluster, but it would be
nice if I could do them all from the same job client. I'd submit them
all asynchronously and then wait for the results of each.
I imagine this has been
I am not saying there is a slowdown caused by Hadoop. I was wondering if
there were any other techniques that optimize speed (i.e. reading a little at a
time and writing to the local disk).
Ananth T Sarathy
On Wed, Aug 19, 2009 at 1:26 AM, Raghu Angadi rang...@yahoo-inc.com wrote:
Ananth T. Sarathy
Ananth,
That is your issue really.
For example, I have 20 web servers and I wish to download all the
weblogs from all of them into Hadoop.
If you write a top-down program that uses FSDataOutput, you are using
Hadoop only half way. You are using the distributed file system, but you
are not doing any
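A minimal sketch of that "half way" approach, using the FileSystem API to copy
one local weblog into HDFS (the namenode URI and paths below are made up for
illustration):

import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Copies a single local log file into HDFS. This only exercises the
// distributed file system; no MapReduce parallelism is involved.
public class PutWeblog {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://namenode:9000");   // hypothetical namenode
    FileSystem fs = FileSystem.get(conf);

    InputStream in = new FileInputStream("/var/log/apache/access.log");
    FSDataOutputStream out = fs.create(new Path("/logs/web01/access.log"));
    IOUtils.copyBytes(in, out, 4096, true);   // true = close both streams when done
  }
}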
I'd dig around a bit more to check whether it's caused by a
specific set of nodes... i.e. are maps on specific tasktrackers
failing in this manner?
Arun
On Aug 18, 2009, at 10:23 PM, yang song wrote:
Hello, all
I have run into the problem "too many fetch failures" when I submit a
big
Hello, we were importing several TB of data overnight and it seemed one
of the loads
failed. We're running Hadoop 0.18.3, and there are 6 nodes in the
cluster, all are
dual quad core with 6 gigs of ram. We were using hadoop dfs -put to
load the data
from both the namenode server and the
George-
You can certainly submit jobs asynchronously via the
JobClient.submitJob() method
(http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/JobClient.html).
This will return a handle (a RunningJob instance) that you can poll
for completion. This is what the
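A minimal sketch of that submit-and-poll pattern with the old mapred API (the
job names, paths, and poll interval below are made up; identity map/reduce is
assumed just to keep it short):

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

// Submits several small jobs asynchronously from one client, then polls
// each RunningJob handle for completion.
public class AsyncSubmit {
  public static void main(String[] args) throws Exception {
    List<RunningJob> handles = new ArrayList<RunningJob>();
    for (int i = 0; i < 5; i++) {
      JobConf conf = new JobConf(AsyncSubmit.class);
      conf.setJobName("import-part-" + i);                      // hypothetical name
      FileInputFormat.setInputPaths(conf, new Path("/import/in/" + i));
      FileOutputFormat.setOutputPath(conf, new Path("/import/out/" + i));
      JobClient client = new JobClient(conf);
      handles.add(client.submitJob(conf));                      // returns immediately
    }
    // Wait for all of them, checking success as each finishes.
    for (RunningJob job : handles) {
      while (!job.isComplete()) {
        Thread.sleep(5000);
      }
      System.out.println(job.getJobName() + " succeeded: " + job.isSuccessful());
    }
  }
}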
On 8/19/09 10:58 AM, Raghu Angadi rang...@yahoo-inc.com wrote:
Edward Capriolo wrote:
On Wed, Aug 19, 2009 at 11:11 AM, Edward Capriolo
edlinuxg...@gmail.com wrote:
It would be as fast as the underlying filesystem goes.
I would not agree with that statement. There is overhead.
You might be
Hi Mithila,
In the Mapreduce svn tree, it's under src/contrib/fairscheduler/
- Aaron
On Wed, Aug 19, 2009 at 2:48 PM, Mithila Nagendra mnage...@asu.edu wrote:
Hello
I was wondering how I could locate the source code files for the fair
scheduler.
Thanks
Mithila
Has anybody had any luck setting up the log4j.properties file to send logs
to a syslog-ng server?
My log4j.properties excerpt:
log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.syslogHost=10.0.20.164
log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
Hi Inifok,
This is a confusing aspect of Hadoop, I'm afraid.
Settings are divided into two categories: per-job and per-node.
Unfortunately, which are which isn't documented.
Some settings are applied to the node that is being used. So for example, if
you set fs.default.name on a node to be
Hello everyone,
Could anyone please tell me in which class and method Hadoop
downloads a file chunk from HDFS and associates it with the thread that
executes the Map function on the given chunk and processes it?
I would like to extend Hadoop so one Task may have more chunks
associated and
Thanks! But how do I know which version to work with?
Mithila
On Thu, Aug 20, 2009 at 2:30 AM, Ravi Phulari rphul...@yahoo-inc.com wrote:
Currently Fairscheduler source is in
hadoop-mapreduce/src/contrib/fairscheduler/
Download mapreduce source from.
Thank you, Aaron. I've benefited a lot. Per-node means settings
associated with the node, e.g., fs.default.name, mapred.job.tracker,
etc. Per-job means settings associated with the jobs which are
submitted from the node, e.g., mapred.reduce.tasks. That means, if I set
per-job
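For reference, a per-job value rides on the JobConf submitted with the job,
while per-node values normally live in each node's hadoop-site.xml. A minimal
sketch with the old mapred API (the paths and the value 4 are placeholders):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Sets a per-job override (reduce task count) and submits the job with it.
public class PerJobSettings {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(PerJobSettings.class);
    conf.setNumReduceTasks(4);               // same effect as mapred.reduce.tasks=4
    // conf.set("mapred.reduce.tasks", "4"); // equivalent string form
    FileInputFormat.setInputPaths(conf, new Path("/example/in"));
    FileOutputFormat.setOutputPath(conf, new Path("/example/out"));
    JobClient.runJob(conf);                  // submits with the per-job override
  }
}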
Hey Mike,
Yup. We find the stock log4j needs two things:
1) Set the rootLogger manually. The way 0.19.x has the root logger
set up breaks when adding new appenders. I.e., do:
log4j.rootLogger=INFO,SYSLOG,console,DRFA,EventCounter
2) Add the headers; otherwise log4j is not compatible
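Putting those two points together with the excerpt above, a log4j.properties
along these lines should work (the Facility, Header, and ConversionPattern
values are assumptions, and the Header property requires a log4j version that
supports it):

log4j.rootLogger=INFO,SYSLOG,console,DRFA,EventCounter

log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.syslogHost=10.0.20.164
log4j.appender.SYSLOG.Facility=LOCAL1
log4j.appender.SYSLOG.Header=true
log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
log4j.appender.SYSLOG.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n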