Hi Amit,
It is a bug, fixed by
https://issues.apache.org/jira/browse/HADOOP-6103, although the fix
never made it into branch-1. Can you create a branch-1 patch for this
please?
Thanks,
Tom
On Thu, Apr 18, 2013 at 4:09 AM, Amit Sela am...@infolinks.com wrote:
Hi all,
I was wondering if there
Hi Nitin,
It looks like you may be using the wrong port number - try 8088 for
the resource manager UI.
Cheers,
Tom
On Mon, Nov 28, 2011 at 4:02 AM, Nitin Khandelwal
nitin.khandel...@germinait.com wrote:
Hi,
I was trying to set up Hadoop 0.23.0 with the help of
Justin,
The skipping feature should really only be used when you are calling
out to a third-party library that may segfault on corrupt data, and
even then it's probably better to use a subprocess to handle it, as
Owen suggested here:
On Thu, Oct 13, 2011 at 2:06 PM, Raimon Bosch raimon.bo...@gmail.com wrote:
By the way,
The URL I'm trying has a '_' in the bucket name. Could this be the problem?
Yes, underscores are not permitted in hostnames.
Cheers,
Tom
2011/10/13 Raimon Bosch raimon.bo...@gmail.com
Hi,
I've been
JobConf and the old API are no longer deprecated in the forthcoming
0.20.205 release, so you can continue to use them without issue.
The equivalent in the new API is setInputFormatClass() on
org.apache.hadoop.mapreduce.Job.
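For example, something along these lines (a minimal sketch using the
0.20-style API; TextInputFormat here is the one from
org.apache.hadoop.mapreduce.lib.input):

Job job = new Job(new Configuration(), "my job"); // throws IOException
job.setInputFormatClass(TextInputFormat.class);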
Cheers,
Tom
On Tue, Oct 11, 2011 at 9:18 AM, Keith Thompson
You might consider Apache Whirr (http://whirr.apache.org/) for
bringing up Hadoop clusters on EC2.
Cheers,
Tom
On Wed, Aug 31, 2011 at 8:22 AM, Robert Evans ev...@yahoo-inc.com wrote:
Dmitry,
It sounds like an interesting idea, but I have not really heard of anyone
doing it before. It
See also https://issues.apache.org/jira/browse/MAPREDUCE-434 which has
a patch for this issue.
Cheers,
Tom
On Mon, May 2, 2011 at 5:13 PM, jason urg...@gmail.com wrote:
I am attaching the originals so you could figure out the diffs on your own :)
On 5/2/11, Dmitriy Lyubimov dlie...@gmail.com
Hi Witold,
Is this on Windows? The scripts were re-structured after Hadoop 0.20,
and looking at them now I notice that the cygwin path translation for
the classpath seems to be missing. You could try adding the following
line to the if $cygwin clause in bin/hadoop-config.sh:
CLASSPATH=`cygpath
The instructions at
http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html should be
what you need.
Cheers,
Tom
On Wed, Mar 2, 2011 at 12:59 AM, Manish Yadav manish.ya...@orkash.com wrote:
Dear Sir/Madam
I'm very new to hadoop. I'm trying to install hadoop on my computer. I
followed a
These files are generated files. If you run 'ant avro-generate
eclipse' then Eclipse should find these files.
Cheers,
Tom
On Mon, Feb 28, 2011 at 2:43 AM, bharath vissapragada
bharathvissapragada1...@gmail.com wrote:
Hi all,
I checked out the map-reduce trunk a few days back and following
Hi Steve,
Sorry to hear about the problems you had. The issue you hit was a
result of MAPREDUCE-954, and there was some discussion on that JIRA
about compatibility. I believe the thinking was that the context
classes are framework classes, so users don't extend/implement them in
the normal course
On Thu, Oct 21, 2010 at 8:23 AM, ed hadoopn...@gmail.com wrote:
Hello,
The MapRunner class looks promising. I noticed it is in the deprecated
mapred package but I didn't see an equivalent class in the mapreduce
package. Is this going to be ported to mapreduce or is it no longer being
It's done by the RecordReader. For text-based input formats, which use
LineRecordReader, decompression is carried out automatically. For
others it's not (e.g. sequence files which have internal compression).
So it depends on what your custom input format does.
Cheers,
Tom
On Fri, Oct 8, 2010 at
Hi Henning,
I don't know if you've seen
https://issues.apache.org/jira/browse/MAPREDUCE-1938 and
https://issues.apache.org/jira/browse/MAPREDUCE-1700 which have
discussion about this issue.
Cheers
Tom
On Fri, Sep 24, 2010 at 3:41 AM, Henning Blohm henning.bl...@zfabrik.de wrote:
Short update
not find any
tutorial or examples anywhere.
Martin
On 22.09.2010 18:29, Tom White wrote:
Note that JobClient, along with the rest of the old API in
org.apache.hadoop.mapred, has been undeprecated in Hadoop 0.21.0 so
you can continue to use it without warnings.
Tom
On Wed, Sep 22, 2010 at 2
the Tool interface is located. Could this be
the problem? I am a little clueless here and not sure whether this is a
problem that should be further addressed in this mailing list.
Thanks in advance,
Martin
On 22.09.2010 16:08, Tom White wrote:
Hi Martin,
Neither Tool nor ToolRunner
Hi Martin,
This is a known bug, see https://issues.apache.org/jira/browse/HADOOP-6953.
Cheers
Tom
On Wed, Sep 22, 2010 at 8:17 AM, Martin Becker _martinbec...@web.de wrote:
Hi,
I am using Hadoop MapReduce 0.21.0. The usual process of starting
Hadoop/HDFS/MapReduce was to use the
dar...@darose.net wrote:
Hmmm. Any idea as to why the undeprecation? I thought the intention was to
try to move everybody to the new API. Why the reversal?
Thanks,
DR
On 09/22/2010 12:29 PM, Tom White wrote:
Note that JobClient, along with the rest of the old API
On 22.09.2010 18:29, Tom White wrote:
Note that JobClient, along with the rest of the old API in
org.apache.hadoop.mapred, has been undeprecated in Hadoop 0.21.0 so
you can continue to use it without warnings.
Tom
On Wed, Sep 22, 2010 at 2:43 AM, Amareshwari Sri Ramadasu
amar...@yahoo-inc.com wrote
Hi Mike,
What do you get if you type ./hadoop classpath? Does it contain the
Hadoop common JAR?
To avoid the deprecation warning you should use hadoop fs, not hadoop dfs.
Tom
On Wed, Sep 15, 2010 at 12:53 PM, Mike Franon kongfra...@gmail.com wrote:
Hi,
I just set up a 3-node hadoop cluster
Hi Sonal,
The 0.21.0 jars are not available in Maven yet, since the process for
publishing them post split has changed.
See HDFS-1292 and MAPREDUCE-1929.
Cheers,
Tom
On Fri, Sep 10, 2010 at 1:33 PM, Sonal Goyal sonalgoy...@gmail.com wrote:
Hi,
Can someone please point me to the Maven repo
The 0.21.0 jars are not in the Apache Maven repos yet, since the
process for publishing them post split has changed. HDFS-1292 and
MAPREDUCE-1929 are the tickets to fix this.
Cheers,
Tom
On Sat, Aug 28, 2010 at 9:10 PM, Mark static.void@gmail.com wrote:
On 8/27/10 9:25 AM, Owen O'Malley
Hi everyone,
I am pleased to announce that Apache Hadoop 0.21.0 is available for
download from http://hadoop.apache.org/common/releases.html.
Over 1300 issues have been addressed since 0.20.2; you can find details at
http://hadoop.apache.org/common/docs/r0.21.0/releasenotes.html
On Mon, Aug 16, 2010 at 3:21 PM, David Rosenstrauch dar...@darose.net wrote:
On 08/16/2010 05:48 PM, Ted Yu wrote:
No.
On Mon, Aug 16, 2010 at 1:25 PM, David
Rosenstrauchdar...@darose.netwrote:
Is it possible for an M/R job to have no mapper? i.e.:
job.setMapperClass(null)? Or is it
Hi Oleg,
I don't know of any plans to implement this. However, since this is a
block-based storage system which uses S3, I wonder whether an
implementation could use some of the logic in HDFS for block storage
and append in general.
Cheers,
Tom
On Thu, Aug 12, 2010 at 8:34 AM, Aleshko, Oleg
Hi Felix,
Aaron Kimball hit the same problem - it's being discussed at
https://issues.apache.org/jira/browse/MAPREDUCE-1920.
Thanks for reporting this.
Cheers,
Tom
On Tue, Jul 6, 2010 at 11:26 AM, Felix Halim felix.ha...@gmail.com wrote:
I tried hadoop 0.21 release candidate.
Hi Ananth,
The next release of Hadoop will be 0.21.0, but it won't have Kerberos
authentication in it (since it's not all in trunk yet). The 0.22.0
release later this year will have a working version of security in it.
Cheers,
Tom
On Wed, Jul 7, 2010 at 8:09 AM, Ananth Sarathy
Hi Mark,
You can find the latest version of the scripts at
http://archive.cloudera.com/cdh/3/hadoop-0.20.2+228.tar.gz.
Documentation is at http://archive.cloudera.com/docs/ec2.html.
The source code is currently in src/contrib/cloud in Hadoop Common,
but is in the process of moving to a new
Hi Susanne,
Hadoop uses the file extension to detect that a file is compressed. I
believe Hive does too. Did you store the compressed file in HDFS with
a .gz extension?
Cheers,
Tom
BTW It's best to send Hive questions like these to the hive-user@ list.
On Sun, May 2, 2010 at 11:22 AM, Susanne
Hi Yuanyuan,
I think you've found a bug - could you file a JIRA issue for this please?
Thanks,
Tom
On Wed, Apr 28, 2010 at 11:04 PM, Yuanyuan Tian yt...@us.ibm.com wrote:
I have a problem in getting the input file name in the mapper when using
MultipleInputs. I need to use MultipleInputs
mapper)?
Yuanyuan
Tom White ---04/29/2010 09:42:44 AM---Hi Yuanyuan, I think you've found a bug
- could you file a JIRA issue for this please?
From: Tom White t...@cloudera.com
To: common-user@hadoop.apache.org
Date: 04/29/2010 09:42 AM
Subject: Re: conf.get("map.input.file") returns
Hi Danny,
S3FileSystem has no concept of permissions, which is why this check
fails. The change that introduced the permissions check was introduced
in https://issues.apache.org/jira/browse/MAPREDUCE-181. Could you file
a bug for this please?
Cheers,
Tom
On Thu, Apr 22, 2010 at 4:16 AM, Danny
I think you can set the URI on the configuration object with the key
JobContext.END_NOTIFICATION_URL.
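For example (a sketch; the notification URL is made up, and $jobId and
$jobStatus are placeholders that the framework substitutes):

Configuration conf = job.getConfiguration();
conf.set(JobContext.END_NOTIFICATION_URL,
    "http://myhost:8080/notify?id=$jobId&status=$jobStatus");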
Cheers,
Tom
On Tue, Feb 23, 2010 at 12:02 PM, Ted Yu yuzhih...@gmail.com wrote:
Hi,
I am looking for counterpart to JobConf.setJobEndNotificationURI() in
org.apache.hadoop.mapreduce
Please
Hi Sonal,
You should use the one with the later date. The Cloudera AMIs don't
actually have Hadoop installed on them, just Java and some other base
packages. Hadoop is installed at start up time; you can find more
information at http://archive.cloudera.com/docs/ec2.html.
Cheers,
Tom
P.S. For
Please submit a patch for the documentation change - perhaps at
https://issues.apache.org/jira/browse/HADOOP-5973.
Cheers,
Tom
On Wed, Jan 13, 2010 at 12:09 AM, Amogh Vasekar am...@yahoo-inc.com wrote:
+1 for the documentation change in mapred-tutorial. Can we do that and
publish using a
Have a look at org.apache.hadoop.io.ArrayWritable. You may be able to
use this class in your application, or at least use it as a basis for
writing VectorWritable.
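For example, a minimal sketch of a hypothetical VectorWritable of
doubles (the no-arg constructor is needed for deserialization):

public class VectorWritable extends ArrayWritable {
  public VectorWritable() {
    super(DoubleWritable.class);
  }
  public VectorWritable(DoubleWritable[] values) {
    super(DoubleWritable.class, values);
  }
}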
Cheers,
Tom
On Tue, Dec 29, 2009 at 1:37 AM, bharath v
bharathvissapragada1...@gmail.com wrote:
Can you please tell me , what is
If you are using S3 as your file store then you don't need to run HDFS
(and indeed HDFS will not start up if you try).
Cheers,
Tom
2009/12/17 Rekha Joshi rekha...@yahoo-inc.com:
Not sure what the whole error is, but you can always alternatively try this -
<property>
  <name>fs.default.name</name>
Correct. The master runs the namenode and jobtracker, but not a
datanode or tasktracker.
Tom
On Tue, Nov 24, 2009 at 4:57 PM, Mark Kerzner markkerz...@gmail.com wrote:
Hi,
do I understand it correctly that, when I launch a Hadoop cluster on EC2,
the master will not be doing any work, and it
that of the slaves?
No, this is not supported, but I can see it would be useful,
particularly for larger clusters. Please consider opening a JIRA for
it.
Cheers,
Tom
Thank you,
Mark
On Tue, Nov 24, 2009 at 11:20 PM, Tom White t...@cloudera.com wrote:
Mark,
If the data was transferred to S3
Mark,
If the data was transferred to S3 outside of Hadoop then you should
use the s3n filesystem scheme (see the explanation on
http://wiki.apache.org/hadoop/AmazonS3 for the differences between the
Hadoop S3 filesystems).
Also, some people have had problems embedding the secret key in the
URI,
Hi Mark,
HADOOP-6108 will add Cloudera's EC2 scripts to the Apache
distribution, with the difference that they will run Apache Hadoop.
The same scripts will also support Cloudera's Distribution for Hadoop,
simply by using a different boot script on the instances. So I would
suggest you use these
,
Mark
On Sun, Nov 15, 2009 at 10:29 PM, Tom White t...@cloudera.com wrote:
Hi Mark,
HADOOP-6108 will add Cloudera's EC2 scripts to the Apache
distribution, with the difference that they will run Apache Hadoop.
The same scripts will also support Cloudera's Distribution for Hadoop,
simply
Multiple outputs has been ported to the new API in 0.21. See
https://issues.apache.org/jira/browse/MAPREDUCE-370.
Cheers,
Tom
On Sat, Nov 7, 2009 at 6:45 AM, Xiance SI(司宪策) adam...@gmail.com wrote:
I just fell back to the old mapred.* APIs; it seems MultipleOutputs only
works for the old API.
MultipleInputs is available from Hadoop 0.19 onwards (in
org.apache.hadoop.mapred.lib, or org.apache.hadoop.mapreduce.lib.input
for the new API in later versions).
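For example, with the old API (a sketch; the paths, input formats and
mapper classes are placeholders):

MultipleInputs.addInputPath(conf, new Path("/logs"),
    TextInputFormat.class, LogMapper.class);
MultipleInputs.addInputPath(conf, new Path("/users"),
    KeyValueTextInputFormat.class, UserMapper.class);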
Tom
On Wed, Nov 4, 2009 at 8:07 AM, Mark Vigeant
mark.vige...@riskmetrics.com wrote:
Amogh,
That sounds so awesome! Yeah I wish I
Hi Mark,
Sorry to hear that all your EC2 instances were terminated. Needless to
say, this should certainly not happen.
The scripts are a Python rewrite (see HADOOP-6108) of the bash ones so
HADOOP-1504 is not applicable, but the behaviour should be the same:
the terminate-cluster command lists
the Hadoop cluster default, and
make sure that you don't create non-Hadoop EC2 instances in the
cluster group.
Thanks,
Tom
Does this help at all? Thanks.
-Mark
On Mon, Oct 19, 2009 at 11:52 AM, Tom White t...@cloudera.com wrote:
Hi Mark,
Sorry to hear that all your EC2 instances were terminated
Hi Jeyendran,
Were there any errors reported in the datanode logs? There could be a
problem with datanodes contacting the namenode, caused by firewall
configuration problems (EC2 security groups).
Cheers,
Tom
On Fri, Sep 4, 2009 at 12:17 AM, Jeyendran
Hi Cam,
Looks like it's in hadoop-hdfs-hdfswithmr-test-0.21.0-dev.jar, which
should be built with ant jar-test.
Cheers,
Tom
On Mon, Aug 24, 2009 at 8:22 PM, Cam Macdonellc...@cs.ualberta.ca wrote:
Thanks Danny,
It currently does not show up hadoop-common-test, hadoop-hdfs-test or
Hi Roman,
Have a look at CombineFileInputFormat - it might be related to what
you are trying to do.
Cheers,
Tom
On Thu, Aug 20, 2009 at 10:59 AM, roman kolcunroman.w...@gmail.com wrote:
On Thu, Aug 20, 2009 at 10:30 AM, Harish Mallipeddi
harish.mallipe...@gmail.com wrote:
On Thu, Aug 20,
On Mon, Aug 3, 2009 at 3:09 AM, Billy
Pearsonbilly_pear...@sbcglobal.net wrote:
not sure if it's still there, but there was a param in the hadoop-site conf
file that would allow you to skip x number of index entries when reading it
into memory.
This is io.map.index.skip (default 0), which will skip
I've now updated the news section, and the documentation on the
website to reflect the 0.19.2 release.
There were several reports of it being more stable than 0.19.1 in the
voting thread:
http://www.mail-archive.com/common-...@hadoop.apache.org/msg00051.html
Cheers,
Tom
On Tue, Jul 28, 2009 at
That's for the case where you want to do the decompression yourself,
explicitly, perhaps when you are reading the data out of HDFS (and not
using MapReduce). When using compressed data as input to a MapReduce
job, Hadoop will automatically decompress them for you.
Tom
On Fri, Jul 31, 2009 at
Hi Raakhi,
JobControl is designed to be run from a new thread:
Thread t = new Thread(jobControl);
t.start();
Then you can run a loop to poll for job completion and print out status:
String oldStatus = null;
while (!jobControl.allFinished()) {
String status =
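A complete version of the polling loop might look like this (the
status string is just illustrative):

String oldStatus = null;
while (!jobControl.allFinished()) {
  String status = jobControl.getRunningJobs().size() + " running, "
      + jobControl.getSuccessfulJobs().size() + " successful, "
      + jobControl.getFailedJobs().size() + " failed";
  if (!status.equals(oldStatus)) {
    System.out.println(status);
    oldStatus = status;
  }
  Thread.sleep(1000); // declare or catch InterruptedException
}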
Hi Jianmin,
Partitioner extends JobConfigurable, so you can implement the
configure() method to access the JobConf.
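For example (a sketch; the property name is made up):

public class MyPartitioner implements Partitioner<Text, IntWritable> {
  private boolean caseSensitive;

  public void configure(JobConf job) {
    caseSensitive = job.getBoolean("my.partitioner.case.sensitive", true);
  }

  public int getPartition(Text key, IntWritable value, int numPartitions) {
    String k = caseSensitive ? key.toString() : key.toString().toLowerCase();
    return (k.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}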
Hope that helps.
Cheers,
Tom
On Tue, Jul 14, 2009 at 10:27 AM, Jianmin Woojianmin_...@yahoo.com wrote:
Hi,
I am considering to implement a Partitioner that needs to access the
There's a Jira to fix this here:
https://issues.apache.org/jira/browse/MAPREDUCE-434
Tom
On Mon, Jul 13, 2009 at 12:34 AM, jason hadoopjason.had...@gmail.com wrote:
If the jobtracker is set to local, there is no way to have more than 1
reducer.
On Sun, Jul 12, 2009 at 12:21 PM, Rares Vernica
in 0.20. It seems that
the org.apache.hadoop.mapred.Partitioner is deprecated and will be removed in
the future.
Do you have some suggestions on this?
Thanks,
Jianmin
From: Tom White t...@cloudera.com
To: common-user@hadoop.apache.org
Sent: Tuesday, July
Hi Akhil,
Have a look at the mapred.jobtracker.restart.recover property.
Cheers,
Tom
On Sun, Jul 12, 2009 at 12:06 AM, akhil1988akhilan...@gmail.com wrote:
HI All,
I am looking for ways to restart my hadoop job from where it left off when the
entire cluster goes down or the job gets stopped
Have a look at the datanode log files on the datanode machines and see
what the error is in there.
Cheers,
Tom
On Thu, Jun 25, 2009 at 6:21 AM, .ke. sivakumarkesivaku...@gmail.com wrote:
Hi all, I'm a student and I have been trying to set up the hadoop cluster for
a while
but have been
Hi Krishna,
You get this error when the jar file cannot be found. It looks like
/user/hadoop/hadoop-0.18.0-examples.jar is an HDFS path, when in fact
it should be a local path.
Cheers,
Tom
On Thu, Jun 25, 2009 at 9:43 AM, krishna prasannasvk_prasa...@yahoo.com wrote:
Oh! thanks Shravan
Hi Usman,
Before the rebalancer was introduced one trick people used was to
increase the replication on all the files in the system, wait for
re-replication to complete, then decrease the replication to the
original level. You can do this using hadoop fs -setrep.
Cheers,
Tom
On Thu, Jun 25,
You can change the value of hadoop.root.logger in
conf/log4j.properties to change the log level globally. See also the
section Custom Logging levels in the same file to set levels on a
per-component basis.
You can also use hadoop daemonlog to set log levels on a temporary
basis (they are reset on
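For example, in conf/log4j.properties (a sketch; the JobTracker line
follows the commented examples in the stock file):

hadoop.root.logger=DEBUG,console
log4j.logger.org.apache.hadoop.mapred.JobTracker=WARN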
Hi Chris,
You should really start all the slave nodes to be sure that you don't
lose data. If you start fewer than #nodes - #replication + 1 nodes
then you are virtually guaranteed to lose blocks. Starting 6 nodes out
of 10 will cause the filesystem to remain in safe mode, as you've
seen.
BTW
Hi Saptarshi,
The group permissions open the firewall ports to enable access, but
there are no shared keys on the cluster by default. See
https://issues.apache.org/jira/browse/HADOOP-4131 for a patch to the
scripts that shares keys to allow SSH access between machines in the
cluster.
Cheers,
Tom
Hi Kun,
The book's code is for 0.20.0. In Hadoop 0.17.x WritableComparable was
not generic, so you need a declaration like:
public class IntPair implements WritableComparable {
}
And the compareTo() method should look like this:
public int compareTo(Object o) {
IntPair ip = (IntPair) o;
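Putting it together, a 0.17-style IntPair might look like this (field
names are illustrative):

public class IntPair implements WritableComparable {
  private int first;
  private int second;

  public void write(DataOutput out) throws IOException {
    out.writeInt(first);
    out.writeInt(second);
  }

  public void readFields(DataInput in) throws IOException {
    first = in.readInt();
    second = in.readInt();
  }

  public int compareTo(Object o) {
    IntPair ip = (IntPair) o;
    if (first != ip.first) {
      return first < ip.first ? -1 : 1;
    }
    return second < ip.second ? -1 : (second == ip.second ? 0 : 1);
  }
}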
You might be interested in
https://issues.apache.org/jira/browse/HDFS-385, where there is
discussion about how to add pluggable block placement to HDFS.
Cheers,
Tom
On Tue, Jun 23, 2009 at 5:50 PM, Alex Loddengaarda...@cloudera.com wrote:
Hi Hyunsik,
Unfortunately you can't control the
Hi Ninad,
I don't know if anyone has looked at this for Hadoop Core or HBase
(although there is this Jira:
https://issues.apache.org/jira/browse/HADOOP-4604), but there's some
work for making ZooKeeper's jar OSGi compliant at
https://issues.apache.org/jira/browse/ZOOKEEPER-425.
Cheers,
Tom
On
Actually, the space is needed, to be interpreted as a Hadoop option by
ToolRunner. Without the space it sets a Java system property, which
Hadoop will not automatically pick up.
Ian, try putting the options after the classname and see if that
helps. Otherwise, it would be useful to see a snippet
Hi Stuart,
There isn't an InputFormat that comes with Hadoop to do this. Rather
than pre-processing the file, it would be better to implement your own
InputFormat. Subclass FileInputFormat and provide an implementation of
getRecordReader() that returns your implementation of RecordReader to
read
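A skeleton of that shape, using the old API (MyRecordReader stands in
for your own reader):

public class MyInputFormat extends FileInputFormat<LongWritable, Text> {
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new MyRecordReader((FileSplit) split, job);
  }
}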
Hi Walter,
On Thu, May 28, 2009 at 6:52 AM, walter steffe ste...@tiscali.it wrote:
Hello
I am a new user and I would like to use hadoop streaming with
SequenceFile in both input and output side.
-The first difficulty arises from the lack of a simple tool to generate
a SequenceFile
Have you had a look at Nutch (http://lucene.apache.org/nutch/)? It has
solved this kind of problem.
Cheers,
Tom
On Wed, May 27, 2009 at 9:58 AM, John Clarke clarke...@gmail.com wrote:
My current project is to gather stats from a lot of different documents.
We're are not indexing just getting
RandomAccessFile isn't supported directly, but you can seek when
reading from files in HDFS (see FSDataInputStream's seek() method).
Writing at an arbitrary offset in an HDFS file is not supported
however.
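For example (a sketch; the path is made up, and conf is your
Configuration):

FileSystem fs = FileSystem.get(conf);
FSDataInputStream in = fs.open(new Path("/data/part-00000"));
in.seek(1024L); // jump to an arbitrary byte offset, then read as usual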
Cheers,
Tom
On Sun, May 24, 2009 at 1:33 PM, Stas Oskin stas.os...@gmail.com wrote:
Hi.
You can't use it yet, but
https://issues.apache.org/jira/browse/HADOOP-3799 (Design a pluggable
interface to place replicas of blocks in HDFS) would enable you to
write your own policy so blocks are never placed locally. Might be
worth following its development to check it can meet your need?
On Thu, May 21, 2009 at 5:18 AM, Foss User foss...@gmail.com wrote:
On Wed, May 20, 2009 at 3:18 PM, Tom White t...@cloudera.com wrote:
The number of maps to use is calculated on the client, since splits
are computed on the client, so changing the value of mapred.map.tasks
only
On Wed, May 20, 2009 at 10:22 PM, Stas Oskin stas.os...@gmail.com wrote:
You should only use this if you plan on manually closing FileSystems
yourself from within your own shutdown hook. It's somewhat of an advanced
feature, and I wouldn't recommend using this patch unless you fully
steered me in
the right direction!
Thanks
John
2009/5/20 Tom White t...@cloudera.com
Hi John,
You could do this with a map-only job (using NLineInputFormat, and
setting the number of reducers to 0), and write the output key as
docnameN,stat1,stat2,stat3,stat12 and a null value
The number of maps to use is calculated on the client, since splits
are computed on the client, so changing the value of mapred.map.tasks
only on the jobtracker will not have any effect.
Note that the number of map tasks that you set is only a suggestion,
and depends on the number of splits
Hi John,
You could do this with a map-only job (using NLineInputFormat, and
setting the number of reducers to 0), and write the output key as
docnameN,stat1,stat2,stat3,stat12 and a null value. This assumes
that you calculate all 12 statistics in one map. Each output file
would have a single
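The job setup might look like this in the old API (a sketch; the
driver and mapper class names are placeholders):

JobConf conf = new JobConf(StatsDriver.class);
conf.setInputFormat(NLineInputFormat.class); // org.apache.hadoop.mapred.lib
conf.setMapperClass(StatsMapper.class);
conf.setNumReduceTasks(0); // map-only: mapper output goes straight to HDFS
JobClient.runJob(conf);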
On Fri, May 15, 2009 at 11:06 PM, Owen O'Malley omal...@apache.org wrote:
On May 15, 2009, at 2:05 PM, Aaron Kimball wrote:
In either case, there's a dependency there.
You need to split it so that there are no cycles in the dependency tree. In
the short term it looks like:
avro:
core:
Looks like you are trying to copy a file to HDFS in a shutdown hook.
Since you can't control the order in which shutdown hooks run, this
won't work. There is a patch to allow Hadoop's FileSystem shutdown
hook to be disabled so it doesn't close filesystems on exit. See
Hi Chris,
The task-attempt local working folder is actually just the current
working directory of your map or reduce task. You should be able to
pass your legacy command line exe and other files using the -files
option (assuming you are using the Java interface to write your job,
and you are
On Mon, May 18, 2009 at 11:44 AM, Steve Loughran ste...@apache.org wrote:
Grace wrote:
To follow up this question, I have also asked help on Jrockit forum. They
kindly offered some useful and detailed suggestions according to the JRA
results. After updating the option list, the performance
foxed now.
Joydeep
-Original Message-
From: Joydeep Sen Sarma [mailto:jssa...@facebook.com]
Sent: Wednesday, May 13, 2009 9:38 PM
To: core-user@hadoop.apache.org
Cc: Tom White
Subject: RE: public IP for datanode on EC2
Thanks Philip. Very helpful (and great blog post)! This seems
(and resolve to public ip addresses from outside).
The only data transfer that I would incur while submitting jobs from outside
is the cost of copying the jar files (and any other files meant for the
distributed cache). That would be extremely small.
-Original Message-
From: Tom White
-
From: Tom White [mailto:t...@cloudera.com]
Sent: Friday, May 08, 2009 1:36 AM
To: core-user@hadoop.apache.org
Subject: Re: HDFS to S3 copy problems
Perhaps we should revisit the implementation of NativeS3FileSystem so
that it doesn't always buffer the file on the client. We could have
Hi Kevin,
The s3n filesystem treats each file as a single block, however you may
be able to split files by setting the number of mappers appropriately
(or setting mapred.max.split.size in the new MapReduce API in 0.20.0).
S3 supports range requests, and the s3n implementation uses them, so
it
On Thu, May 7, 2009 at 6:05 AM, Foss User foss...@gmail.com wrote:
Thanks for your response again. I could not understand a few things in
your reply. So, I want to clarify them. Please find my questions
inline.
On Thu, May 7, 2009 at 2:28 AM, Todd Lipcon t...@cloudera.com wrote:
On Wed, May
Hi David,
The MapReduce framework will attempt to rerun failed tasks
automatically. However, if a task is running out of memory on one
machine, it's likely to run out of memory on another, isn't it? Have a
look at the mapred.child.java.opts configuration property for the
amount of memory that
Hi Ivan,
I haven't tried this combination, but I think it should work. If it
doesn't it should be treated as a bug.
Tom
On Wed, May 6, 2009 at 11:46 AM, Ivan Balashov ibalas...@iponweb.net wrote:
Greetings to all,
Could anyone suggest if Paths from different FileSystems can be used as
input
Hi Rajarshi,
FileInputFormat (SDFInputFormat's superclass) will break files into
splits, typically on HDFS block boundaries (if the defaults are left
unchanged). This is not a problem for your code however, since it will
read every record that starts within a split (even if it crosses a
split
Another way to do this would be to set a property in the Hadoop config itself.
In the job launcher you would have something like:
JobConf conf = ...
conf.set("foo", "test");
Then you can read the property in your map or reduce task.
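On the task side, the old API's configure() hook is one place to read
it back (a sketch):

public class MyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  private String foo;

  public void configure(JobConf job) {
    foo = job.get("foo"); // same key as set in the launcher
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // use foo here
  }
}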
Tom
On Thu, Apr 30, 2009 at 3:25 PM, Aaron Kimball
Have a look at the instructions on
http://wiki.apache.org/hadoop/HowToRelease under the Building
section. It tells you which environment variables to set and which Ant
targets to run.
Tom
On Tue, Apr 28, 2009 at 9:09 AM, Sid123 itis...@gmail.com wrote:
HI I have applied a small patch for version
You need to start each JobControl in its own thread so they can run
concurrently. Something like:
Thread t = new Thread(jobControl);
t.start();
Then poll the jobControl.allFinished() method.
Tom
On Tue, Apr 21, 2009 at 10:02 AM, nguyenhuynh.mr
nguyenhuynh...@gmail.com wrote:
Hi all!
Not sure if it will affect your findings, but when you read from an
FSDataInputStream you should check how many bytes were actually read by
inspecting the return value, and re-read if it was fewer than you want.
See Hadoop's IOUtils readFully() method.
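For example (a sketch):

byte[] buf = new byte[length];
// reads until buf is full, throwing EOFException if the stream ends early
IOUtils.readFully(in, buf, 0, buf.length);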
Tom
On Mon, Apr 13, 2009 at 4:22 PM, Brian
Does it work if you use addArchiveToClassPath()?
Also, it may be more convenient to use GenericOptionsParser's -libjars option.
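For example (a sketch; the jar path is made up):

// in the driver, for a jar already in HDFS:
DistributedCache.addArchiveToClassPath(new Path("/cache/mylib.jar"), conf);

With ToolRunner, -libjars on the command line does the equivalent for
local jars.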
Tom
On Mon, Mar 2, 2009 at 7:42 AM, Aaron Kimball aa...@cloudera.com wrote:
Hi all,
I'm stumped as to how to use the distributed cache's classpath feature. I
have
Hi Josh,
The other aspect to think about when writing your own record reader is
input splits. As Jeff mentioned you really want mappers to be
processing about one HDFS block's worth of data. If your inputs are
significantly smaller, the overhead of creating mappers will be high
and your jobs will
Hi Paul,
Looking at the stack trace, the exception is being thrown from your
map method. Can you put some debugging in there to diagnose it?
Detecting and logging the size of the array and the index you are
trying to access should help. You can write to standard error and look
in the task logs.
I haven't used Eucalyptus, but you could start by trying out the
Hadoop EC2 scripts (http://wiki.apache.org/hadoop/AmazonEC2) with your
Eucalyptus installation.
Cheers,
Tom
On Tue, Mar 3, 2009 at 2:51 PM, falcon164 mujahid...@gmail.com wrote:
I am new to hadoop. I want to run hadoop on
Hi Richa,
Yes there is. Please see http://wiki.apache.org/hadoop/AmazonEC2.
Tom
On Thu, Mar 5, 2009 at 4:13 PM, Richa Khandelwal richa...@gmail.com wrote:
Hi All,
Is there an existing Hadoop AMI for EC2 which has Hadoop set up on it?
Thanks,
Richa Khandelwal
University Of California,
On any particular tasktracker slot, task JVMs are shared only between
tasks of the same job. When the job is complete the task JVM will go
away. So there is certainly no sharing between jobs.
I believe the static singleton approach outlined by Scott will work
since the map classes are in a single
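The singleton Scott described might be sketched like this (the class
and its contents are hypothetical):

public class SharedResource {
  private static SharedResource instance;

  public static synchronized SharedResource get(JobConf conf) {
    if (instance == null) {
      instance = new SharedResource(conf); // runs once per task JVM
    }
    return instance;
  }

  private SharedResource(JobConf conf) {
    // expensive setup: load lookup data, open connections, etc.
  }
}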
Do you experience the problem with and without native compression? Set
hadoop.native.lib to false to disable native compression.
Cheers,
Tom
On Tue, Feb 24, 2009 at 9:40 PM, Gordon Mohr goj...@archive.org wrote:
If you're doing a lot of gzip compression/decompression, you *might* be
hitting