Hi Chris,
You should really start all the slave nodes to be sure that you don't
lose data. If you start fewer than #nodes - #replication + 1 nodes
then you are virtually guaranteed to lose blocks. Starting 6 nodes out
of 10 will cause the filesystem to remain in safe mode, as you've
seen.
BTW I'm
You can change the value of hadoop.root.logger in
conf/log4j.properties to change the log level globally. See also the
section "Custom Logging levels" in the same file to set levels on a
per-component basis.
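For example, to raise the global level to DEBUG you would change that
line to something like:
  hadoop.root.logger=DEBUG,console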
You can also use hadoop daemonlog to set log levels on a temporary
basis (they are reset o
Hi Usman,
Before the rebalancer was introduced one trick people used was to
increase the replication on all the files in the system, wait for
re-replication to complete, then decrease the replication to the
original level. You can do this using hadoop fs -setrep.
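For example, assuming your files normally have a replication factor of
3, something along these lines:
  hadoop fs -setrep -R -w 4 /    # temporarily raise replication and wait
  hadoop fs -setrep -R 3 /       # drop back to the original level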
Cheers,
Tom
On Thu, Jun 25, 2009
Hi Krishna,
You get this error when the jar file cannot be found. It looks like
/user/hadoop/hadoop-0.18.0-examples.jar is an HDFS path, when in fact
it should be a local path.
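For example, something along these lines (the local path and example
arguments here are placeholders):
  hadoop jar /home/hadoop/hadoop-0.18.0-examples.jar wordcount input output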
Cheers,
Tom
On Thu, Jun 25, 2009 at 9:43 AM, krishna prasanna wrote:
> Oh! thanks Shravan
>
> Krishna.
>
>
>
>
Have a look at the datanode log files on the datanode machines and see
what the error is in there.
Cheers,
Tom
On Thu, Jun 25, 2009 at 6:21 AM, .ke. sivakumar wrote:
> Hi all, I'm a student and I have been tryin to set up the hadoop cluster for
> a while
> but have been unsuccessful till now.
>
>
You might be interested in
https://issues.apache.org/jira/browse/HDFS-385, where there is
discussion about how to add pluggable block placement to HDFS.
Cheers,
Tom
On Tue, Jun 23, 2009 at 5:50 PM, Alex Loddengaard wrote:
> Hi Hyunsik,
>
> Unfortunately you can't control the servers that blocks g
Hi Kun,
The book's code is for 0.20.0. In Hadoop 0.17.x WritableComparable was
not generic, so you need a declaration like:
public class IntPair implements WritableComparable {
}
And the compareTo() method should look like this:
public int compareTo(Object o) {
IntPair ip = (IntPair) o;
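  // (the message is truncated here; a hedged completion, assuming
  // IntPair holds two int fields named first and second)
  int cmp = first < ip.first ? -1 : (first == ip.first ? 0 : 1);
  if (cmp != 0) {
    return cmp;
  }
  return second < ip.second ? -1 : (second == ip.second ? 0 : 1);
}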
Hi Saptarshi,
The group permissions open the firewall ports to enable access, but
there are no shared keys on the cluster by default. See
https://issues.apache.org/jira/browse/HADOOP-4131 for a patch to the
scripts that shares keys to allow SSH access between machines in the
cluster.
Cheers,
Tom
Hi Ninad,
I don't know if anyone has looked at this for Hadoop Core or HBase
(although there is this Jira:
https://issues.apache.org/jira/browse/HADOOP-4604), but there's some
work for making ZooKeeper's jar OSGi compliant at
https://issues.apache.org/jira/browse/ZOOKEEPER-425.
Cheers,
Tom
On Th
Actually, the space is needed for the option to be interpreted as a
Hadoop option by ToolRunner. Without the space it sets a Java system
property, which Hadoop will not automatically pick up.
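For instance (the jar and class names are hypothetical):
  hadoop jar myjob.jar MyJob -D mapred.reduce.tasks=2 input output
With the space, GenericOptionsParser treats mapred.reduce.tasks=2 as a
Hadoop configuration option.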
Ian, try putting the options after the classname and see if that
helps. Otherwise, it would be useful to see a snippet o
Hi Walter,
On Thu, May 28, 2009 at 6:52 AM, walter steffe wrote:
> Hello
> I am a new user and I would like to use hadoop streaming with
> SequenceFile in both input and output side.
>
> -The first difficulty arises from the lack of a simple tool to generate
> a SequenceFile starting from a set
Hi Stuart,
There isn't an InputFormat that comes with Hadoop to do this. Rather
than pre-processing the file, it would be better to implement your own
InputFormat. Subclass FileInputFormat and provide an implementation of
getRecordReader() that returns your implementation of RecordReader to
read f
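A bare-bones sketch in the old (pre-0.20) API, where MyRecordReader
stands in for your RecordReader implementation and the key/value types
are placeholders:
  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.*;

  public class MyInputFormat extends FileInputFormat<LongWritable, Text> {
    public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
        JobConf job, Reporter reporter) throws IOException {
      // hand each split to your own reader
      return new MyRecordReader((FileSplit) split, job);
    }
  }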
Have you had a look at Nutch (http://lucene.apache.org/nutch/)? It has
solved this kind of problem.
Cheers,
Tom
On Wed, May 27, 2009 at 9:58 AM, John Clarke wrote:
> My current project is to gather stats from a lot of different documents.
> We're are not indexing just getting quite specific stat
This feature is not available yet, and is still under active
discussion. (The current version of HDFS will make the previous block
available to readers.) Michael Stack gave a good summary on the HBase
dev list:
http://mail-archives.apache.org/mod_mbox/hadoop-hbase-dev/200905.mbox/%3c7c962aed090523
RandomAccessFile isn't supported directly, but you can seek when
reading from files in HDFS (see FSDataInputStream's seek() method).
Writing at an arbitrary offset in an HDFS file is not supported
however.
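A minimal sketch (conf and the path come from your own setup):
  FileSystem fs = FileSystem.get(conf);
  FSDataInputStream in = fs.open(new Path("/some/file"));
  in.seek(1024);   // position the stream for the next read
  // ...read as usual; writing at arbitrary offsets is not possible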
Cheers,
Tom
On Sun, May 24, 2009 at 1:33 PM, Stas Oskin wrote:
> Hi.
>
> Any idea if Rando
You can't use it yet, but
https://issues.apache.org/jira/browse/HADOOP-3799 (Design a pluggable
interface to place replicas of blocks in HDFS) would enable you to
write your own policy so blocks are never placed locally. It might be
worth following its development to check whether it can meet your needs.
Chee
Hi Saptarshi,
You can use the guide at http://wiki.apache.org/hadoop/AmazonEC2 to
run Hadoop 0.19 or later on EC2. It includes instructions for building
your own customized AMI.
Cheers,
Tom
On Fri, May 22, 2009 at 7:11 PM, Saptarshi Guha
wrote:
> Hello,
> Is there a tutorial available to build
red me in
> the right direction!
> Thanks
> John
>
> 2009/5/20 Tom White
>
>> Hi John,
>>
>> You could do this with a map only-job (using NLineInputFormat, and
>> setting the number of reducers to 0), and write the output key as
>> docnameN,stat1,stat2,st
On Wed, May 20, 2009 at 10:22 PM, Stas Oskin wrote:
>>
>> You should only use this if you plan on manually closing FileSystems
>> yourself from within your own shutdown hook. It's somewhat of an advanced
>> feature, and I wouldn't recommend using this patch unless you fully
>> understand the ramif
On Thu, May 21, 2009 at 5:18 AM, Foss User wrote:
> On Wed, May 20, 2009 at 3:18 PM, Tom White wrote:
>> The number of maps to use is calculated on the client, since splits
>> are computed on the client, so changing the value of mapred.map.tasks
>> only on the jobtracker wil
It looks like you are trying to copy a file to HDFS in a shutdown hook.
Since you can't control the order in which shutdown hooks run, this
won't work. There is a patch to allow Hadoop's FileSystem shutdown
hook to be disabled so it doesn't close filesystems on exit. See
https://issues.apache.org/jir
On Fri, May 15, 2009 at 11:06 PM, Owen O'Malley wrote:
>
> On May 15, 2009, at 2:05 PM, Aaron Kimball wrote:
>
>> In either case, there's a dependency there.
>
> You need to split it so that there are no cycles in the dependency tree. In
> the short term it looks like:
>
> avro:
> core: avro
> hd
Hi John,
You could do this with a map-only job (using NLineInputFormat, and
setting the number of reducers to 0), and write the output key as
docnameN,stat1,stat2,stat3,...,stat12 and a null value. This assumes
that you calculate all 12 statistics in one map. Each output file
would have a single l
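A rough sketch of the job setup (old API; the job class name is
hypothetical):
  JobConf conf = new JobConf(DocStatsJob.class);
  conf.setInputFormat(NLineInputFormat.class);
  conf.setInt("mapred.line.input.format.linespermap", 1);  // one doc per map
  conf.setNumReduceTasks(0);                               // map-only job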
The number of maps to use is calculated on the client, since splits
are computed on the client, so changing the value of mapred.map.tasks
only on the jobtracker will not have any effect.
Note that the number of map tasks that you set is only a suggestion,
and depends on the number of splits actual
Hi Chris,
The task-attempt local working folder is actually just the current
working directory of your map or reduce task. You should be able to
pass your legacy command line exe and other files using the -files
option (assuming you are using the Java interface to write your job,
and you are imple
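For example (jar, class and file names are hypothetical):
  hadoop jar myjob.jar MyJob -files legacy.exe,lookup.dat input output
The listed files are made available in each task's working directory.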
On Mon, May 18, 2009 at 11:44 AM, Steve Loughran wrote:
> Grace wrote:
>>
>> To follow up this question, I have also asked help on Jrockit forum. They
>> kindly offered some useful and detailed suggestions according to the JRA
>> results. After updating the option list, the performance did become
within the cluster (and resolve to public ip addresses from outside).
>
> The only data transfer that I would incur while submitting jobs from outside
> is the cost of copying the jar files and any other files meant for the
> distributed cache). That would be extremely small.
>
>
>
rk
> just fine. I looked at the job.xml files of jobs submitted locally and
> remotely and don't see any relevant differences.
>
> Totally foxed now.
>
> Joydeep
>
> -Original Message-
> From: Joydeep Sen Sarma [mailto:jssa...@facebook.com]
> Sent: Wednesd
hese two
> distributed reads vs a distributed read and a local write then local read.
>
> What do you think?
>
> Cheers,
> Ian Nowland
> Amazon.com
>
> -Original Message-
> From: Tom White [mailto:t...@cloudera.com]
> Sent: Friday, May 08, 2009 1:36 AM
> To: co
Hi Kevin,
The s3n filesystem treats each file as a single block; however, you may
be able to split files by setting the number of mappers appropriately
(or setting mapred.max.split.size in the new MapReduce API in 0.20.0).
S3 supports range requests, and the s3n implementation uses them, so
it woul
Perhaps we should revisit the implementation of NativeS3FileSystem so
that it doesn't always buffer the file on the client. We could have an
option to make it write directly to S3. Thoughts?
Regarding the problem with HADOOP-3733, you can work around it by
setting fs.s3.awsAccessKeyId and fs.s3.aw
> mapred.reduce.tasks 1
You've only got one reduce task, as Jason correctly surmised. Try
setting it using
-D mapred.reduce.tasks=2
when you run your job, or by calling JobConf#setNumReduceTasks()
Tom
On Fri, May 8, 2009 at 7:46 AM, Foss User wrote:
> On Thu, May 7, 2009 at 9:45 PM, jason
On Thu, May 7, 2009 at 6:05 AM, Foss User wrote:
> Thanks for your response again. I could not understand a few things in
> your reply. So, I want to clarify them. Please find my questions
> inline.
>
> On Thu, May 7, 2009 at 2:28 AM, Todd Lipcon wrote:
>> On Wed, May 6, 2009 at 1:46 PM, Foss Use
Hi Rajarshi,
FileInputFormat (SDFInputFormat's superclass) will break files into
splits, typically on HDFS block boundaries (if the defaults are left
unchanged). This is not a problem for your code however, since it will
read every record that starts within a split (even if it crosses a
split boun
Hi Ivan,
I haven't tried this combination, but I think it should work. If it
doesn't it should be treated as a bug.
Tom
On Wed, May 6, 2009 at 11:46 AM, Ivan Balashov wrote:
> Greetings to all,
>
> Could anyone suggest if Paths from different FileSystems can be used as
> input of Hadoop job?
>
Hi David,
The MapReduce framework will attempt to rerun failed tasks
automatically. However, if a task is running out of memory on one
machine, it's likely to run out of memory on another, isn't it? Have a
look at the mapred.child.java.opts configuration property for the
amount of memory that each
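For example, in the job configuration (the heap size shown is only
illustrative):
  conf.set("mapred.child.java.opts", "-Xmx512m");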
Hi Sasha,
As you say, HDFS appends are not yet working reliably enough to be
suitable for production use. On the other hand, having lots of little
files is bad for the namenode, and inefficient for MapReduce (see
http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/), so
it's best to av
Another way to do this would be to set a property in the Hadoop config itself.
In the job launcher you would have something like:
JobConf conf = ...
conf.set("foo", "test");
Then you can read the property in your map or reduce task.
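For example, in the mapper (old API) you could read it back with
something like:
  public void configure(JobConf job) {
    String foo = job.get("foo");   // value set in the launcher above
  }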
Tom
On Thu, Apr 30, 2009 at 3:25 PM, Aaron Kimball w
Have a look at the instructions on
http://wiki.apache.org/hadoop/HowToRelease under the "Building"
section. It tells you which environment settings and Ant targets you
need to set.
Tom
On Tue, Apr 28, 2009 at 9:09 AM, Sid123 wrote:
>
> HI I have applied a small patch for version 0.20 to my old 0
, nguyenhuynh.mr
wrote:
> Tom White wrote:
>
>> You need to start each JobControl in its own thread so they can run
>> concurrently. Something like:
>>
>> Thread t = new Thread(jobControl);
>> t.start();
>>
>> Then poll the jobControl.allFinished()
You need to start each JobControl in its own thread so they can run
concurrently. Something like:
Thread t = new Thread(jobControl);
t.start();
Then poll the jobControl.allFinished() method.
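A rough polling loop might look like:
  while (!jobControl.allFinished()) {
    try {
      Thread.sleep(1000);   // poll periodically
    } catch (InterruptedException e) {
      // ignore and re-check
    }
  }
  jobControl.stop();        // stop the JobControl thread when done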
Tom
On Tue, Apr 21, 2009 at 10:02 AM, nguyenhuynh.mr
wrote:
> Hi all!
>
>
> I have some jobs: j
Not sure if it will affect your findings, but when you read from an
FSDataInputStream you should see how many bytes were actually read by
inspecting the return value and re-read if it was fewer than you want.
See Hadoop's IOUtils readFully() method.
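For example (in and length come from your own code):
  byte[] buf = new byte[length];
  // keeps reading until buf is full, or throws if the stream ends early
  IOUtils.readFully(in, buf, 0, buf.length);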
Tom
On Mon, Apr 13, 2009 at 4:22 PM, Brian Bockelma
Does it work if you use addArchiveToClassPath()?
Also, it may be more convenient to use GenericOptionsParser's -libjars option.
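For example (paths and names are hypothetical):
  DistributedCache.addArchiveToClassPath(new Path("/libs/mylib.jar"), conf);
or, from the command line:
  hadoop jar myjob.jar MyJob -libjars mylib.jar input output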
Tom
On Mon, Mar 2, 2009 at 7:42 AM, Aaron Kimball wrote:
> Hi all,
>
> I'm stumped as to how to use the distributed cache's classpath feature. I
> have a library of Ja
other format
> that works better with MR. If anyone has any ideas on what file formats
> works best for storing and processing large amounts of time series
> points with MR, I'm all ears. We're moving towards a new philosophy wrt
> big data so it's a good time for us to exami
Hi Josh,
The other aspect to think about when writing your own record reader is
input splits. As Jeff mentioned you really want mappers to be
processing about one HDFS block's worth of data. If your inputs are
significantly smaller, the overhead of creating mappers will be high
and your jobs will
Hi Paul,
Looking at the stack trace, the exception is being thrown from your
map method. Can you put some debugging in there to diagnose it?
Detecting and logging the size of the array and the index you are
trying to access should help. You can write to standard error and look
in the task logs. An
Hi Ken,
Unfortunately, Hadoop doesn't yet support MapReduce on zipped files
(see https://issues.apache.org/jira/browse/HADOOP-1824), so you'll
need to write a program to unzip them and write them into HDFS first.
Cheers,
Tom
On Tue, Mar 10, 2009 at 4:11 AM, jason hadoop wrote:
> Hadoop has supp
Hi Richa,
Yes there is. Please see http://wiki.apache.org/hadoop/AmazonEC2.
Tom
On Thu, Mar 5, 2009 at 4:13 PM, Richa Khandelwal wrote:
> Hi All,
> Is there an existing Hadoop AMI for EC2 which had Hadaoop setup on it?
>
> Thanks,
> Richa Khandelwal
>
>
> University Of California,
> Santa Cruz.
I haven't used Eucalyptus, but you could start by trying out the
Hadoop EC2 scripts (http://wiki.apache.org/hadoop/AmazonEC2) with your
Eucalyptus installation.
Cheers,
Tom
On Tue, Mar 3, 2009 at 2:51 PM, falcon164 wrote:
>
> I am new to hadoop. I want to run hadoop on eucalyptus. Please let me
On any particular tasktracker slot, task JVMs are shared only between
tasks of the same job. When the job is complete the task JVM will go
away. So there is certainly no sharing between jobs.
I believe the static singleton approach outlined by Scott will work
since the map classes are in a single
Do you experience the problem with and without native compression? Set
hadoop.native.lib to false to disable native compression.
Cheers,
Tom
On Tue, Feb 24, 2009 at 9:40 PM, Gordon Mohr wrote:
> If you're doing a lot of gzip compression/decompression, you *might* be
> hitting this 6+-year-old Su
The decommission process is for data nodes - which you are not
running. Have a look at the mapred.hosts.exclude property for how to
exclude tasktrackers.
Tom
On Tue, Feb 17, 2009 at 5:31 PM, S D wrote:
> Thanks for your response. For clarification, I'm using S3 Native instead of
> HDFS. Hence, I
You can retrieve them from the command line using
bin/hadoop job -counter <job-id> <group-name> <counter-name>
Tom
On Wed, Feb 11, 2009 at 12:20 AM, scruffy323 wrote:
>
> Do you know how to access those counters programmatically after the job has
> run?
>
>
> S D-5 wrote:
>>
>> This does it. Thanks!
>>
>> On Thu, Feb 5, 2009 at
Hi Mark,
Not all the bytes stored in a BytesWritable object are necessarily
valid. Use BytesWritable#getLength() to determine how much of the
buffer returned by BytesWritable#getBytes() to use.
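For example (bytes is the BytesWritable value):
  byte[] buf = bytes.getBytes();     // backing buffer, may contain padding
  int length = bytes.getLength();    // number of valid bytes
  byte[] valid = new byte[length];
  System.arraycopy(buf, 0, valid, 0, length);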
Tom
On Fri, Feb 6, 2009 at 5:41 AM, Mark Kerzner wrote:
> Hi,
>
> I have written binary files to a Se
>>>> >> to be
>>>> >> 0 always.
>>>> >>
>>>> >>RunningJob running = JobClient.runJob(conf);
>>>> >>
>>>> >> Counters ct = new Counters();
>>>> >> ct = runni
Hi Sharath,
The code you posted looks right to me. Counters#getCounter() will
return the counter's value. What error are you getting?
Tom
On Thu, Feb 5, 2009 at 10:09 AM, some speed wrote:
> Hi,
>
> Can someone help me with the usage of counters please? I am incrementing a
> counter in Reduce m
NLineInputFormat is ideal for this purpose. Each split will be N lines
of input (where N is configurable), so each mapper can retrieve N
files for insertion into HDFS. You can set the number of reducers to
zero.
Tom
On Tue, Feb 3, 2009 at 4:23 AM, jason hadoop wrote:
> If you have a large number
Hi Brian,
Writes to HDFS are not guaranteed to be flushed until the file is
closed. In practice, as each (64MB) block is finished it is flushed
and will be visible to other readers, which is what you were seeing.
The addition of appends in HDFS changes this and adds a sync() method
to FSDataOutpu
y, can multiple MapReduce workers read the same SequenceFile
> simultaneously?
>
> On Mon, Feb 2, 2009 at 9:42 AM, Tom White wrote:
>
>> Is there any reason why it has to be a single SequenceFile? You could
>> write a local program to write several block compressed Sequenc
The SequenceFile format is described here:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.html.
The format of the keys and values depends on the serialization classes
used. For example, BytesWritable writes out the length of its byte
array followed by the actual by
Is there any reason why it has to be a single SequenceFile? You could
write a local program to write several block compressed SequenceFiles
in parallel (to HDFS), each containing a portion of the files on your
PC.
Tom
On Mon, Feb 2, 2009 at 3:24 PM, Mark Kerzner wrote:
> Truly, I do not see any
You can use the get() method to seek and retrieve the value. It will
return null if the key is not in the map. Something like:
Text value = (Text) indexReader.get(from, new Text());
while (value != null && ...)
Tom
On Thu, Jan 29, 2009 at 10:45 PM, schnitzi
wrote:
>
> Greetings all... I have a
Each datanode has a web page at
http://datanode:50075/blockScannerReport where you can see details
about the scans.
Tom
On Thu, Jan 29, 2009 at 7:29 AM, Raghu Angadi wrote:
> Owen O'Malley wrote:
>>
>> On Jan 28, 2009, at 6:16 PM, Sriram Rao wrote:
>>
>>> By "scrub" I mean, have a tool that read
It would be nice to make this more uniform. There's an outstanding
Jira on this if anyone is interested in looking at it:
https://issues.apache.org/jira/browse/HADOOP-2914
Tom
On Fri, Jan 23, 2009 at 12:14 AM, Aaron Kimball wrote:
> Hi Bhupesh,
>
> I've noticed the same problem -- LocalJobRunner
; I suppose this would accomplish the same thing?
>
>
>
> -Original Message-
> From: Tom White [mailto:t...@cloudera.com]
> Sent: Thursday, January 22, 2009 10:41 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Set the Order of the Keys in Reduce
>
> Hi Brian,
>
> The
Hi Mark,
The archives are listed on http://wiki.apache.org/hadoop/MailingListArchives
Tom
On Thu, Jan 22, 2009 at 3:41 PM, Mark Kerzner wrote:
> Hi,
> is there an archive to the messages? I am a newcomer, granted, but google
> groups has all the discussion capabilities, and it has a searchable
Hi Brian,
The CAT_A and CAT_B keys will be processed by different reducer
instances, so they run independently and may run in any order. What's
the output that you're trying to get?
Cheers,
Tom
On Thu, Jan 22, 2009 at 3:25 PM, Brian MacKay
wrote:
> Hello,
>
>
>
> Any tips would be greatly appre
Hi Matthias,
It is not necessary to have SSH set up to run Hadoop, but it does make
things easier. SSH is used by the scripts in the bin directory which
start and stop daemons across the cluster (the slave nodes are defined
in the slaves file), see the start-all.sh script as a starting point.
Thes
Thanks flip.
I've signed up for the hadoop account - be great to get some help with
getting it going.
Tom
On Wed, Jan 14, 2009 at 6:33 AM, Philip (flip) Kromer
wrote:
> Hey all,
> There is no @hadoop on twitter, but there should be.
> http://twitter.com/datamapper and http://twitter.com/rails b
LZO was removed due to license incompatibility:
https://issues.apache.org/jira/browse/HADOOP-4874
Tom
On Wed, Jan 14, 2009 at 11:18 AM, Gert Pfeifer
wrote:
> I got it. For some reason getDefaultExtension() returns ".lzo_deflate".
>
> Is that a bug? Shouldn't it be .lzo?
>
> In the head revision
I've opened https://issues.apache.org/jira/browse/HADOOP-5014 for this.
Do you get this behaviour when you use the native libraries?
Tom
On Sat, Jan 10, 2009 at 12:26 AM, Oscar Gothberg
wrote:
> Hi,
>
> I'm having trouble with Hadoop (tested with 0.17 and 0.19) not fully
> processing certain g
Hi Richard,
Are you running out of memory after many PDFs have been processed by
one mapper, or during the first? The former would suggest that memory
isn't being released; the latter that the task VM doesn't have enough
memory to start with.
Are you setting the memory available to map tasks by s
Hi Jim,
Try something like:
Counters counters = job.getCounters();
counters.findCounter("org.apache.hadoop.mapred.Task$Counter",
"REDUCE_INPUT_RECORDS").getCounter()
The pre-defined counters are unfortunately not public and are not in
one place in the source code, so you'll need to hunt to find
Hi Ryan,
The ec2-describe-instances command in the API tool reports the launch
time for each instance, so you could work out the machine hours of
your cluster using that information.
Tom
On Thu, Dec 18, 2008 at 4:59 PM, Ryan LeCompte wrote:
> Hello all,
>
> Somewhat of a an off-topic related qu
Hi Stefan,
The USER_DATA line is a hangover from the way that these parameters
used to be passed to the node. This line can safely be removed, since
the scripts now pass the data in the USER_DATA_FILE as you rightly
point out.
Tom
On Thu, Dec 18, 2008 at 10:09 AM, Stefan Groschupf wrote:
> Hi,
I've opened https://issues.apache.org/jira/browse/HADOOP-4881 and
attached a patch to fix this.
Tom
On Fri, Dec 12, 2008 at 2:18 AM, Tarandeep Singh wrote:
> The example is just to illustrate how one should implement one's own
> WritableComparable class and in the compreTo method, it is just sho
You can also see the logs from the web UI (http://<jobtracker-host>:50030
by default), by clicking through to the map or reduce task that you
are interested in and looking at the page for task attempts.
Tom
On Wed, Dec 10, 2008 at 10:41 PM, Tarandeep Singh <[EMAIL PROTECTED]> wrote:
> you can see the output in ha
There's a writeXml() method (or just write() in earlier releases) on
Configuration which should do what you need. Also see Configuration's
main() method.
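For example, in 0.19 and later, something like:
  Configuration conf = new Configuration();
  conf.writeXml(System.out);   // dumps the merged configuration as XML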
Tom
On Wed, Dec 3, 2008 at 8:39 AM, Johannens Zillmann
<[EMAIL PROTECTED]> wrote:
> Hi everybody,
>
> does anybody know if there exists a tool
I've just created a basic script to do something similar for running a
benchmark on EC2. See
https://issues.apache.org/jira/browse/HADOOP-4382. As it stands the
code for detecting when Hadoop is ready to accept jobs is simplistic,
to say the least, so any ideas for improvement would be great.
Than
From the Google Blog,
http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html
"We are excited to announce we were able to sort 1TB (stored on the
Google File System as 10 billion 100-byte records in uncompressed text
files) on 1,000 computers in 68 seconds. By comparison, the previ
;:
> waiting for it!!!
>
> 2008/9/5, Owen O'Malley <[EMAIL PROTECTED]>:
>>
>>
>> On Sep 4, 2008, at 6:36 AM, 叶双明 wrote:
>>
>> what book?
>>>
>>
>> To summarize, Tom White is writing a book about Hadoop. He will post a
>> message to the list when a draft is ready.
>>
>> -- Owen
>
If you make your Serialization implement Configurable it will be given
a Configuration object that it can pass to the Deserializer on
construction.
Also, this thread may be related:
http://www.nabble.com/Serialization-with-additional-schema-info-td19260579.html
Tom
On Sat, Sep 13, 2008 at 12:38
On Thu, Sep 4, 2008 at 1:46 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
> I'm noticing that using bin/hadoop fs -put ... svn://... is uploading
> multi-gigabyte files in ~64MB chunks.
That's because S3Filesystem stores files as 64MB blocks on S3.
> Then, when this is copied from
> S3 into HDFS u
I've just created public AMIs for 0.18.0. Note that they are in the
hadoop-images bucket.
Tom
On Fri, Aug 29, 2008 at 9:22 PM, Karl Anderson <[EMAIL PROTECTED]> wrote:
>
> On 29-Aug-08, at 6:49 AM, Stuart Sierra wrote:
>
>> Anybody have one? Any success building it with create-hadoop-image?
>> T
ut
it looks like a natural fit.
>
> Thanks!
>
> Ryan
>
>
> On Wed, Sep 3, 2008 at 9:54 AM, Tom White <[EMAIL PROTECTED]> wrote:
>> There's a case study with some numbers in it from a presentation I
>> gave on Hadoop and AWS in London last month, whic
Lukáš, Feris, I'll be sure to post a message to the list when the
book's available as a Rough Cut.
Tom
2008/8/28 Feris Thia <[EMAIL PROTECTED]>:
> Agree...
>
> I will be glad to be early notified about the release :)
>
> Regards,
>
> Feris
>
> 2008/8/29 Lukáš Vlček <[EMAIL PROTECTED]>
>
>> Tom,
>
There's a case study with some numbers in it from a presentation I
gave on Hadoop and AWS in London last month, which you may find
interesting: http://skillsmatter.com/custom/presentations/ec2-talk.pdf.
tim robertson <[EMAIL PROTECTED]> wrote:
> For these small
> datasets, you might find it useful
For the s3:// filesystem, files are split into 64MB blocks which are
sent to S3 individually. Rather than increase the jets3t.properties
retry buffer and retry count, it is better to change the Hadoop
properties fs.s3.maxRetries and fs.s3.sleepTimeSeconds, since the
Hadoop-level retry mechanism ret
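For example, in the job configuration (the values are illustrative only):
  conf.setInt("fs.s3.maxRetries", 10);
  conf.setInt("fs.s3.sleepTimeSeconds", 30);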
Hi Juho,
I think you should be able to use the Thrift serialization stuff that
I've been working on in
https://issues.apache.org/jira/browse/HADOOP-3787 - at least as a
basis. Since you are not using sequence files, you will need to write
an InputFormat (probably one that extends FileInputFormat)
tter.com/custom/presentations/ec2-talk.pdf)
> that Tom White is working on Hadoop book now.
>
> Lukas
>
> 2008/8/26 Feris Thia <[EMAIL PROTECTED]>
>
>> Hi Lukas,
>>
>> I've check on Youtube.. and yes, there are many explanations on Hadoop.
>>
>
On Thu, Jul 17, 2008 at 6:16 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Can't one work around this by using a different configuration on the client
> than on the namenodes and datanodes? The client should be able to set
> fs.default.name to an s3: uri, while the namenode and datanode must have
On Fri, Jul 11, 2008 at 9:09 PM, slitz <[EMAIL PROTECTED]> wrote:
> a) Use S3 only, without HDFS and configuring fs.default.name as s3://bucket
> -> PROBLEM: we are getting ERROR org.apache.hadoop.dfs.NameNode:
> java.lang.RuntimeException: Not a host:port pair: X
What command are you using t
On Thu, Jul 10, 2008 at 10:06 PM, Lincoln Ritter
<[EMAIL PROTECTED]> wrote:
> Thank you, Tom.
>
> Forgive me for being dense, but I don't understand your reply:
>
Sorry! I'll try to explain it better (see below).
>
> Do you mean that it is possible to use the Hadoop daemons with S3 but
> the defa
> I get (where the all-caps portions are the actual values...):
>
> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
> java.lang.NumberFormatException: For input string:
> "[EMAIL PROTECTED]"
>at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>
Hi Tim,
The steps you outline look about right. Because your file is >5GB you
will need to use the S3 block file system, which has an s3 URL. (See
http://wiki.apache.org/hadoop/AmazonS3) You shouldn't have to build
your own AMI unless you have dependencies that can't be submitted as a
part of the M
The task subdirectories are being deleted, but the job directory and
its work subdirectory are not. This is causing a problem since disk
space is filling up over time, and restarting the cluster after a long
time is very slow as the tasktrackers clear out the jobcache
directories.
This doesn't hap
I've successfully run Hadoop on Solaris 5.10 (on Intel). The path
included /usr/ucb so whoami was picked up correctly.
Satoshi, you say you added /usr/ucb to you path too, so I'm puzzled
why you get a LoginException saying "whoami: not found" - did you
export your changes to path?
I've also manag
Hi Einar,
How did you put the data onto S3, using Hadoop's S3 FileSystem or
using other S3 tools? If it's the latter then it won't work as the s3
scheme is for Hadoop's block-based S3 storage. Native S3 support is
coming - see https://issues.apache.org/jira/browse/HADOOP-930, but
it's not integrat
Hi Jeff,
I've built two public 0.17.0 AMIs (32-bit and 64-bit), so you should
be able to use the 0.17 scripts to launch them now.
Cheers,
Tom
On Thu, May 22, 2008 at 6:37 AM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> Hi Jeff,
>
> 0.17.0 was released yesterday, from what I can tell.
>
>
> Oti
Hi Jeff,
There is no public 0.17 AMI yet - we need 0.17 to be released first.
So in the meantime you'll have to build your own.
Tom
On Wed, May 14, 2008 at 8:36 PM, Jeff Eastman
<[EMAIL PROTECTED]> wrote:
> I'm trying to bring up a cluster on EC2 using
> (http://wiki.apache.org/hadoop/AmazonEC2)