Hello all,
Is anyone using Hadoop for near/almost real-time processing
of log data for their systems, to aggregate stats, etc.? I know that
Hadoop has generally been good at offline processing of large amounts
of data, but I've wondered if anyone has tried using it for processing
of near real-time data.
Are there any plans to build redundancy/failover support for the Job
Tracker and Name Node components in Hadoop? Consider the following
scenario:
1) A data/cpu intensive job is submitted to a Hadoop cluster of 10 machines.
2) Half-way through the job execution, the Job Tracker or Name Node fails.
On Thu, Dec 18, 2008 at 4:59 PM, Ryan LeCompte wrote:
>> Hello all,
>>
>> Somewhat of an off-topic question, but I know there are
>> Hadoop + EC2 users here. Does anyone know if there is a programmatic
>> API to find out how many machine-hours have been
Hello all,
Somewhat of an off-topic question, but I know there are
Hadoop + EC2 users here. Does anyone know if there is a programmatic
API to find out how many machine-hours have been used by a
Hadoop cluster (or anything) running on EC2? I know that you can log
into the EC2 web console
Even better! I'll try this out tomorrow.
Thanks,
Ryan
On Dec 9, 2008, at 10:36 PM, "Aaron Kimball" <[EMAIL PROTECTED]> wrote:
Note also that "cat foo | bin/hadoop fs -put - some/hdfs/path" will
use stdin.
- Aaron
On Mon, Dec 8, 2008 at 5:56 PM, Ryan LeCompte wrote:
Just what I need -- thanks!
On Mon, Dec 8, 2008 at 7:31 PM, Alex Loddengaard <[EMAIL PROTECTED]> wrote:
> This should answer your questions:
>
> <http://wiki.apache.org/hadoop/MountableHDFS>
>
> Alex
>
> On Mon, Dec 8, 2008 at 2:19 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
Hello all,
I normally upload files into hadoop via bin/hadoop fs -put file dest.
However, is there a way to somehow stream data into Hadoop?
For example, I'd love to do something like this:
zcat xxx >> HADOOP_HDFS_DESTINATION
This would save me a ton of time since I wouldn't have to first unpack
the data before uploading it.
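As Aaron's reply further up notes, "bin/hadoop fs -put - dest" reads
from stdin, so "zcat xxx | bin/hadoop fs -put - dest" does exactly this.
A programmatic equivalent, as a minimal sketch of my own rather than
code from the thread, using the FileSystem API (the destination path is
whatever you would pass to -put):

    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    // Pipe stdin into an HDFS file, e.g.:
    //   zcat xxx | bin/hadoop StdinToHdfs /some/hdfs/path
    public class StdinToHdfs {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        OutputStream out = fs.create(new Path(args[0]));
        // 4 KB copy buffer; the final "true" closes both streams when done
        IOUtils.copyBytes(System.in, out, 4096, true);
      }
    }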
For what it's worth, I started seeing these when I upgraded to 0.19. I
was using 10 reduces, but changed it to 30 reduces for my job and now
I don't see these errors any more.
Thanks,
Ryan
On Fri, Dec 5, 2008 at 2:44 PM, Sriram Rao <[EMAIL PROTECTED]> wrote:
> Hi,
>
> When a task tracker kills a
I've tried running the bin/hadoop balancer command since I recently
added a new node to the Hadoop cluster. I noticed the following output
in the beginning:
08/12/03 10:26:35 INFO balancer.Balancer: Will move 10 GBbytes in this iteration
Dec 3, 2008 10:26:35 AM    0    0 KB    2.67
I believe I spoke a little too soon. Looks like Hadoop supports .gz
files, not .tgz. :-)
On Mon, Dec 1, 2008 at 10:46 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I'm using Hadoop 0.19 and just discovered that it has no problems
> processing .tgz files that contain text files.
Hello all,
I'm using Hadoop 0.19 and just discovered that it has no problems
processing .tgz files that contain text files. I was under the
impression that it wouldn't be able to break a .tgz file up into
multiple maps, but instead just treat it as 1 map per .tgz file. Was
this a recent change or
wrote:
> you can't guarantee that a reducer (or mapper for that matter) will be
> executed exactly once unless you turn off speculative execution. But
> a distinct key gets sent to a single reducer, so yes, only one reducer
> will see a particular key + associated values
>
>
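For reference, a minimal sketch of the knob mentioned above, turning
speculative execution off on a JobConf of this era (my illustration,
not code from the thread; MyJob is hypothetical):

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJob.class);
    // disable speculative (duplicate) task attempts for the whole job;
    // equivalent to setting mapred.speculative.execution to false
    conf.setSpeculativeExecution(false);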
Hello,
Is it safe to assume that only one reduce task will ever operate on
values for a particular key? Or is it possible that more than one
reduce task can work on values for the same key? The reason I ask is
because I want to ensure that a piece of code that I write at the end
of my reducer method
> - Original Message
> From: Ryan LeCompte <[EMAIL PROTECTED]>
> To: "core-user@hadoop.apache.org"
> Sent: Monday, September 22, 2008 5:18:01 PM
> Subject: Re: NotYetReplicated exceptions when pushing large files into HDFS
>
> I've noticed th
at 11:08 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I'd love to be able to upload into HDFS very large files (e.g., 8 or
> 10GB), but it seems like my only option is to chop up the file into
> smaller pieces. Otherwise, after a while I get NotYetReplicated
Hello all,
I'd love to be able to upload into HDFS very large files (e.g., 8 or
10GB), but it seems like my only option is to chop up the file into
smaller pieces. Otherwise, after a while I get NotYetReplicated
exceptions while the transfer is in progress. I'm using 0.18.1. Is
there any way I can
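One knob that may be relevant, as an assumption on my part rather than
a suggestion from this thread: the DFS client's block-write retry count
in hadoop-site.xml:

    <property>
      <name>dfs.client.block.write.retries</name>
      <!-- assumption: raising this above the default of 3 gives the
           cluster more chances to place replicas during a long upload -->
      <value>10</value>
    </property>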
I actually solved the problem by increasing a parameter in
hadoop-site.xml, since the default wasn't sufficient:
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1024m</value>
    </property>
Thanks,
Ryan
On Sun, Sep 21, 2008 at 12:59 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
> Yes I did, but that didn't solve the problem.
Eventually I had to upgrade to a system with more memory.
-SM
On Sat, Sep 20, 2008 at 9:07 PM, Ryan LeCompte <[EMAIL PROTECTED]>
wrote:
Hello all,
I'm setting up a small 3 node hadoop cluster (1 node for
namenode/jobtracker and the other two for datanode/tasktracker). The
map tasks finish fine, but the reduce tasks are failing at about 30%
Hello all,
I'm setting up a small 3 node hadoop cluster (1 node for
namenode/jobtracker and the other two for datanode/tasktracker). The
map tasks finish fine, but the reduce tasks are failing at about 30%
with an out of memory error. My guess is that it's because of the
amount of data that I'm crunching through
> Are there any benchmarks about the comparison between MySQL
> queries/updates and HBase queries/updates?
>
> Have a nice day,
>
> Camilo.
>
> On Fri, Sep 12, 2008 at 1:55 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
>
>> Hadoop is best suited for distributed processing across many machines
Hadoop is best suited for distributed processing of large data sets
across many machines. Most people use Hadoop to plow through large data
sets in an offline fashion. One approach is to use Hadoop to process
your data and then put it in an optimized form in HBase (i.e., similar
to Google's Bigtable
somehow doesn't share the same classpath as the
program that actually submits the job conf. Is this expected?
Thanks,
Ryan
On Tue, Sep 9, 2008 at 9:44 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
> Okay, I think I'm getting closer but now I'm running into another problem.
>
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
...
Any ideas?
Thanks,
Ryan
On Tue, Sep 9, 2008 at 12:36 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I'm attempting to use a SortedMapWritable with a LongWritable as the
> key and a custom implementation of org.apache.hadoop.io.Writable as
Hello,
I'm attempting to use a SortedMapWritable with a LongWritable as the
key and a custom implementation of org.apache.hadoop.io.Writable as
the value. I notice that my program works fine when I use a built-in
Writable (e.g., Text) as the value, but fails with the
following exception when
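A common gotcha with custom values in sequence files, offered as an
assumption about the cause rather than a diagnosis from the thread:
values are instantiated reflectively on the read path, so the custom
class needs a public no-argument constructor. A minimal sketch with a
hypothetical single field:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    public class MyWritable implements Writable {
      private long count;

      public MyWritable() {}  // required: reflection uses this during reads
      public MyWritable(long count) { this.count = count; }

      public void write(DataOutput out) throws IOException {
        out.writeLong(count);
      }

      public void readFields(DataInput in) throws IOException {
        count = in.readLong();
      }
    }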
This clears up my concerns. Thanks!
Ryan
On Sep 6, 2008, at 2:17 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
On Sep 6, 2008, at 9:35 AM, Ryan LeCompte wrote:
I have a question regarding multiple output files that get produced as
a result of using multiple reduce tasks for a job
Hello,
I have a question regarding multiple output files that get produced as
a result of using multiple reduce tasks for a job (as opposed to only
one). If I'm using a custom writable and thus writing to a sequence
output, am I guaranteed that all of the data for a particular key will
appear in a single output file?
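The single-file guarantee comes from partitioning: the default
HashPartitioner sends every record with a given key to the same reduce,
so all of that key's values land in the same part-NNNNN file. Its entire
routing logic is:

    // org.apache.hadoop.mapred.lib.HashPartitioner
    public int getPartition(K2 key, V2 value, int numReduceTasks) {
      // mask the sign bit so the modulo result is never negative
      return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

The same property holds for a custom partitioner as long as
getPartition is deterministic in the key.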
Hi Sayali,
Yes, you can submit a collection of files from HDFS as input to the
job. Please take a look at the WordCount example in the Map/Reduce
tutorial:
http://hadoop.apache.org/core/docs/r0.18.0/mapred_tutorial.html#Example%3A+WordCount+v1.0
Ryan
On Sat, Sep 6, 2008 at 9:03
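For reference, multiple inputs are added one path at a time in the job
driver; a minimal sketch (the log paths are hypothetical):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;

    // in the job driver:
    JobConf conf = new JobConf(WordCount.class);
    // each call appends to the job's input list; globs also work
    FileInputFormat.addInputPath(conf, new Path("/logs/2008-09-01"));
    FileInputFormat.addInputPath(conf, new Path("/logs/2008-09-02"));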
more micro-management, but I think Tom White wrote about it and there
was a link to it in another discussion you were part of.
Hope this helps,
J-D
On Fri, Sep 5, 2008 at 7:00 PM, Ryan LeCompte <[EMAIL PROTECTED]>
wrote:
Hello,
I was wondering if anyone has gotten far at all with getting Hadoop up
Hello,
I was wondering if anyone has gotten far at all with getting Hadoop up
and running with EC2 + EBS? Any luck getting this to work in a way
that HDFS runs on EBS so that it isn't blown away every time
you bring up/down the EC2 Hadoop cluster? I'd like to experiment with
this next, and
Thanks!!
On Sep 5, 2008, at 1:29 PM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:
On Fri, Sep 5, 2008 at 10:18 AM, Ryan LeCompte <[EMAIL PROTECTED]>
wrote:
Thanks! Quick question on that particular class: why are the methods
synchronized? I didn't think that key/value objects needed to be
Thanks! Quick question on that particular class: why are the methods
synchronized? I didn't think that key/value objects needed to be
thread safe?
Ryan
On Sep 5, 2008, at 1:09 PM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:
Yes, it is pretty easy to compose Writables. Just have the write
and readFields methods delegate to the nested Writables.
Hello,
Can a custom Writable object used as a key/value contain other
Writables, like MapWritable?
Thanks,
Ryan
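Picking up Owen's point about composition, a minimal sketch of a
Writable that nests a MapWritable (my own illustration; the fields are
hypothetical). write() and readFields() simply delegate, in the same
order:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;

    public class CompositeWritable implements Writable {
      private final Text name = new Text();
      private final MapWritable stats = new MapWritable();

      public void write(DataOutput out) throws IOException {
        name.write(out);      // delegate to each nested Writable, in order
        stats.write(out);
      }

      public void readFields(DataInput in) throws IOException {
        name.readFields(in);  // read back in exactly the same order
        stats.readFields(in);
      }
    }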
Works great!
My only suggestion would be to modify the
/usr/local/hadoop-0.18.0/conf/hadoop-site.xml file to use "hdfs://..."
for the namenode address. Otherwise I constantly get warnings saying
that the syntax is deprecated any time I submit a job for execution or
interact with HDFS via bin/hadoop fs.
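That is, something like this in hadoop-site.xml (the host and port here
are placeholders):

    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode-host:9000</value>
    </property>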
Hi Tom,
This clears up my questions.
Thanks!
Ryan
On Thu, Sep 4, 2008 at 9:21 AM, Tom White <[EMAIL PROTECTED]> wrote:
> On Thu, Sep 4, 2008 at 1:46 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
>> I'm noticing that using bin/hadoop fs -put ... s3://... is uploading
Tom White <[EMAIL PROTECTED]> wrote:
> On Wed, Sep 3, 2008 at 3:05 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
>> Tom,
>>
>> I noticed that you mentioned using Amazon's new elastic block store as
>> an alternative to using S3. Right now I'm testing pushing
>> it is really very simple.
>
> This sounds very useful. Please consider creating a Jira and
> submitting the code (even if it's not "finished" folks might like to
> see it). Thanks.
>
> Tom
>
>>
>> Cheers
>>
>> Tim
>>
>>
>>
>> On Tue, Se
I guess I'd have to concatenate the files
into 1 file and somehow turn off splitting?
Ryan
On Wed, Sep 3, 2008 at 12:09 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>
> On Sep 2, 2008, at 9:00 PM, Ryan LeCompte wrote:
>
>> Beginner's question:
>>
>>
Beginner's question:
If I have a cluster with a single node that has a max of 1 map/1
reduce, and the job submitted has 50 maps... Then it will process only
1 map at a time. Does that mean that it's spawning 1 new JVM for each
map processed? Or re-using the same JVM when a new map can be
processed?
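For what it's worth, in the 0.18 era each task attempt gets its own
child JVM. Hadoop 0.19 added per-job JVM reuse; a sketch of that knob
(my note, not from the thread; MyJob is hypothetical):

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJob.class);
    // mapred.job.reuse.jvm.num.tasks: -1 reuses one JVM for unlimited tasks
    conf.setNumTasksToExecutePerJvm(-1);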
How can you ensure that the S3 buckets and EC2 instances belong to a
certain zone?
Ryan
On Tue, Sep 2, 2008 at 2:38 PM, Karl Anderson <[EMAIL PROTECTED]> wrote:
>
> On 2-Sep-08, at 5:22 AM, Ryan LeCompte wrote:
>
>> Hi Tim,
>>
>> Are you mostly just processing
Actually not, if you're using s3:// as opposed to s3n:// ...
Thanks,
Ryan
On Tue, Sep 2, 2008 at 11:21 AM, James Moore <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 1, 2008 at 1:32 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
>> Hello,
>>
>> I'm trying to upload a fairly large file
> Map+Sort+Combine about 130,000 jobs a
> second (simplest of simple map operations). For these small
> datasets, you might find it useful - let me know if I should spend
> time finishing it (Or submit help?) - it is really very simple.
>
> Cheers
>
> Tim
>
>
>
> On Tue, Sep 2, 2008 at 8:44 AM, Andrew Hitchcock <[EMAIL PROTECTED]> wrote:
>> Hi Ryan,
>>
>> Just a heads up, if you require more than the 20 node limit, Amazon
>> provides a form to request a higher limit:
>>
>> http://www.amazon.com/gp/html-forms-controller/ec2-req
Hello all,
I'm curious to see how many people are using EC2 to execute their
Hadoop cluster and map/reduce programs, and how many are using
home-grown datacenters. It seems like the 20 node limit with EC2 is a
bit crippling when one wants to process many gigabytes of data. Has
anyone found this to
> the jets3t jar; you can find it at the jets3t web site. Make sure that
> it's from the same version that your copy of Hadoop is using.
>
> On Mon, Sep 1, 2008 at 1:32 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> I'm trying to upload a fairly large
Hello,
I'm trying to upload a fairly large file (18GB or so) to my AWS S3
account via bin/hadoop fs -put ... s3://...
It copies for a good 15 or 20 minutes, and then eventually errors out
with a failed retry attempt (saying that it can't retry since it has
already written a certain number of bytes).
Never mind, I figured it out! :) Sorry for spamming the list! For those
interested, I had a stupid host/IP resolution problem which was easily
fixed in /etc/hosts. :)
Thanks,
Ryan
On Sat, Aug 30, 2008 at 3:41 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
> I see this in the syslog fo
I've put print() statements in the custom writable's readFields()/write()
methods and it's showing up in the stdout logs. Any ideas?
Thanks,
Ryan
On Sat, Aug 30, 2008 at 10:32 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
> The job finally came back with output. Notice that I don't get these
Task Id : attempt_200808300858_0003_m_01_0, Status : FAILED
Too many fetch-failures
08/08/30 09:28:04 WARN mapred.JobClient: Error reading task
outputConnection timed out
Any ideas?
On Sat, Aug 30, 2008 at 10:10 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I'm new to Hadoop.
Hello all,
I'm new to Hadoop. I'm trying to write a small Hadoop map/reduce
program that, instead of reading/writing the primitive
LongWritable, IntWritable, etc. classes, uses a custom object that
I wrote that implements the Writable interface. I'm still using a
LongWritable for the keys, but using the custom object for the values.
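The driver wiring for that setup looks roughly like this, as a sketch
(MyWritable and MyJob stand in for the custom classes):

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileOutputFormat;

    JobConf conf = new JobConf(MyJob.class);
    conf.setOutputKeyClass(LongWritable.class);    // keys stay LongWritable
    conf.setOutputValueClass(MyWritable.class);    // custom Writable values
    conf.setOutputFormat(SequenceFileOutputFormat.class);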