Using Hadoop for near real-time processing of log data

2009-02-25 Thread Ryan LeCompte
Hello all, Is anyone using Hadoop for near/almost real-time processing of log data for their systems to aggregate stats, etc.? I know that Hadoop has generally been good at off-line processing of large amounts of data, but I've wondered if anyone has tried using it for processing of near r

Job Tracker/Name Node redundancy

2009-01-09 Thread Ryan LeCompte
Are there any plans to build redundancy/failover support for the Job Tracker and Name Node components in Hadoop? Let's take the current scenario: 1) A data/cpu intensive job is submitted to a Hadoop cluster of 10 machines. 2) Half-way through the job execution, the Job Tracker or Name Node fails.

Re: EC2 Usage?

2008-12-18 Thread Ryan LeCompte
hu, Dec 18, 2008 at 4:59 PM, Ryan LeCompte wrote: >> Hello all, >> >> Somewhat of an off-topic question, but I know there are >> Hadoop + EC2 users here. Does anyone know if there is a programmatic >> API to find out how many machine time hours have been

EC2 Usage?

2008-12-18 Thread Ryan LeCompte
Hello all, Somewhat of an off-topic question, but I know there are Hadoop + EC2 users here. Does anyone know if there is a programmatic API to find out how many machine time hours have been used by a Hadoop cluster (or anything) running on EC2? I know that you can log into the EC2 we

Re: Streaming data into Hadoop

2008-12-09 Thread Ryan LeCompte
Even better! I'll try this out tomorrow. Thanks, Ryan On Dec 9, 2008, at 10:36 PM, "Aaron Kimball" <[EMAIL PROTECTED]> wrote: Note also that "cat foo | bin/hadoop fs -put - some/hdfs/path" will use stdin. - Aaron On Mon, Dec 8, 2008 at 5:56 PM, Ryan LeCom

Re: Streaming data into Hadoop

2008-12-08 Thread Ryan LeCompte
Just what I need -- thanks! On Mon, Dec 8, 2008 at 7:31 PM, Alex Loddengaard <[EMAIL PROTECTED]> wrote: > This should answer your questions: > > <http://wiki.apache.org/hadoop/MountableHDFS> > > Alex > > On Mon, Dec 8, 2008 at 2:19 PM, Ryan LeCompte <[EMA

Streaming data into Hadoop

2008-12-08 Thread Ryan LeCompte
Hello all, I normally upload files into hadoop via bin/hadoop fs -put file dest. However, is there a way to somehow stream data into Hadoop? For example, I'd love to do something like this: zcat xxx >> HADOOP_HDFS_DESTINATION This would save me a ton of time since I don't have to first unpack

Re: stack trace from hung task

2008-12-05 Thread Ryan LeCompte
For what it's worth, I started seeing these when I upgraded to 0.19. I was using 10 reduces, but changed it to 30 reduces for my job and now I don't see these errors any more. Thanks, Ryan On Fri, Dec 5, 2008 at 2:44 PM, Sriram Rao <[EMAIL PROTECTED]> wrote: > Hi, > > When a task tracker kills a

Hadoop balancer

2008-12-03 Thread Ryan LeCompte
I've tried running the bin/hadoop balancer command since I recently added a new node to the Hadoop cluster. I noticed the following output in the beginning: 08/12/03 10:26:35 INFO balancer.Balancer: Will move 10 GBbytes in this iteration Dec 3, 2008 10:26:35 AM 0 0 KB 2.67

Re: Hadoop and .tgz files

2008-12-01 Thread Ryan LeCompte
I believe I spoke a little too soon. Looks like Hadoop supports .gz files, not .tgz. :-) On Mon, Dec 1, 2008 at 10:46 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote: > Hello all, > > I'm using Hadoop 0.19 and just discovered that it has no problems > processing .tgz files tha

Hadoop and .tgz files

2008-12-01 Thread Ryan LeCompte
Hello all, I'm using Hadoop 0.19 and just discovered that it has no problems processing .tgz files that contain text files. I was under the impression that it wouldn't be able to break a .tgz file up into multiple maps, but instead just treat it as 1 map per .tgz file. Was this a recent change or

Re: Question regarding reduce tasks

2008-11-03 Thread Ryan LeCompte
rote: > you can't guarantee that a reducer (or mapper for that matter) will be > executed exactly once unless you turn-off preemptive scheduling. but, > a distinct key gets sent to a single reducer, so yes, only one reducer > will see a particular key + associated values > >
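A minimal sketch of why a distinct key reaches only one reducer, as the reply quoted above states. Hadoop's default HashPartitioner takes the key's hash code, masks off the sign bit, and takes the remainder modulo the number of reduce tasks. The Java below reproduces only that arithmetic so it runs without Hadoop on the classpath. The class name KeyPartitionSketch, the method partitionFor, the reducer count, and the sample keys are all hypothetical illustrations and not part of Hadoop or of this thread.

```java
// Self-contained illustration of hash partitioning (not Hadoop's own class).
class KeyPartitionSketch {

    // Map a key to a reduce partition in the range [0, numReduceTasks).
    // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
    // hash code can never produce a negative partition number.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 30; // hypothetical reducer count
        String[] keys = {"alpha", "beta", "alpha", "gamma", "beta"}; // hypothetical keys

        // Repeated keys print the same partition number each time.
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Because the result depends only on the key and the reducer count, every occurrence of a key lands in the same partition within a job. As the reply also points out, a task may still be executed more than once, so any side effect at the end of a reducer should tolerate re-execution.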

Question regarding reduce tasks

2008-11-03 Thread Ryan LeCompte
Hello, Is it safe to assume that only one reduce task will ever operate on values for a particular key? Or is it possible that more than one reduce task can work on values for the same key? The reason I ask is because I want to ensure that a piece of code that I write at the end of my reducer meth

Re: NotYetReplicated exceptions when pushing large files into HDFS

2008-09-23 Thread Ryan LeCompte
t; > - Original Message > From: Ryan LeCompte <[EMAIL PROTECTED]> > To: "core-user@hadoop.apache.org" > Sent: Monday, September 22, 2008 5:18:01 PM > Subject: Re: NotYetReplicated exceptions when pushing large files into HDFS > > I've noticed th

Re: NotYetReplicated exceptions when pushing large files into HDFS

2008-09-22 Thread Ryan LeCompte
at 11:08 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote: > Hello all, > > I'd love to be able to upload into HDFS very large files (e.g., 8 or > 10GB), but it seems like my only option is to chop up the file into > smaller pieces. Otherwise, after a while I get NotYetRep

NotYetReplicated exceptions when pushing large files into HDFS

2008-09-22 Thread Ryan LeCompte
Hello all, I'd love to be able to upload into HDFS very large files (e.g., 8 or 10GB), but it seems like my only option is to chop up the file into smaller pieces. Otherwise, after a while I get NotYetReplicated exceptions while the transfer is in progress. I'm using 0.18.1. Is there any way I ca

Re: Reduce tasks running out of memory on small hadoop cluster

2008-09-21 Thread Ryan LeCompte
I actually solved the problem by increasing a parameter in hadoop-site.xml, since the default wasn't sufficient: mapred.child.java.opts -Xmx1024m Thanks, Ryan On Sun, Sep 21, 2008 at 12:59 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote: > Yes I did, but that didn't sol

Re: Reduce tasks running out of memory on small hadoop cluster

2008-09-20 Thread Ryan LeCompte
ntually I had to upgrade to a system with more memory. -SM On Sat, Sep 20, 2008 at 9:07 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote: Hello all, I'm setting up a small 3 node hadoop cluster (1 node for namenode/jobtracker and the other two for datanode/tasktracker). The map tasks

Reduce tasks running out of memory on small hadoop cluster

2008-09-20 Thread Ryan LeCompte
Hello all, I'm setting up a small 3 node hadoop cluster (1 node for namenode/jobtracker and the other two for datanode/tasktracker). The map tasks finish fine, but the reduce tasks are failing at about 30% with an out of memory error. My guess is because the amount of data that I'm crunching throu

Re: Why can't Hadoop be used for online applications ?

2008-09-12 Thread Ryan LeCompte
nchmarks about the comparison between MySQL > queries/updates and HBase queries/updates? > > Have a nice day, > > Camilo. > > On Fri, Sep 12, 2008 at 1:55 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote: > >> Hadoop is best suited for distributed processing across ma

Re: Why can't Hadoop be used for online applications ?

2008-09-12 Thread Ryan LeCompte
Hadoop is best suited for distributed processing across many machines of large data sets. Most people use Hadoop to plow through large data sets in an offline fashion. One approach that you can use is to use Hadoop to process your data, then put it in an optimized form in HBase (i.e., similar to Go

Re: Issue in reduce phase with SortedMapWritable and custom Writables as values

2008-09-09 Thread Ryan LeCompte
somehow doesn't share the same classpath as the program that actually submits the job conf. Is this expected? Thanks, Ryan On Tue, Sep 9, 2008 at 9:44 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote: > Okay, I think I'm getting closer but now I'm running into another problem. >

Re: Issue in reduce phase with SortedMapWritable and custom Writables as values

2008-09-09 Thread Ryan LeCompte
.SequenceFile$Reader.next(SequenceFile.java:1879) ... Any ideas? Thanks, Ryan On Tue, Sep 9, 2008 at 12:36 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote: > Hello, > > I'm attempting to use a SortedMapWritable with a LongWritable as the > key and a custom implementation of org.ap

Issue in reduce phase with SortedMapWritable and custom Writables as values

2008-09-08 Thread Ryan LeCompte
Hello, I'm attempting to use a SortedMapWritable with a LongWritable as the key and a custom implementation of org.apache.hadoop.io.Writable as the value. I notice that my program works fine when I use another primitive wrapper (e.g. Text) as the value, but fails with the following exception when

Re: Multiple output files

2008-09-06 Thread Ryan LeCompte
This clears up my concerns. Thanks! Ryan On Sep 6, 2008, at 2:17 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote: On Sep 6, 2008, at 9:35 AM, Ryan LeCompte wrote: I have a question regarding multiple output files that get produced as a result of using multiple reduce tasks fo

Multiple output files

2008-09-06 Thread Ryan LeCompte
Hello, I have a question regarding multiple output files that get produced as a result of using multiple reduce tasks for a job (as opposed to only one). If I'm using a custom writable and thus writing to a sequence output, am I guaranteed that all of the data for a particular key will appear in a

Re: Multiple input files

2008-09-06 Thread Ryan LeCompte
Hi Sayali, Yes, you can submit a collection of files from HDFS as input to the job. Please take a look at the WordCount example in the Map/Reduce tutorial for an example: http://hadoop.apache.org/core/docs/r0.18.0/mapred_tutorial.html#Example%3A+WordCount+v1.0 Ryan On Sat, Sep 6, 2008 at 9:03

Re: Hadoop + Elastic Block Stores

2008-09-05 Thread Ryan LeCompte
ore micro-management but I think Tom White wrote about it and there was a link to it in another discussion you were part of. Hope this helps, J-D On Fri, Sep 5, 2008 at 7:00 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote: Hello, I was wondering if anyone has gotten far at all with getti

Hadoop + Elastic Block Stores

2008-09-05 Thread Ryan LeCompte
Hello, I was wondering if anyone has gotten far at all with getting Hadoop up and running with EC2 + EBS? Any luck getting this to work in a way that the HDFS runs on the EBS so that it isn't blown away every time you bring up/down the EC2 Hadoop cluster? I'd like to experiment with this next, and

Re: Custom Writeables

2008-09-05 Thread Ryan LeCompte
Thanks!! On Sep 5, 2008, at 1:29 PM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote: On Fri, Sep 5, 2008 at 10:18 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote: Thanks! Quick question on that particular class: why are the methods synchronized? I didn't think tha

Re: Custom Writeables

2008-09-05 Thread Ryan LeCompte
Thanks! Quick question on that particular class: why are the methods synchronized? I didn't think that key/value objects needed to be thread safe? Ryan On Sep 5, 2008, at 1:09 PM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote: Yes, it is pretty easy to compose Writables. Just have the write

Custom Writeables

2008-09-05 Thread Ryan LeCompte
Hello, Can a custom Writable object used as a key/value contain other Writables, like MapWritable? Thanks, Ryan
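A minimal sketch of the composition asked about here, which the reply in this thread says is easy to do. The idea is that the outer object's write and readFields delegate to the nested object's methods, in the same field order both times. To stay runnable without Hadoop, the sketch declares a local Writable interface with the same two methods as org.apache.hadoop.io.Writable; that local interface, the Point and LabeledPoint classes, and the roundTrip helper are all hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ComposedWritableSketch {

    // Local stand-in for org.apache.hadoop.io.Writable (assumption: same two methods).
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical nested value with two int fields.
    static class Point implements Writable {
        int x;
        int y;

        public void write(DataOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        public void readFields(DataInput in) throws IOException {
            x = in.readInt();
            y = in.readInt();
        }
    }

    // Hypothetical composite that contains another Writable.
    static class LabeledPoint implements Writable {
        String label = "";
        Point point = new Point();

        public void write(DataOutput out) throws IOException {
            out.writeUTF(label);
            point.write(out); // delegate to the nested object
        }

        public void readFields(DataInput in) throws IOException {
            label = in.readUTF(); // read fields in the order they were written
            point.readFields(in);
        }
    }

    // Serialize to a byte array and read the bytes back into a fresh instance.
    static LabeledPoint roundTrip(LabeledPoint original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            LabeledPoint copy = new LabeledPoint();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LabeledPoint original = new LabeledPoint();
        original.label = "origin";
        original.point.x = 3;
        original.point.y = -4;

        LabeledPoint copy = roundTrip(original);
        System.out.println(copy.label + " " + copy.point.x + " " + copy.point.y);
    }
}
```

In a real job the classes would implement Hadoop's own Writable interface instead of the local stand-in. Reading the fields in exactly the order they were written is what keeps the composite readable.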

Re: EC2 AMI for Hadoop 0.18.0

2008-09-04 Thread Ryan LeCompte
Works great! My only suggestion would be to modify the /usr/local/hadoop-0.18.0/conf/hadoop-site.xml file to use "hdfs://..." for the namenode address. Otherwise I constantly get warnings saying that the syntax is deprecated any time I submit a job for execution or interact with HDFS via bin/hadoo

Re: Hadoop & EC2

2008-09-04 Thread Ryan LeCompte
Hi Tom, This clears up my questions. Thanks! Ryan On Thu, Sep 4, 2008 at 9:21 AM, Tom White <[EMAIL PROTECTED]> wrote: > On Thu, Sep 4, 2008 at 1:46 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote: >> I'm noticing that using bin/hadoop fs -put ... svn://... is uploadi

Re: Hadoop & EC2

2008-09-04 Thread Ryan LeCompte
m White <[EMAIL PROTECTED]> wrote: > On Wed, Sep 3, 2008 at 3:05 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote: >> Tom, >> >> I noticed that you mentioned using Amazon's new elastic block store as >> an alternative to using S3. Right now I'm testing pushing

Re: Hadoop & EC2

2008-09-03 Thread Ryan LeCompte
ple. > > This sounds very useful. Please consider creating a Jira and > submitting the code (even if it's not "finished" folks might like to > see it). Thanks. > > Tom > >> >> Cheers >> >> Tim >> >> >> >> On Tue, Se

Re: JVM Spawning

2008-09-02 Thread Ryan LeCompte
g? I guess I'd have to concatenate the files into 1 file and somehow turn off splitting? Ryan On Wed, Sep 3, 2008 at 12:09 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote: > > On Sep 2, 2008, at 9:00 PM, Ryan LeCompte wrote: > >> Beginner's question: >> >>

JVM Spawning

2008-09-02 Thread Ryan LeCompte
Beginner's question: If I have a cluster with a single node that has a max of 1 map/1 reduce, and the job submitted has 50 maps... Then it will process only 1 map at a time. Does that mean that it's spawning 1 new JVM for each map processed? Or re-using the same JVM when a new map can be processed

Re: Hadoop & EC2

2008-09-02 Thread Ryan LeCompte
How can you ensure that the S3 buckets and EC2 instances belong to a certain zone? Ryan On Tue, Sep 2, 2008 at 2:38 PM, Karl Anderson <[EMAIL PROTECTED]> wrote: > > On 2-Sep-08, at 5:22 AM, Ryan LeCompte wrote: > >> Hi Tim, >> >> Are you mostly just processi

Re: Error while uploading large file to S3 via Hadoop 0.18

2008-09-02 Thread Ryan LeCompte
Actually not if you're using the s3:// as opposed to s3n:// ... Thanks, Ryan On Tue, Sep 2, 2008 at 11:21 AM, James Moore <[EMAIL PROTECTED]> wrote: > On Mon, Sep 1, 2008 at 1:32 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote: >> Hello, >> >> I'm tryin

Re: Hadoop & EC2

2008-09-02 Thread Ryan LeCompte
p+Sort+Combine about 130,000 jobs a > second (simplest of simple map operations). For these small > datasets, you might find it useful - let me know if I should spend > time finishing it (Or submit help?) - it is really very simple. > > Cheers > > Tim > > > > On

Re: Hadoop & EC2

2008-09-02 Thread Ryan LeCompte
Tue, Sep 2, 2008 at 8:44 AM, Andrew Hitchcock <[EMAIL PROTECTED]> wrote: >> Hi Ryan, >> >> Just a heads up, if you require more than the 20 node limit, Amazon >> provides a form to request a higher limit: >> >> http://www.amazon.com/gp/html-forms-controller/ec2-req

Hadoop & EC2

2008-09-01 Thread Ryan LeCompte
Hello all, I'm curious to see how many people are using EC2 to execute their Hadoop cluster and map/reduce programs, and how many are using home-grown datacenters. It seems like the 20 node limit with EC2 is a bit crippling when one wants to process many gigabytes of data. Has anyone found this to

Re: Error while uploading large file to S3 via Hadoop 0.18

2008-09-01 Thread Ryan LeCompte
ir you can find it at the jets3t web site. Make sure that > it's from the same version that your copy of Hadoop is using. > > On Mon, Sep 1, 2008 at 1:32 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote: > >> Hello, >> >> I'm trying to upload a fairly large

Error while uploading large file to S3 via Hadoop 0.18

2008-09-01 Thread Ryan LeCompte
Hello, I'm trying to upload a fairly large file (18GB or so) to my AWS S3 account via bin/hadoop fs -put ... s3://... It copies for a good 15 or 20 minutes, and then eventually errors out with a failed retry attempt (saying that it can't retry since it has already written a certain number of byte

Re: Reduce hanging with custom value objects?

2008-08-30 Thread Ryan LeCompte
Never mind, I figured it out! :) Sorry for spamming the list! For those interested, I had a stupid host/IP resolution problem which was easily fixed in /etc/hosts. :) Thanks, Ryan On Sat, Aug 30, 2008 at 3:41 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote: > I see this in the syslog fo

Re: Reduce hanging with custom value objects?

2008-08-30 Thread Ryan LeCompte
print() statements in the custom writable's readFields()/write() methods and it's showing up in the stdout logs. Any ideas? Thanks, Ryan On Sat, Aug 30, 2008 at 10:32 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote: > The job finally came back with output. Notice that I don't get these

Re: Reduce hanging with custom value objects?

2008-08-30 Thread Ryan LeCompte
sk Id : attempt_200808300858_0003_m_01_0, Status : FAILED Too many fetch-failures 08/08/30 09:28:04 WARN mapred.JobClient: Error reading task outputConnection timed out Any ideas? On Sat, Aug 30, 2008 at 10:10 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote: > Hello all, > > I'm new to Hado

Reduce hanging with custom value objects?

2008-08-30 Thread Ryan LeCompte
Hello all, I'm new to Hadoop. I'm trying to write a small Hadoop map/reduce program that, instead of reading/writing the primitive LongWritable, IntWritable, etc. classes, uses a custom object that I wrote that implements the Writable interface. I'm still using a LongWritable for the keys, but u