Re: Is Hadoop compatible with IBM JDK 1.5 64 bit for AIX 5?

2008-07-18 Thread Colin Freas
I'm not sure if this is useful info, but I used both the Sun and the IBM JDK under Linux to run version 0.16.iForget of Hadoop, without any problems. I did some brief performance testing, didn't see any significant difference, then we switched over to the Sun JDK exclusively as per the recommendat

Re: Input/Output Formaters and FileTypes

2008-06-20 Thread Colin Freas
We'd been using text input and output exclusively, but eventually realized some efficiency improvements by using slightly more sophisticated classes specific to our application. Our main use of Hadoop is processing activity logs from a fleet of servers. We get about 6GB of compressed data per day.

Re: Stack Overflow When Running Job

2008-06-10 Thread Colin Freas
PM, Runping Qi <[EMAIL PROTECTED]> wrote: > > This is a known problem for 0.17.0: > https://issues.apache.org/jira/browse/HADOOP-3442 > > It should be fixed in 0.17.1 > > Runping > > > > -Original Message- > > From: Colin Freas [mailto:[EMAIL PROT

Simple question: call collect multiple times?

2008-06-09 Thread Colin Freas
Sorry if this is a dumb question, but in all my MR classes, I've only ever called collect once, and now I find myself wanting to call collect multiple times. Looking at the API it seems like there shouldn't be a problem with that, but I just wanted to make sure. (...and to seed Google with the an
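In the old `org.apache.hadoop.mapred` API that this question presumably refers to, a mapper receives an `OutputCollector` and may call `collect()` any number of times per input record, including zero; the word count example already does this once per word. The sketch below is plain Java with no Hadoop dependency: `Collector` is a hypothetical stand-in mirroring the shape of `OutputCollector.collect(key, value)`, and the tokenizing `map` method is an illustrative example, not code from the thread.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MultiCollectSketch {

    // Hypothetical stand-in for the shape of Hadoop's OutputCollector<K, V>;
    // a real job would be handed the framework's collector instead.
    public interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Emits one (token, 1) pair per whitespace-separated token, so collect()
    // is called as many times as the input line has tokens.
    public static void map(String line, Collector<String, Integer> output) {
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                output.collect(token, 1);
            }
        }
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        map("GET /index.html 200", (k, v) -> emitted.add(new SimpleEntry<>(k, v)));
        // Three pairs were collected from a single input record.
        System.out.println(emitted.size() + " pairs: " + emitted);
    }
}
```

Each call simply adds another key/value pair to the map output; the framework sorts and groups all of them by key before the reduce phase.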

Re: Stack Overflow When Running Job

2008-06-09 Thread Colin Freas
We were getting this exact same problem in a really simple MR job, on input produced from a known-working MR job. It seemed to happen intermittently, and we couldn't figure out what was up. In the end we solved the problem by increasing the number of maps (80 to 200; this is a 6 node, 12 core clus

Re: Hadoop Distributed Virtualisation

2008-06-06 Thread Colin Freas
The MR jobs I'm performing are not CPU intensive, so I've always assumed that they're more IO bound. Maybe that's an exceptional situation, but I'm not really sure. A good motherboard with a local IO channel per disk, feeding individual cores, with memory partitioned up between them... and I've

Re: Hadoop Distributed Virtualisation

2008-06-06 Thread Colin Freas
I've wondered about this using single or dual quad-core machines with one spindle per core, and partitioning them out into 2, 4, 8, whatever virtual machines, possibly marking each physical box as a "rack". There would be some initial and ongoing sysadmin costs. But could this increase throughput

primary namenode not starting

2008-05-09 Thread Colin Freas
The primary namenode on my cluster seems to have stopped working. The secondary name node starts, but the primary fails with the error message below. I've scoured the cluster, particularly this node for changes, but I haven't found any that I believe would cause this problem. If anyone has an id

hdfs "injection" node?

2008-04-16 Thread Colin Freas
I have a machine that stores a lot of the data I need to put into my cluster's HDFS. It's on the same private network as the nodes, but it isn't a node itself. What is the easiest way to have it be able to directly inject the data files into HDFS, without it acting as a datanode for replicas? I

changing master node?

2008-04-14 Thread Colin Freas
i changed the master node on my cluster because the original crashed hard. my nodes share an nfs mounted /conf. i changed all the IPs appropriately, and starting and stopping seems to work fine. when i do a bin/hadoop dfs -ls i get this message repeating itself over and over: 08/04/14 06:01:10 INF

Re: Formatting the file system: Misleading hint in Wiki?

2008-04-10 Thread Colin Freas
This has been my experience as well. This should be mentioned in the Getting Started pages until resolved. -colin On Thu, Apr 10, 2008 at 10:54 AM, Michaela Buergle < [EMAIL PROTECTED]> wrote: > Hi all, > on http://wiki.apache.org/hadoop/GettingStartedWithHadoop - it says: > "Do not format a

Re: "incorrect data check

2008-04-09 Thread Colin Freas
? Run this task along with the identity reducer, and > you should be able to identify pretty quickly if there's HDFS corruption > issue. > > Norbert > > On Tue, Apr 8, 2008 at 5:50 PM, Colin Freas <[EMAIL PROTECTED]> wrote: > > > so, in an attempt to track do

Re: "incorrect data check

2008-04-08 Thread Colin Freas
trating because it's causing my jobs to fail, rather than skipping the problematic input files. i've also looked through the conf file and don't see anything similar about skipping bad files without killing the job. -colin On Tue, Apr 8, 2008 at 11:53 AM, Colin Freas <[EMAIL PROTE

"incorrect data check

2008-04-08 Thread Colin Freas
running a job on my 5 node cluster, i get these intermittent exceptions in my logs: java.io.IOException: incorrect data check at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method) at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress
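One way to find the files behind intermittent `incorrect data check` errors is to decompress each input completely outside Hadoop and see which ones fail. The sketch below assumes the inputs are gzip files (the stack trace only shows that the zlib decompressor was involved) and uses only `java.util.zip`; `isIntact` and `gzip` are hypothetical helper names, not Hadoop APIs. Reading to the end forces `GZIPInputStream` to verify the CRC32 and length trailer, so truncated or corrupted files surface as an `IOException` (typically a `ZipException` or `EOFException`).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCheckSketch {

    // Returns true if the whole gzip stream decompresses and its CRC32/length
    // trailer matches; returns false on any IOException along the way.
    public static boolean isIntact(InputStream raw) {
        byte[] buf = new byte[64 * 1024];
        try (GZIPInputStream in = new GZIPInputStream(raw)) {
            while (in.read(buf) != -1) {
                // Discard the data; only the integrity checks matter here.
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Hypothetical helper: gzip a string in memory (used for the demo only).
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] good = gzip("127.0.0.1 - - GET /index.html 200\n");
        byte[] bad = good.clone();
        bad[bad.length - 8] ^= 0x01; // flip one bit in the stored CRC32
        System.out.println(isIntact(new ByteArrayInputStream(good))); // true
        System.out.println(isIntact(new ByteArrayInputStream(bad)));  // false
    }
}
```

Files that fail such a check could be set aside before the job runs, rather than letting one bad file kill the job partway through.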

Re: on number of input files and split size

2008-04-06 Thread Colin Freas
i just wanted to reiterate ted's point here. my first run through with hadoop i used our log files as they are, which are designed as small input files for a mysql database instance. the files were at most a few megabytes in size, and we had something like 10,000 of them. performance was
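Because each input file generally yields at least one map task, about 10,000 files of a few megabytes each means at least 10,000 short-lived maps. A common remedy is to concatenate small files into larger ones before loading them. The sketch below only plans that grouping: it packs file sizes, in order, into batches no larger than a target size. The 64 MB target (assumed here to match the default HDFS block size of the time) and the uniform 2 MB file size are illustrative assumptions, and `batch` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

public class SmallFileBatches {

    // Groups file sizes (in bytes), in input order, into batches whose total
    // stays at or below targetBytes; a single file larger than the target
    // gets a batch of its own.
    public static List<List<Long>> batch(List<Long> sizes, long targetBytes) {
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentTotal = 0;
        for (long size : sizes) {
            if (!current.isEmpty() && currentTotal + size > targetBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentTotal = 0;
            }
            current.add(size);
            currentTotal += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Long> sizes = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            sizes.add(2 * mb); // hypothetical: 10,000 files of 2 MB each
        }
        // 32 files fit per 64 MB batch, so this prints 313 instead of 10,000.
        System.out.println(batch(sizes, 64 * mb).size());
    }
}
```

The greedy, order-preserving packing is not optimal bin packing, but it keeps log files in time order, which is often more useful than a slightly tighter fit.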

Performance impact of underlying file system?

2008-04-01 Thread Colin Freas
Is the performance of Hadoop impacted by the underlying file system on the nodes at all? All my nodes are ext3. I'm wondering if using XFS, Reiser, or ZFS might improve performance. Does anyone have any offhand knowledge about this? -Colin

Re: reduce task hanging or just slow?

2008-03-31 Thread Colin Freas
I believe that this is exactly what happened. I'm not sure exactly what happened, but the networking stack on the master node was all screwed up somehow. All the machines serve double duty as development boxes, and they're on two different networks. The master node could contact the cluster netw

Re: Hadoop streaming performance problem

2008-03-31 Thread Colin Freas
Really? I would expect the opposite: for compressed files to process slower. You're saying that is not the case, and that compressed files actually increase the speed of jobs? -Colin On Mon, Mar 31, 2008 at 4:51 PM, Andreas Kostyrka <[EMAIL PROTECTED]> wrote: > Well, on our EC2/HDFS-on-S3 clus

reduce task hanging or just slow?

2008-03-31 Thread Colin Freas
I've set up a job to run on my small 4 (sometimes 5) node cluster on dual processor server boxes with 2-8GB of memory. My job processes 24 files of 100-300MB each, a day's worth of logs; total data is about 6GB. I've modified the word count example to do what I need, and it works fine on small tes

nfs mount hadoop-site?

2008-03-27 Thread Colin Freas
are there any issues with having the hadoop-site.xml in .../conf placed on an nfs mounted dir that all my nodes have access to? -colin

Re: MapReduce with related data from disparate files

2008-03-25 Thread Colin Freas
ap method, you collect different value objects from > different RecordReaders. In your reduce method, for each key, you do > necessary processing on the collection based on the value object types. > > The main point here is to keep track of the differences from the > beginning to the
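The reply describes what is usually called a reduce-side join: in the map phase, tag each value with the input it came from; in the reduce phase, split the values for each key by tag and combine them. The plain-Java sketch below simulates the map, shuffle, and reduce steps in memory rather than using the Hadoop API; the `key,payload` line format, the `users`/`events` input names, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    // A value tagged with the input it came from, so the reduce step can
    // tell records from different files apart.
    static final class Tagged {
        final String source;
        final String payload;

        Tagged(String source, String payload) {
            this.source = source;
            this.payload = payload;
        }
    }

    // "Map" phase: parse "key,payload" lines (every line is assumed to
    // contain a comma) and tag each value with its source.
    static void map(String source, List<String> lines, Map<String, List<Tagged>> shuffle) {
        for (String line : lines) {
            String[] parts = line.split(",", 2);
            shuffle.computeIfAbsent(parts[0], k -> new ArrayList<>())
                   .add(new Tagged(source, parts[1]));
        }
    }

    // "Reduce" phase: for one key, pair every left-input value with every
    // right-input value (an inner join); keys missing a side produce nothing.
    static List<String> reduce(String key, List<Tagged> values, String left, String right) {
        List<String> lefts = new ArrayList<>();
        List<String> rights = new ArrayList<>();
        for (Tagged t : values) {
            if (t.source.equals(left)) {
                lefts.add(t.payload);
            } else if (t.source.equals(right)) {
                rights.add(t.payload);
            }
        }
        List<String> joined = new ArrayList<>();
        for (String l : lefts) {
            for (String r : rights) {
                joined.add(key + "\t" + l + "\t" + r);
            }
        }
        return joined;
    }

    public static List<String> join(List<String> users, List<String> events) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>(); // sorted keys, like the MR shuffle
        map("users", users, shuffle);
        map("events", events, shuffle);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue(), "users", "events"));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> users = List.of("u1,alice", "u2,bob");
        List<String> events = List.of("u1,login", "u1,logout", "u3,login");
        join(users, events).forEach(System.out::println);
    }
}
```

In a real job the tag would typically travel inside a custom `Writable` value, and the framework's sort and grouping would take the place of the `TreeMap`.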

MapReduce with related data from disparate files

2008-03-24 Thread Colin Freas
I have a cluster of 5 machines up and accepting jobs, and I'm trying to work out how to design my first MapReduce task for the data I have. So, I wonder if anyone has any experience with the sort of problem I'm trying to solve, and what the best ways to use Hadoop and MapReduce for it are. I have

Re: Master as DataNode

2008-03-21 Thread Colin Freas
; > > -Original Message- > > From: Colin Freas [mailto:[EMAIL PROTECTED] > > Sent: Friday, March 21, 2008 11:18 AM > > To: core-user@hadoop.apache.org > > Subject: Re: Master as DataNode > > > > ah: > > > > 2008-03-21 14:06:05,526 ERROR org.

Re: Master as DataNode

2008-03-21 Thread Colin Freas
Jeff > > > -Original Message- > > From: Colin Freas [mailto:[EMAIL PROTECTED] > > Sent: Friday, March 21, 2008 10:40 AM > > To: core-user@hadoop.apache.org > > Subject: Master as DataNode > > > > setting up a simple hadoop cluster with two machines,

Master as DataNode

2008-03-21 Thread Colin Freas
setting up a simple hadoop cluster with two machines, i've gotten to the point where the two machines can see each other, things seem fine, but i'm trying to set up the master as both a master and a slave, just for testing purposes. so, i've put the master into the conf/masters file and the conf/s

Re: NFS mounted home, host RSA keys, localhost, strict sshds and bad mojo.

2008-03-21 Thread Colin Freas
ah, yes. that worked. thanks! On Fri, Mar 21, 2008 at 12:48 PM, Natarajan, Senthil <[EMAIL PROTECTED]> wrote: > I guess the following file might have localhost entry, change to hostname > > /conf/masters > /conf/slaves > > > -Original Message- > Fr

NFS mounted home, host RSA keys, localhost, strict sshds and bad mojo.

2008-03-21 Thread Colin Freas
i'm working to set up a cluster across several machines where users' home dirs are on an nfs mount. i set up key authentication for the hadoop user, install all the software on one node, get everything running, and move on to another node. once there, however, my sshd complains because the host ke