Re: MapReduce jobs with expensive initialization

2009-03-02 Thread Tom White
On any particular tasktracker slot, task JVMs are shared only between tasks of the same job. When the job is complete the task JVM will go away. So there is certainly no sharing between jobs. I believe the static singleton approach outlined by Scott will work since the map classes are in a single classloader (but I haven't actually tried this).

Re: What's the cause of this Exception

2009-03-02 Thread Nick Cen
Hi, Just to provide more info. By setting mapred.job.tracker to local, which makes the program run locally, everything works fine, but when I switch to the fully distributed cluster the exception comes back. 2009/3/2 Nick Cen cenyo...@gmail.com Hi, I have set the separator value, but the same exception is thrown. As I
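
For reference, a minimal sketch of the "run locally" setting mentioned above, using the 0.18/0.19-era JobConf API. The class name, the paths, and the fs.default.name override are illustrative, not taken from Nick's job.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class LocalRunnerExample {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(LocalRunnerExample.class);
        // Use the local job runner instead of the cluster's JobTracker,
        // often the quickest way to tell a job problem from a cluster problem.
        conf.set("mapred.job.tracker", "local");
        // Optionally read and write the local filesystem rather than HDFS.
        conf.set("fs.default.name", "file:///");
        FileInputFormat.setInputPaths(conf, new Path("input"));
        FileOutputFormat.setOutputPath(conf, new Path("output"));
        JobClient.runJob(conf); // identity map/reduce by default
      }
    }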

Re: Eclipse plugin

2009-03-02 Thread Arijit Mukherjee
Hi All, I'm having some trouble using the Eclipse plugin on a Windows XP machine to connect to HDFS (Hadoop 0.19.0) on a Linux server - I'm getting an "error: null" message, although the port number etc. are correct. Can this be related to the user information? I've set it to the hadoop user

Announcing CloudBase-1.2.1 release

2009-03-02 Thread Tarandeep Singh
Hi, We have just released version 1.2.1 of CloudBase on SourceForge - http://cloudbase.sourceforge.net [ CloudBase is a data warehouse system built on top of Hadoop's Map-Reduce architecture. It uses ANSI SQL as its query language and comes with a JDBC driver. It is developed by Business.com and

Re: MapReduce jobs with expensive initialization

2009-03-02 Thread Owen O'Malley
On Mar 2, 2009, at 3:03 AM, Tom White wrote: I believe the static singleton approach outlined by Scott will work since the map classes are in a single classloader (but I haven't actually tried this). Even easier, you should just be able to do it with static initialization in the Mapper
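
A minimal sketch of the static-initialization approach discussed in this thread, using the old mapred API. ExpensiveResource and its load()/lookup() methods are illustrative placeholders for whatever heavyweight, read-only state needs building once.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class StaticInitMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      // Placeholder for expensive, read-only state (dictionary, model, etc.).
      static class ExpensiveResource {
        static ExpensiveResource load() { return new ExpensiveResource(); }
        String lookup(String s) { return s; }
      }

      // Built once per task JVM. With JVM reuse enabled it is shared by all
      // tasks of the same job running in that JVM, but never across jobs.
      private static final ExpensiveResource RESOURCE = ExpensiveResource.load();

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        output.collect(new Text(RESOURCE.lookup(value.toString())), value);
      }
    }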

Re: [ANNOUNCE] Hadoop release 0.19.1 available

2009-03-02 Thread Aviad sela
Nigel, Thanks, I have extracted the new project. However, I am having problems building it. I am using Eclipse 3.4 and Ant 1.7 and receive an error compiling the core classes: *compile-core-classes*: BUILD FAILED jsp-compile uriroot=${src.webapps}/task outputdir=${build.src}

RE: Potential race condition (Hadoop 18.3)

2009-03-02 Thread Koji Noguchi
Ryan, If you're using getOutputPath, try replacing it with getWorkOutputPath. http://hadoop.apache.org/core/docs/r0.18.3/api/org/apache/hadoop/mapred/FileOutputFormat.html#getWorkOutputPath(org.apache.hadoop.mapred.JobConf) Koji -Original Message- From: Ryan Shih
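
A minimal sketch of the substitution Koji suggests, for a task that writes side files; the helper class and the file name are illustrative.

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class SideFileWriter {
      public static FSDataOutputStream openSideFile(JobConf conf) throws IOException {
        // Was: Path dir = FileOutputFormat.getOutputPath(conf);
        // getWorkOutputPath returns the task attempt's temporary work directory,
        // so speculative or re-executed attempts cannot clobber each other's
        // files; the framework promotes the successful attempt's output on commit.
        Path dir = FileOutputFormat.getWorkOutputPath(conf);
        FileSystem fs = dir.getFileSystem(conf);
        return fs.create(new Path(dir, "side-file.txt")); // illustrative name
      }
    }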

Ant Build fails on Eclipse 3.4 and Ant 1.7 (Windows)

2009-03-02 Thread Aviad sela
I am having problems building the project for release 0.19.1. I am using Eclipse 3.4 and Ant 1.7 and receive an error compiling the core classes: *compile-core-classes*: BUILD FAILED *D:\Work\AviadWork\workspace\cur\WSAD\Hadoop_Core_19_1\Hadoop\build.xml:302: java.lang.ExceptionInInitializerError*

RE: Potential race condition (Hadoop 18.3)

2009-03-02 Thread Malcolm Matalka
I have a situation which may be related. I am running Hadoop 0.18.1 on a cluster with 5 machines and testing on a very small input of 10 lines. The mapper produces either 1 or 0 outputs per line of input, yet somehow I get 18 lines of output from the reducer. For example, I have one input where

Re: Shuffle speed?

2009-03-02 Thread hc busy
There are a few things that caused this to happen to me earlier on. Make sure to check that it actually makes progress. Sometimes slowness is the result of negative progress: it gets to, say, 10% complete on reduce, and then drops back down to 5%... In that case the output can output that line with the

Re: Potential race condition (Hadoop 18.3)

2009-03-02 Thread Ryan Shih
Koji - That makes a lot of sense. The two tasks are probably stepping over each other. I'll give it a try and let you know how it goes. Malcolm - if you turned off speculative execution and are still getting the problem, it doesn't sound like the same issue. Do you want to do a cut-and-paste of your reduce code

Issues installing FUSE_DFS

2009-03-02 Thread Hyatt, Matthew G
When we try to mount the DFS from FUSE we are getting the following errors. Has anyone seen this issue in the past? This is on version 0.19.0 [r...@socdvmhdfs1]# fuse_dfs dfs://socdvmhdfs1:9000 /hdfs port=9000,server=socdvmhdfs1 fuse-dfs didn't recognize /hdfs,-2 [r...@socdvmhdfs1]# df -h

Re: Issues installing FUSE_DFS

2009-03-02 Thread Brian Bockelman
Hey Matthew, We use the following command on 0.19.0: fuse_dfs -oserver=hadoop-name -oport=9000 /mnt/hadoop -oallow_other -ordbuffer=131072 Brian On Mar 2, 2009, at 4:12 PM, Hyatt, Matthew G wrote: When we try to mount the dfs from fuse we are getting the following errors. Has anyone

RE: Potential race condition (Hadoop 18.3)

2009-03-02 Thread Malcolm Matalka
Sure. Note: I am using my own class for keys and values in this. The key is called StringArrayWritable and it implements WritableComparable. The value is called AggregateRecord and it implements Writable. I have done some debugging and here is what I have found: While running in local mode I

Re: Potential race condition (Hadoop 18.3)

2009-03-02 Thread Ryan Shih
I'm not sure what your accum.extend(m) does, but in 0.18 the value records are reused (whereas in previous versions a new copy was made). So if you are storing a reference to your values, note that they are all going to point to the same object unless you make a copy. Try:
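
Ryan's actual snippet after "Try:" is truncated in the archive. As an illustrative sketch of the copy-before-storing pattern he describes, here is a reducer using plain Text values; the thread's own StringArrayWritable/AggregateRecord classes would be copied the same way, e.g. via a copy constructor or write()/readFields().

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class CopyValuesReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

      public void reduce(Text key, Iterator<Text> values,
                         OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        List<Text> kept = new ArrayList<Text>();
        while (values.hasNext()) {
          // In 0.18 the framework reuses the same value object on each next(),
          // so store a copy rather than the reference itself.
          kept.add(new Text(values.next()));
        }
        for (Text v : kept) {
          output.collect(key, v);
        }
      }
    }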

RE: Potential race condition (Hadoop 18.3)

2009-03-02 Thread Malcolm Matalka
Ryan, Thanks a lot. I need to do some more investigation but I believe that solved my problem. One question though. Should my Combine Input Records always be less than or equal to my Map output records? I appear to be seeing a Combine Input amount larger than my Map output amount. Thanks

Re: OutOfMemory error processing large amounts of gz files

2009-03-02 Thread bzheng
Thanks for all the info. Upon further investigation, we are dealing with two separate issues: 1. Problem processing a lot of gz files: we have tried the hadoop.native.lib setting and it makes little difference. However, this is not that big a deal since we can use multiple jobs, each

Jobs run slower and slower

2009-03-02 Thread Sean Laurent
Hi all, I'm conducting some initial tests with Hadoop to better understand how well it will handle and scale with some of our specific problems. As a result, I've written some M/R jobs that are representative of the work we want to do. I then run the jobs multiple times in a row (sequentially) to

Re: OutOfMemory error processing large amounts of gz files

2009-03-02 Thread Runping Qi
Your job tracker out-of-memory problem may be related to https://issues.apache.org/jira/browse/HADOOP-4766 Runping On Mon, Mar 2, 2009 at 4:29 PM, bzheng bing.zh...@gmail.com wrote: Thanks for all the info. Upon further investigation, we are dealing with two separate issues: 1. problem

Re: Potential race condition (Hadoop 18.3)

2009-03-02 Thread Ryan Shih
Koji - That looks like it did the trick - we're smooth sailing now. Thanks a lot! On Mon, Mar 2, 2009 at 2:02 PM, Ryan Shih ryan.s...@gmail.com wrote: Koji - That makes a lot of sense. The two tasks are probably stepping over each other. I'll give it a try and let you know how it goes.

question about released version id

2009-03-02 Thread 鞠適存
hi, I wonder how the Hadoop version number is determined. The HowToRelease page on the Hadoop web site describes the process for making a new release but does not mention the rules for assigning version numbers. Are there any criteria for version numbers? For example, under what conditions would the next version of