Doubt regarding permissions

2009-04-13 Thread Amar Kamat
Hey, I tried the following: - created a dir temp for user A with permission 733 - created a dir temp/test for user B with permission 722 - created a file temp/test/test.txt for user B with permission 722. Now in HDFS, user A can list as well as read the contents of

Re: Reduce task attempt retry strategy

2009-04-06 Thread Amar Kamat
Stefan Will wrote: Hi, I had a flaky machine the other day that was still accepting jobs and sending heartbeats, but caused all reduce task attempts to fail. This in turn caused the whole job to fail because the same reduce task was retried 3 times on that particular machine. What is your

Re: Job tracker not responding during streaming job

2009-04-06 Thread Amar Kamat
David Kellogg wrote: I am running Hadoop streaming. After around 42 jobs on an 18-node cluster, the jobtracker stops responding. This happens on normally-working code. Here are the symptoms. 1. A job is running, but it pauses with reduce stuck at XX% 2. hadoop job -list hangs or takes a very

Re: reduce task failing after 24 hours waiting

2009-03-25 Thread Amar Kamat
Amareshwari Sriramadasu wrote: Set mapred.jobtracker.retirejob.interval (this is used to retire completed jobs) and mapred.userlog.retain.hours (this is used to discard user logs) to a higher value. By default, their values are 24 hours. These might be the reason for failure, though I'm not

Re: reduce task failing after 24 hours waiting

2009-03-25 Thread Amar Kamat
Amar Kamat wrote: Amareshwari Sriramadasu wrote: Set mapred.jobtracker.retirejob.interval (this is used to retire completed jobs) and mapred.userlog.retain.hours (this is used to discard user logs) to a higher value. As Amareshwari pointed out, this might be the cause. Can you increase
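
For context, raising both values in hadoop-site.xml might look like the sketch below; the 48-hour figures are illustrative, not values from the thread:

    <property>
      <name>mapred.jobtracker.retirejob.interval</name>
      <value>172800000</value> <!-- 48 hours, in milliseconds -->
    </property>
    <property>
      <name>mapred.userlog.retain.hours</name>
      <value>48</value> <!-- plain hours, unlike the interval above -->
    </property>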

Re: Reducer hangs at 16%

2009-02-23 Thread Amar Kamat
Looks like the reducer is able to fetch map output files from the local box but fails to fetch them from the remote box. Can you check whether there is a firewall issue or whether the /etc/hosts entries are correct? Amar Jagadesh_Doddi wrote: Hi I have changed the configuration to run Name node and job tracker

Re: Hadoop setup questions

2009-02-11 Thread Amar Kamat
bjday wrote: Good morning everyone, I have a question about the correct setup for hadoop. I have 14 Dell computers in a lab, each connected to the internet and independent of the others. All run CentOS. Logins are handled by NIS. If userA logs into the master and starts the daemons

Re: TaskTrackers being double counted after restart job recovery

2009-02-09 Thread Amar Kamat
Stefan Will wrote: Hi, I'm using the new persistent job state feature in 0.19.0, and it's worked really well so far. However, this morning my JobTracker died with an OOM error (even though the heap size is set to 768M). So I killed it and all the TaskTrackers. Any specific reason why you

Re: what's going on :( ?

2009-02-09 Thread Amar Kamat
Mark Kerzner wrote: Hi, why is hadoop suddenly telling me "Retrying connect to server: localhost/127.0.0.1:8020" with this configuration: <configuration> <property> <name>fs.default.name</name> <value>hdfs://localhost:9000</value> </property> <property> <name>mapred.job.tracker</name>
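
Retrying on port 8020 (the default) while the config declares 9000 usually means hadoop-site.xml is not being picked up from the conf directory. A consistent sketch is below; the job tracker port 9001 is an assumption, since the original message is truncated:

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value> <!-- assumed port; not from the thread -->
      </property>
    </configuration>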

Re: After hadoop jobtracker runs for a long time, jobtracker may comsume 100% cpu (one cpu core is fully consume), why?

2009-02-08 Thread Amar Kamat
Ruyue Ma wrote: Our hadoop version is checked out from the trunk version on 2008.10.22. When this problem appeared, our cluster didn't have any job running. The cluster was idle! How many jobs (total) were submitted to the cluster? What is the average size of each job? Have you changed any

Re: lost TaskTrackers

2009-02-08 Thread Amar Kamat
Vadim Zaliva wrote: Hi! I am observing strange situation in my Hadoop cluster. While running task, eventually it gets into this strange mode where: 1. JobTracker reports 0 task trackers. 2. Task tracker processes are alive but log file is full of repeating messages like this: 2009-02-08

Re: Is it possible to submit job to JobClient and exit immediately?

2009-01-15 Thread Amar Kamat
Andrew wrote: For now, I use such code blocks in all my MR jobs: try { JobClient.runJob(job); JobClient jc = new JobClient(job); jc.submitJob(job); // submits a job and comes out } catch (IOException exc) { LOG.info("Job failed", exc); } System.exit(0); But this code
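
A minimal sketch of the non-blocking pattern being discussed, using the old mapred API (the class name and job setup are placeholders):

    import java.io.IOException;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class AsyncSubmit {
      public static void main(String[] args) throws IOException {
        JobConf job = new JobConf(AsyncSubmit.class);
        // ... set mapper/reducer classes and input/output paths here ...
        RunningJob rj = new JobClient(job).submitJob(job); // returns once the job is queued
        System.out.println("Submitted job " + rj.getJobID());
        System.exit(0); // safe to exit: the JobTracker now owns the job
      }
    }

Note that the snippet in the thread calls both JobClient.runJob(job) (which blocks until completion) and submitJob(job); only the latter is needed for fire-and-forget submission.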

RE: Job Tracker/Name Node redundancy

2009-01-09 Thread Amar Kamat
Ryan, From the MR (JobTracker) side we have failover support. If a large job is submitted and the JobTracker fails midway, you can start the JobTracker on the same host and resume the job. Look at https://issues.apache.org/jira/browse/HADOOP-3245 for more details. Hope that helps. Amar

RE: Hadoop Internal Architecture writeup

2008-11-30 Thread Amar Kamat
Hey, nice work and nice writeup. Keep it up. Comments inline. Amar -Original Message- From: Ricky Ho [mailto:[EMAIL PROTECTED] Sent: Fri 11/28/2008 9:45 AM To: core-user@hadoop.apache.org Subject: RE: Hadoop Internal Architecture writeup Amar, thanks a lot. This is exactly the kind of

Re: Hadoop Internal Architecture writeup

2008-11-27 Thread Amar Kamat
Ricky Ho wrote: I put together an article describing the internal architecture of Hadoop (HDFS, MapRed). I'd love to get some feedback if you see anything inaccurate or missing ... http://horicky.blogspot.com/2008/11/hadoop-mapreduce-implementation.html A few comments on MR: 1) The

Re: How to retrieve rack ID of a datanode

2008-11-25 Thread Amar Kamat
Ramya R wrote: Hi all, I want to retrieve the Rack ID of every datanode. How can I do this? I tried using getNetworkLocation() in org.apache.hadoop.hdfs.protocol.DatanodeInfo. I am getting /default-rack as the output for all datanodes. Have you set up the cluster to be rack-aware?
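
For reference, rack awareness in this era is enabled by pointing the namenode at a topology script in hadoop-site.xml; the script path below is a placeholder:

    <property>
      <name>topology.script.file.name</name>
      <value>/path/to/rack-map.sh</value> <!-- maps a host/IP to a rack id like /dc1/rack7 -->
    </property>

Without such a script (or a custom DNSToSwitchMapping implementation), every node reports /default-rack.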

Re: reading input for a map function from 2 different files?

2008-11-11 Thread Amar Kamat
the standard deviation offline. So avg = B / N = 10/4 = 2.5. Hence the std deviation would be sqrt((A - N * avg^2) / N) = sqrt((30 - 4*6.25)/4) = 1.11803399. Using the main formula the answer is 1.11803399. Amar On Mon, Nov 10, 2008 at 4:22 AM, Amar Kamat [EMAIL PROTECTED] wrote: Amar Kamat wrote
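
The arithmetic checks out; as a compact restatement, with N the count, B the running sum, and A the running sum of squares:

    // One-pass standard deviation from the running sums in the thread.
    double N = 4, B = 10, A = 30;                     // count, sum, sum of squares
    double avg = B / N;                               // 10/4 = 2.5
    double std = Math.sqrt((A - N * avg * avg) / N);  // sqrt((30 - 25)/4) = 1.11803399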

Re: Best way to handle namespace host failures

2008-11-10 Thread Amar Kamat
Goel, Ankur wrote: Hi Folks, I am looking for some advice on some of the ways/techniques that people are using to get around namenode failures (both disk and host). We have a small cluster with several jobs scheduled for periodic execution on the same host where the name server runs.

Re: reduce more than one way

2008-11-09 Thread Amar Kamat
Elia Mazzawi wrote: Hello, I'm writing hadoop programs in Java. I have 2 hadoop map/reduce programs that have the same map, but different reduce methods. Look at how MultipleOutputFormat is used. This provides the facility to write to multiple files. Amar can i run them in a way so that
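
A minimal sketch of the MultipleOutputFormat approach (old mapred API; the subclass name and key-based routing are illustrative):

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    public class PerKeyOutput extends MultipleTextOutputFormat<Text, Text> {
      @Override
      protected String generateFileNameForKeyValue(Text key, Text value, String leaf) {
        return key.toString() + "/" + leaf; // route each key to its own subdirectory
      }
    }

Registering it with conf.setOutputFormat(PerKeyOutput.class) then lets a single reduce pass write to several files.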

Re: reading input for a map function from 2 different files?

2008-11-09 Thread Amar Kamat
some speed wrote: I was wondering if it was possible to read the input for a map function from 2 different files: 1st file --- user-input file from a particular location (path); 2nd file --- a resultant file (has just one key,value pair) from a previous MapReduce job. (I am implementing a

Re: reduce task progress above 100%?

2008-09-16 Thread Amar Kamat
Prasad Pingali wrote: I am using 0.18.1-dev, upgraded from 0.18.0. I am also using compression for map outputs. I think this is fixed in 0.19. Look here: https://issues.apache.org/jira/browse/HADOOP-3131. We see this with compression turned ON. Amar - Prasad. On Wednesday 17 September 2008

Re: Get the pairs of all row key combinations w/o repetition

2008-08-13 Thread Amar Kamat
Edward J. Yoon wrote: Hi communities, Do you have any idea how to get the pairs of all row key combinations w/o repetition on Map/Reduce as describe below? Input : (MapFile or Hbase Table) Key1, Value or RowResult Key2, Value or RowResult Key3, Value or RowResult Key4, Value or RowResult

Re: Get the pairs of all row key combinations w/o repetition

2008-08-13 Thread Amar Kamat
Amar Kamat wrote: Edward J. Yoon wrote: Hi communities, Do you have any idea how to get the pairs of all row key combinations w/o repetition on Map/Reduce as describe below? Input : (MapFile or Hbase Table) Key1, Value or RowResult Key2, Value or RowResult Key3, Value or RowResult Key4

Re: Why does mapred.tasktracker.expiry.interval default to 10 mins ?

2008-07-30 Thread Amar Kamat
Pratyush Banerjee wrote: Hi All, I have been using hadoop on a 50 machine cluster for some time now and just wondered why mapred.tasktracker.expiry.interval defaulted to 10 minutes. If I want to reduce it to 1 min, i.e. 60000 msec, should that cause any problems.
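
Lowering the interval would be a one-line change in hadoop-site.xml; note the value is in milliseconds, so 1 minute is 60000:

    <property>
      <name>mapred.tasktracker.expiry.interval</name>
      <value>60000</value> <!-- 1 minute; the default is 600000 (10 minutes) -->
    </property>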

Re: Finished or not?

2008-07-16 Thread Amar Kamat
I have seen the opposite case where the maps are shown as 100% done while there are still some maps running. I have seen this on trunk and there were some failed/killed tasks. Amar Andreas Kostyrka wrote: On Wednesday 09 July 2008 05:56:28 Amar Kamat wrote: Andreas Kostyrka wrote

Re: Is there a way to preempt the initial set of reduce tasks?

2008-07-16 Thread Amar Kamat
I think the JobTracker can easily detect this: the case where a high priority job is starved as there are no slots/resources. Preemption should probably kick in where tasks from a low priority job might otherwise get scheduled even though the high priority job has some tasks to run. Amar Goel, Ankur

Re: Is there a way to preempt the initial set of reduce tasks?

2008-07-16 Thread Amar Kamat
of the reducers taking up the reduce slots. @Ankur/Murli, Please open a JIRA if you guys feel it's important. Amar Murali, can you try this and see if it works! -Original Message- From: Amar Kamat [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 16, 2008 8:01 PM To: core-user@hadoop.apache.org

Re: scaling issue, please help

2008-07-03 Thread Amar Kamat
(mapred.max.map.failures.percent / mapred.max.reduce.failures.percent; default is 0), then the job is considered failed. Amar On Jul 1, 2008, at 10:06 PM, Amar Kamat wrote: Mori Bellamy wrote: hey all, i've got a mapreduce task that works on small (~1G) input. when i try to run the same task on large
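
To tolerate a few bad map tasks without failing the whole job, the knob mentioned above can be raised from its default of 0, e.g.:

    <property>
      <name>mapred.max.map.failures.percent</name>
      <value>5</value> <!-- illustrative: allow up to 5% of maps to fail -->
    </property>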

Re: failed map tasks

2008-07-03 Thread Amar Kamat
jerrro wrote: Hello, I was wondering - could someone tell me what are the reasons that I could get failure with certain map tasks on a node? Well, that depends on the kind of errors you are seeing. Could you please post the logs/error messages? Amar Any idea that comes to mind would work (it

Re: scaling issue, please help

2008-07-01 Thread Amar Kamat
Mori Bellamy wrote: hey all, i've got a mapreduce task that works on small (~1G) input. when i try to run the same task on large (~100G) input, i get the following error around when the map tasks are almost done (~98%) 2008-07-01 13:10:59,231 INFO org.apache.hadoop.mapred.ReduceTask:

Re: Too many fetch failures AND Shuffle error

2008-06-30 Thread Amar Kamat
= http://'$tthost':'$port'/mapOutput?job='$jobid'&map='$mapid'&reduce='$reduce-partition-id' '$var' is what you have to substitute. Amar Thanks, Taran On Thu, Jun 19, 2008 at 11:43 PM, Amar Kamat [EMAIL PROTECTED] wrote: Yeah. With 2 nodes the reducers will go up to 16% because the reducer

Re: How to configure RandomWriter to generate less amount of data

2008-06-30 Thread Amar Kamat
Heshan Lin wrote: Hi, I'm trying to configure RandomWriter to generate less data than does the default configuration. bin/hadoop jar hadoop-*-examples.jar randomwriter -Dtest.randomwrite.bytes_per_map=<value> -Dtest.randomwrite.total_bytes=<value> -Dtest.randomwriter.maps_per_host=<value>

Re: How Mappers function and solultion for my input file problem?

2008-06-24 Thread Amar Kamat
Xuan Dzung Doan wrote: Hi, I'm a Hadoop newbie. My question is as follows: The level of parallelism of a job, with respect to mappers, is largely the number of map tasks spawned, which is equal to the number of InputSplits. But within each InputSplit, there may be many records (many input

Re: Too many fetch failures AND Shuffle error

2008-06-20 Thread Amar Kamat
Yeah. With 2 nodes the reducers will go up to 16% because the reducers are able to fetch maps from the same machine (locally) but fail to copy them from the remote machine. A common reason in such cases is *restricted machine access* (a firewall etc). The web-server on a machine/node hosts map

Re: Too many fetch failures AND Shuffle error

2008-06-19 Thread Amar Kamat
Sayali Kulkarni wrote: Hello, I have been getting "Too many fetch failures" (in the map operation) and a shuffle error (in the reduce operation). Can you post the reducer logs? How many nodes are there in the cluster? Are you seeing this for all the maps and reducers? Are the reducers

Re: Why is there a seperate map and reduce task capacity?

2008-06-16 Thread Amar Kamat
Daniel Leffel wrote: Why not just combine them? How do I do that? Consider a case where the cluster (of n nodes) is configured to process just one task per node. Let there be (n-1) reducers. Let's assume that the map phase is complete and the reducers are shuffling. There will be (n-1)

Re: Failed Reduce Task

2008-06-15 Thread Amar Kamat
Looks like the reduce task is not able to fetch the map output from the other machine. My guess is that the reduce task is able to pull data from the same machine, making progress up to 16%, but fails to get the data from the other machine. This could be a firewall issue. Is it possible for

Re: Master Failure

2008-05-19 Thread Amar Kamat
Fabrizio detto Mario wrote: How does Hadoop manage the failure of the JobTracker (Master Node)? For example, the Google Map/Reduce version aborts the MapReduce computation if the master fails. Currently there is no recovery/backup strategy in place to take care of this. We are currently working

Re: Master Node in Hadoop Map/Reduce Implementation.

2008-05-15 Thread Amar Kamat
Fabrizio detto Mario wrote: Hello Hadoop community, I read about Hadoop framework ( http://hadoop.apache.org/core/docs/r0.16.3/mapred_tutorial.html) this phrase: The Map-Reduce framework consists of a single master JobTracker and one slave TaskTracker per cluster-node... Is The Job Tracker

Re: Master Node in Hadoop Map/Reduce Implementation.

2008-05-15 Thread Amar Kamat
Amar Kamat wrote: Fabrizio detto Mario wrote: Hello Hadoop community, I read about Hadoop framework ( http://hadoop.apache.org/core/docs/r0.16.3/mapred_tutorial.html) this phrase: The Map-Reduce framework consists of a single master JobTracker and one slave TaskTracker per cluster-node

Re: Need Help

2008-05-12 Thread Amar Kamat
hemal patel wrote: Hello, Can you help me solve this problem? When I try to run this program it gives me an error like this: bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+' 08/05/12 17:32:59 INFO mapred.FileInputFormat: Total input paths to process : 12

Re: [Reduce task stalls] Problem Detailed Report

2008-05-09 Thread Amar Kamat
From the logs it looks like the reducer is able to fetch the data from the slave on the master node ('cse' machine) but is not able to fetch it from the other node ('mtech' machine here). The 16% shown in the reducer is fetched from the local machine. It seems like the jetty on the 'mtech'

Re: Can reducer output multiple files?

2008-05-08 Thread Amar Kamat
Jeremy Chow wrote: Hi list, I want to output my reduced results into several files according to the types the results belong to. How can I implement this? There was a similar query earlier. The reply is here [http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/[EMAIL

Re: Collecting output not to file

2008-05-07 Thread Amar Kamat
Derek Shaw wrote: Hey, From the examples that I have seen thus far, all of the results from the reduce function are being written to a file. Instead of writing results to a file, I want to store them and inspect them after the job is completed. What do you mean by "store" and "inspect"? (I

Re: Getting jobTracker startTime from the JobClient

2008-04-30 Thread Amar Kamat
It can be made a part of ClusterStatus. Amar Devaraj Das wrote: No, currently, there is no way to get that from the JobClient. Yes, please submit a patch. -Original Message- From: Pete Wyckoff [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 30, 2008 8:21 AM To:

Re: Job.jar could only be replicated to 0 nodes, instead of 1(IO Exception)

2008-04-29 Thread Amar Kamat
Sridhar Raman wrote: I am trying to run K-Means using Hadoop. I first wanted to test it within a single-node cluster. And this was the error I got. What could be the problem? $ bin/hadoop jar clustering.jar com.company.analytics.clustering.mr.core.KMeansDriver Iteration 0

Re: Reg: How to pass the output path argument to the mapper

2008-04-29 Thread Amar Kamat
You can override the configure() method in the map class to get the output path. Use FileOutputFormat.getOutputPath(conf) to get it. This will work for 0.17 and later. For earlier versions you can use conf.getOutputPath(). Amar chaitanya krishna wrote: Hi, Is there any
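
A minimal sketch of that pattern (old mapred API; the mapper is a placeholder that just echoes the path):

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class PathAwareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
      private Path outputPath;

      @Override
      public void configure(JobConf conf) {
        outputPath = FileOutputFormat.getOutputPath(conf); // 0.17+; conf.getOutputPath() earlier
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
        out.collect(new Text(outputPath.toString()), value); // demo use of the path
      }
    }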

Re: reducer outofmemoryerror

2008-04-23 Thread Amar Kamat
Apurva Jadhav wrote: Hi, I have a 4 node hadoop 0.15.3 cluster. I am using the default config files. I am running a map reduce job to process 40 GB of log data. How many maps and reducers are there? Make sure that there is a sufficient number of reducers. Look at conf/hadoop-default.xml (see

Re: Getting map ouput as final output by setting number of reduce to zero

2008-04-22 Thread Amar Kamat
Vibhooti Verma wrote: Has anyone tried setting the number of reduces to zero and getting the map's output as the final output? Look at the RandomWriter example (src/examples/org/apache/hadoop/examples/RandomWriter.java). Amar I tried doing the same but my map output does not come to the specified
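
The map-only setup itself is a one-liner; a sketch with illustrative paths and a placeholder driver class:

    JobConf conf = new JobConf(MyJob.class);      // MyJob is a placeholder
    conf.setNumReduceTasks(0);                    // map output goes straight to the output dir
    FileInputFormat.setInputPaths(conf, new Path("/user/x/in"));
    FileOutputFormat.setOutputPath(conf, new Path("/user/x/out"));
    JobClient.runJob(conf);

With zero reduces the framework skips sort/shuffle entirely, so the map output key/value types must match the job's output types.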

Re: Interleaving maps/reduces from multiple jobs on the same tasktracker

2008-04-21 Thread Amar Kamat
Amar Kamat wrote: Jiaqi Tan wrote: Hi, Will Hadoop ever interleave multiple maps/reduces from different jobs on the same tasktracker? No. Suppose I have 2 jobs submitted to a jobtracker, one after the other. Must all maps/reduces from the first submitted job be completed before

Re: Map reduce classes

2008-04-17 Thread Amar Kamat
list). On 4/16/08 9:04 PM, Amar Kamat [EMAIL PROTECTED] wrote: Ted Dunning wrote: The easiest solution is to not worry too much about running an extra MR step. So, - run a first pass to get the counts. Use word count as the pattern. Store the results in a file. - run the second pass

Re: MapReduce: Two Reduce Tasks

2008-04-16 Thread Amar Kamat
Chaman Singh Verma wrote: Hello, I think the question was slightly misinterpreted. What I meant by 3-4 different tasks is that there are 3 different Reduce functionalities (each reduce functionality could be done by many task slaves, maybe 100). I want to reuse the output of Map for different

Re: Two output reduce for the same map

2008-04-16 Thread Amar Kamat
Earlier someone asked a similar question. See http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/[EMAIL PROTECTED] for the reply. I don't think the framework directly supports this. Amar Vibhooti Verma wrote: Can we configure two (multiple) reduces for the same map, so

Re: Map reduce classes

2008-04-16 Thread Amar Kamat
. Thanks, On Wed, Apr 16, 2008 at 8:33 AM, Amar Kamat [EMAIL PROTECTED] wrote: Aayush Garg wrote: Hi, Are you sure that another MR is required for eliminating some rows? Can't I just somehow eliminate from main() when I know the keys which are needed

Re: incremental re-execution

2008-04-16 Thread Amar Kamat
Shirley Cohen wrote: Dear Hadoop Users, I'm writing to find out what you think about being able to incrementally re-execute a map reduce job. My understanding is that the current framework doesn't support it and I'd like to know whether, in your opinion, having this capability could help to

Re: Reading Configuration File

2008-04-15 Thread Amar Kamat
Natarajan, Senthil wrote: Hi, How do I read a configuration file in Hadoop? I tried copying the file into HDFS and also placing it within the jar file. Do you intend to read the job's config file or a separate file? To access the job-specific config, override the configure(JobConf)
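
If it is the job's config, a sketch of the pattern: set a value in the driver, read it back in the task (the key name is hypothetical):

    // driver side
    conf.set("myapp.threshold", "20");

    // task side, inside the overridden configure()
    public void configure(JobConf conf) {
      int threshold = conf.getInt("myapp.threshold", 10); // 10 is a fallback default
    }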

Re: [HADOOP-users] HowTo filter files for a Map/Reduce task over the same input folder

2008-04-11 Thread Amar Kamat
One way to do this is to write your own (file) input format. See src/java/org/apache/hadoop/mapred/FileInputFormat.java. You need to override listPaths() in order to have selectivity amongst the files in the input folder. Amar Alfonso Olias Sanz wrote: Hi I have a general purpose input folder

Re: [HADOOP-users] HowTo filter files for a Map/Reduce task over the same input folder

2008-04-11 Thread Amar Kamat
A simpler way is to use FileInputFormat.setInputPathFilter(JobConf, PathFilter). Look at org.apache.hadoop.fs.PathFilter for details on PathFilter interface. Amar Alfonso Olias Sanz wrote: Hi I have a general purpose input folder that it is used as input in a Map/Reduce task. That folder
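
A minimal sketch of the PathFilter route (the filter class and extension are illustrative):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;
    import org.apache.hadoop.mapred.FileInputFormat;

    public class DatFilter implements PathFilter {
      public boolean accept(Path path) {
        return path.getName().endsWith(".dat"); // keep only *.dat inputs
      }
    }

    // in the job driver:
    // FileInputFormat.setInputPathFilter(conf, DatFilter.class);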

Re: Run job not from namenode

2008-04-01 Thread Amar Kamat
Andrey Pankov wrote: Hi all, Currently I'm able to run map-reduce jobs from box where NameNode and JobTracker are running. But I'd like to run my jobs from separate box, from which I have access to HDFS. I have updated params fs.default.name and mapred.job.tracker in local hadoop dir to

Re: Reduce Hangs

2008-03-27 Thread Amar Kamat
On Thu, 27 Mar 2008, Natarajan, Senthil wrote: Hi, I have a small Hadoop cluster, one master and three slaves. When I try the example wordcount on one of our log files (size ~350 MB), Map runs fine but reduce always hangs (sometimes around 19%, 60%, ...); after a very long time it finishes. I am

Re: Hadoop: Multiple map reduce or some better way

2008-03-26 Thread Amar Kamat
On Wed, 26 Mar 2008, Aayush Garg wrote: Hi, I am developing a simple inverted index program with hadoop. My map function has the output: <word, doc> and the reducer has: <word, list(docs)>. Now I want to use one more mapreduce to remove stop and scrub words from Use distributed cache as
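
A sketch of the DistributedCache approach for the stop-word list (the paths are illustrative):

    // driver side: ship the list with the job
    DistributedCache.addCacheFile(new URI("/user/x/stopwords.txt"), conf);

    // task side, in configure(): locate the local copy
    Path[] local = DistributedCache.getLocalCacheFiles(conf);
    // read local[0] with ordinary java.io and build an in-memory stop-word set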

Re: Sharing Hadoop slave nodes between multiple masters?

2008-03-25 Thread Amar Kamat
On Tue, 25 Mar 2008, Nate Carlson wrote: Is it possible to have a single slave process jobs for multiple masters? There are two types of slaves and 2 corresponding masters in Hadoop. The 2 masters are the Namenode and the JobTracker, while the slaves are datanodes and tasktrackers respectively. Each slave when

Re: One Simple Question About Hadoop DFS

2008-03-23 Thread Amar Kamat
On Sun, 23 Mar 2008, Chaman Singh Verma wrote: Hello, I am exploring Hadoop and MapReduce and I have one very simple question. I have a 500GB dataset on my local disk and I have written both Map-Reduce functions. Now how should I start? 1. I copy the data from local disk to DFS. I have

Re: File size and number of files considerations

2008-03-10 Thread Amar Kamat
On Mon, 10 Mar 2008, Naama Kraus wrote: Hi, In our system, we plan to upload data into Hadoop from external sources and use it later on for analysis tasks. The interface to the external repositories allows us to fetch pieces of data in chunks. E.g. get n records at a time. Records are

Re: MapReduce failure

2008-03-09 Thread Amar Kamat
What is the heap size you are using for your tasks? Check 'mapred.child.java.opts' in your hadoop-default.xml. Try increasing it. This will happen if you try running the random-writer + sort examples with default parameters. The maps are not able to spill the data to the disk. Btw what version
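
Raising the task heap is a hadoop-site.xml change; in this era the default is -Xmx200m:

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx512m</value> <!-- illustrative; size to your node's memory -->
    </property>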

Re: How jobs are copied to other nodes?

2008-03-08 Thread Amar Kamat
The job file, i.e. job.jar, is copied to the DFS by the job client. When the task tracker prepares for a new task, it makes a local copy of the job. On Sat, 8 Mar 2008, Ben Kucinich wrote: I am interested to know the internal working of Hadoop regarding distribution of jobs. How are the jobs copied

Re: Map/Reduce Type Mismatch error

2008-03-08 Thread Amar Kamat
Look at WordCount.java in src/examples/org/apache/hadoop/examples. Whether you need a new InputFormat depends on what you want to do. Amar On Fri, 7 Mar 2008, Prasan Ary wrote: Hi All, I am running a Map/Reduce on a textfile. Map takes Text,Text as (key,value) input pair , and outputs

Re: Custom Input Formats

2008-03-08 Thread Amar Kamat
On Fri, 7 Mar 2008, Dan Tamowski wrote: Hello, First, I am currently subscribed to the digest, could you please cc me at [EMAIL PROTECTED] with any replies. I really appreciate it. I have a few questions regarding input formats. Specifically, I want to use one complete text file per input

Re: Bugs in 0.16.0?

2008-03-03 Thread Amar Kamat
:05 PM, Amar Kamat wrote: 3) Lastly, it would seem beneficial for jobs that have significant startup overhead and memory requirements to not be run in separate JVMs for each task. Along these lines, it looks like someone submitted a patch for JVM reuse a while back, but it wasn't committed? https

Re: Bugs in 0.16.0?

2008-03-03 Thread Amar Kamat
work correctly under scale changes, but *fixed* delays are almost never correct. Delays may work as a band-aid in the short run, but eventually you have to take the band-aid off. On 3/3/08 8:46 AM, Amar Kamat [EMAIL PROTECTED] wrote: HADOOP is not meant for real-time applications. It's more or less

Re: Calculations involve large datasets

2008-02-22 Thread Amar Kamat
See http://incubator.apache.org/pig/. Hope that helps. Not sure how joins could be done in Hadoop. Amar On Fri, 22 Feb 2008, Chuck Lan wrote: Hi, I'm currently looking into how to better scale the performance of our calculations involving large sets of financial data. It is currently using a

Re: Questions about namenode and JobTracker configuration.

2008-02-21 Thread Amar Kamat
Zhang, jian wrote: Hi, All I have a small question about configuration. In Hadoop Documentation page, it says Typically you choose one machine in the cluster to act as the NameNode and one machine as to act as the JobTracker, exclusively. The rest of the machines act as both a

Re: how to set the result of the first mapreduce program as the input of the second mapreduce program?

2008-02-21 Thread Amar Kamat
The output of every mapreduce job in Hadoop gets stored in the DFS, i.e. made visible. You can run back-to-back jobs (i.e. job chaining) but the output won't be temporary. Look at Grep.java, as Hairong suggested, for more details on job chaining. As of now there is no direct support for job chaining in Hadoop.
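
A sketch of back-to-back chaining in the driver (the class names and intermediate path are placeholders):

    Path step1Out = new Path("/tmp/step1-out");   // persists in the DFS, as noted above
    JobConf job1 = new JobConf(Step1.class);
    FileOutputFormat.setOutputPath(job1, step1Out);
    JobClient.runJob(job1);                       // blocks until job1 completes

    JobConf job2 = new JobConf(Step2.class);
    FileInputFormat.setInputPaths(job2, step1Out);
    JobClient.runJob(job2);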

Re: Caching frequently map input files

2008-02-10 Thread Amar Kamat
Hi, I totally missed what you wanted to convey. What you want is that the maps (the tasks) should be able to share their caches across jobs. In hadoop each task is a separate JVM, so sharing caches across tasks is sharing across JVMs, and that too over time (i.e. to make the cache a separate higher

Re: Possible memory leak in MapTask$MapOutputBuffer

2008-02-05 Thread Amar Kamat
keyValBuffer = null; +comparator.clearBuffer(); } //A compare method that references the keyValBuffer through the indirect //pointers - Original Message From: Amar Kamat [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Tuesday, February 5, 2008 12:08:48 AM Subject: Re

Re: Possible memory leak in MapTask$MapOutputBuffer

2008-02-04 Thread Amar Kamat
Hi, Yes, you are correct. The references to the old keyval buffers are still there even after the buffers are re-initialized, but the reference lives just between consecutive spills. The scenario before HADOOP-1965 was that the memory used for one sort-spill phase is io.sort.mb, causing

Re: how to recover if master node goes down?

2008-02-03 Thread Amar Kamat
Ben Kucinich wrote: I am new to Hadoop. I want to know a few things. I have a Hadoop cluster of 1 master node and N - 1 slave nodes. I am putting files into the DFS. If one of the slave node goes down, the data is still accessible due to proper replication. There are 2 masters in hadoop,