Re: Using addCacheArchive

2009-06-25 Thread Amareshwari Sriramadasu
statement but still getting the same error: DistributedCache.addCacheArchive(new URI("/home/akhil1988/Config.zip#Config"), conf); Do you think there could be any problem in distributing a zipped directory and having Hadoop unzip it recursively? Thanks! Akhil Amareshwa

Re: Using addCacheArchive

2009-06-25 Thread Amareshwari Sriramadasu
Hi Akhil, DistributedCache.addCacheArchive takes a path on HDFS. From your code, it looks like you are passing a local path. Also, if you want to create a symlink, you should pass the URI as hdfs://<path>#<link-name>, besides calling DistributedCache.createSymlink(conf); Thanks Amareshwari akhil1988 wrote: Please a
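A minimal sketch of the corrected call, assuming Config.zip has already been copied into HDFS (the namenode URI and paths below are illustrative):

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheArchiveSetup {
      public static void setup(JobConf conf) throws Exception {
        // The URI must name a path on HDFS, not the local file system;
        // the "#Config" fragment is the symlink name created in each
        // task's working directory.
        DistributedCache.addCacheArchive(
            new URI("hdfs://namenode:9000/user/akhil1988/Config.zip#Config"),
            conf);
        // Required in addition to the fragment, or no symlink is created.
        DistributedCache.createSymlink(conf);
      }
    }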

Re: Unable to run Jar file in Hadoop.

2009-06-25 Thread Amareshwari Sriramadasu
Is your jar file on the local file system or on HDFS? The jar file should be on the local fs. Thanks Amareshwari Shravan Mahankali wrote: Am as well having similar... there is no solution yet!!! Thank You, Shravan Kumar. M Catalytic Software Ltd. [SEI-CMMI Level 5 Company] -

Re: where is the "addDependingJob"?

2009-06-24 Thread Amareshwari Sriramadasu
one job ran after the other job in one class with the new api? Amareshwari Sriramadasu wrote: HRoger wrote: Hi As you know in the "org.apache.hadoop.mapred.jobcontrol.Job" there is a method called "addDependingJob" but not in "org.apache.hadoop.mapreduce.Job&qu

Re: where is the "addDependingJob"?

2009-06-24 Thread Amareshwari Sriramadasu
HRoger wrote: Hi As you know, in "org.apache.hadoop.mapred.jobcontrol.Job" there is a method called "addDependingJob", but not in "org.apache.hadoop.mapreduce.Job". Is there some method that works like addDependingJob in the "mapreduce" package? "org.apache.hadoop.mapred.jobcontrol.Job" is moved to
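A hedged sketch of the dependency chaining the thread is about, using the old org.apache.hadoop.mapred.jobcontrol package (the two JobConf objects are assumed to be configured elsewhere):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    public class ChainedJobs {
      public static void run(JobConf first, JobConf second) throws Exception {
        Job j1 = new Job(first);
        Job j2 = new Job(second);
        j2.addDependingJob(j1);           // j2 starts only after j1 succeeds

        JobControl control = new JobControl("chained-jobs");
        control.addJob(j1);
        control.addJob(j2);
        new Thread(control).start();      // JobControl is a Runnable
        while (!control.allFinished()) {
          Thread.sleep(1000);             // poll until both jobs are done
        }
        control.stop();
      }
    }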

Re: external jars in .20

2009-06-01 Thread Amareshwari Sriramadasu
Hi Lance, Where are you passing the -libjars parameter? It is now a GenericOption. It is no longer a parameter for the jar command. Thanks Amareshwari Lance Riedel wrote: We are trying to upgrade to .20 from 19.1 due to several issues we are having. Now our jobs are failing with class not found exc

Re: JobInProgress and TaskInProgress

2009-05-18 Thread Amareshwari Sriramadasu
You can use RunningJob handle to query map/reduce progress. See api @ http://hadoop.apache.org/core/docs/r0.20.0/api/org/apache/hadoop/mapred/RunningJob.html Thanks Amareshwari Jothi Padmanabhan wrote: Look at JobClient -- There are some useful methods there. For example, displayTasks and moni
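A small sketch of polling progress through that handle, assuming the 0.20 client API and a JobConf pointing at the cluster (the job ID is illustrative):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;

    public class ProgressProbe {
      public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        RunningJob job = client.getJob(JobID.forName("job_200905180000_0001"));
        if (job != null) {
          // Both return a fraction between 0.0f and 1.0f.
          System.out.println("map:    " + job.mapProgress());
          System.out.println("reduce: " + job.reduceProgress());
        }
      }
    }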

Re: intermediate files of killed tasks not purged

2009-04-28 Thread Amareshwari Sriramadasu
. Regards Sandhya On Tue, Apr 28, 2009 at 2:02 PM, Amareshwari Sriramadasu wrote: Hi Sandhya, Which version of HADOOP are you using? There could be directories in mapred/local, pre 0.17. Now, there should not be any such directories. From version 0.17 onwards, the attempt directories will be

Re: intermediate files of killed tasks not purged

2009-04-28 Thread Amareshwari Sriramadasu
Hi Sandhya, Which version of HADOOP are you using? There could be directories in mapred/local, pre 0.17. Now, there should not be any such directories. From version 0.17 onwards, the attempt directories will be present only at mapred/local/taskTracker/jobCache/<jobid>/<attempt-dir>. If you are seeing the dire

Re: Hadoop streaming performance: elements vs. vectors

2009-04-05 Thread Amareshwari Sriramadasu
You can add your jar to the distributed cache and add it to the classpath by passing it in the configuration property "mapred.job.classpath.archives". -Amareshwari Peter Skomoroch wrote: If I need to use a custom streaming combiner jar in Hadoop 18.3, is there a way to add it to the classpath without the
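As a sketch, the same wiring can also be done programmatically; DistributedCache.addArchiveToClassPath records the path under mapred.job.classpath.archives (the jar path is illustrative, and the method's availability in a given 0.18 build should be checked):

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class CombinerClasspath {
      public static void setup(JobConf conf) throws Exception {
        // Adds the archive to the cache and to the task classpath in one call.
        DistributedCache.addArchiveToClassPath(
            new Path("/user/peter/combiner.jar"), conf);
      }
    }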

Re: job status from command prompt

2009-04-05 Thread Amareshwari Sriramadasu
Elia Mazzawi wrote: Is there a command that I can run from the shell that says this job passed / failed? I found these, but they don't really say pass/fail; they only say what is running and percent complete. This shows what is running: ./hadoop job -list and this shows the completion: ./hadoop
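For a script-friendly pass/fail answer, a small client along these lines works (a sketch, assuming the 0.20 API; the exit codes are chosen so shell scripts can test $?):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;

    public class JobStatusCheck {
      public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        RunningJob job = client.getJob(JobID.forName(args[0]));
        if (job != null && job.isComplete()) {
          // Exit 0 on success, 1 on failure.
          System.exit(job.isSuccessful() ? 0 : 1);
        }
        System.exit(2);  // unknown job or still running
      }
    }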

Re: reduce task failing after 24 hours waiting

2009-03-25 Thread Amareshwari Sriramadasu
Set mapred.jobtracker.retirejob.interval and mapred.userlog.retain.hours to a higher value. By default, their values are 24 hours. These might be the reason for the failure, though I'm not sure. Thanks Amareshwari Billy Pearson wrote: I am seeing on one of my long running jobs about 50-60 hours that

Re: Unable to access job details

2009-03-22 Thread Amareshwari Sriramadasu
Can you look for an exception from jetty in the JT logs and report it here? That would tell us the cause of the ERROR 500. Thanks Amareshwari Nathan Marz wrote: Sometimes I am unable to access a job's details and instead only see: HTTP ERROR: 500 Internal Server Error. I am seeing this on the 0.19.2 branch.

Re: Task Side Effect files and copying(getWorkOutputPath)

2009-03-16 Thread Amareshwari Sriramadasu
Saptarshi Guha wrote: Hello, I would like to produce side-effect files which will later be copied to the output folder. I am using FileOutputFormat, and in the Map's close() method I copy files (from the local tmp/ folder) to FileOutputFormat.getWorkOutputPath(job); FileOutputFormat.getWorkOut
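A sketch of the side-effect pattern under discussion: anything written under the task's work path is promoted to the job output directory only when the task commits, so output from killed or speculative attempts is discarded (the file name is illustrative):

    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class SideEffectWriter {
      public static void write(JobConf job, byte[] data) throws Exception {
        // getWorkOutputPath points at the task's temporary work directory.
        Path workDir = FileOutputFormat.getWorkOutputPath(job);
        Path sideFile = new Path(workDir, "side-effect.dat");
        FSDataOutputStream out = sideFile.getFileSystem(job).create(sideFile);
        out.write(data);
        out.close();
      }
    }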

Re: Reducers spawned when mapred.reduce.tasks=0

2009-03-15 Thread Amareshwari Sriramadasu
into future releases. cheers, ckw On Mar 12, 2009, at 8:20 PM, Amareshwari Sriramadasu wrote: Are you seeing reducers getting spawned from web ui? then, it is a bug. If not, there won't be reducers spawned, it could be job-setup/ job-cleanup task that is running on a reduce slot. See H

Re: Reducers spawned when mapred.reduce.tasks=0

2009-03-12 Thread Amareshwari Sriramadasu
Are you seeing reducers getting spawned from web ui? then, it is a bug. If not, there won't be reducers spawned, it could be job-setup/ job-cleanup task that is running on a reduce slot. See HADOOP-3150 and HADOOP-4261. -Amareshwari Chris K Wensel wrote: May have found the answer, waiting on

Re: streaming inputformat: class not found

2009-03-11 Thread Amareshwari Sriramadasu
Till 0.18.x, files are not added to client-side classpath. Use 0.19, and run following command to use custom input format bin/hadoop jar contrib/streaming/hadoop-0.19.0-streaming.jar -mapper mapper.pl -reducer org.apache.hadoop.mapred.lib.IdentityReducer -input test.data -output test-output -fi

Re: Jobs stalling forever

2009-03-10 Thread Amareshwari Sriramadasu
This is due to HADOOP-5233. Got fixed in branch 0.19.2 -Amareshwari Nathan Marz wrote: Every now and then, I have jobs that stall forever with one map task remaining. The last map task remaining says it is at "100%" and in the logs, it says it is in the process of committing. However, the task

Re: Throwing an IOException in Map, yet task does not fail

2009-03-05 Thread Amareshwari Sriramadasu
Is your job a streaming job? If so, which version of hadoop are you using? What is the configured value for stream.non.zero.exit.is.failure? Can you set stream.non.zero.exit.is.failure to true and try again? Thanks Amareshwari Saptarshi Guha wrote: Hello, I have given a case where my mapper sh

Re: wordcount getting slower with more mappers and reducers?

2009-03-05 Thread Amareshwari Sriramadasu
Are you hitting HADOOP-2771? -Amareshwari Sandy wrote: Hello all, For the sake of benchmarking, I ran the standard hadoop wordcount example on an input file using 2, 4, and 8 mappers and reducers for my job. In other words, I do: time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 2

Re: binary format for streaming

2009-03-03 Thread Amareshwari Sriramadasu
[HADOOP-1722] Make streaming to handle non-utf8 byte array http://issues.apache.org/jira/browse/HADOOP-1722 is committed to branch 0.21 Yasuyuki Watanabe wrote: Hi, I would like to know the status of binary input/output format support for streaming. We found HADOOP-3227 and it was open. So we

Re: FAILED_UNCLEAN?

2009-02-25 Thread Amareshwari Sriramadasu
n parallel with this job, but it's of the same priority. The other job had failed when the job I'm describing got hung. On Feb 24, 2009, at 10:46 PM, Amareshwari Sriramadasu wrote: Nathan Marz wrote: I have a large job operating on over 2 TB of data, with about 5 input splits. For

Re: FAILED_UNCLEAN?

2009-02-24 Thread Amareshwari Sriramadasu
Nathan Marz wrote: I have a large job operating on over 2 TB of data, with about 5 input splits. For some reason (as yet unknown), tasks started failing on two of the machines (which got blacklisted). 13 mappers failed in total. Of those 13, 8 of the tasks were able to execute on another m

Re: Hadoop Streaming -file option

2009-02-24 Thread Amareshwari Sriramadasu
Arun C Murthy wrote: On Feb 23, 2009, at 2:01 AM, Bing TANG wrote: Hi, everyone, Could someone tell me the principle of "-file" when using Hadoop Streaming? I want to ship a big file to the slaves, so how does it work? Does Hadoop use "SCP" to copy? How does Hadoop deal with the -file option? No, -file ju

Re: How to use Hadoop API to submit job?

2009-02-20 Thread Amareshwari Sriramadasu
You should implement Tool interface and submit jobs. For example see org.apache.hadoop.examples.WordCount -Amareshwari Wu Wei wrote: Hi, I used to submit Hadoop job with the utility RunJar.main() on hadoop 0.18. On hadoop 0.19, because the commandLineConfig of JobClient was null, I got a Null
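A skeleton of the recommended Tool pattern (a sketch modeled on the example the reply cites, not the WordCount source itself; the class name and job setup are illustrative):

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyDriver extends Configured implements Tool {
      public int run(String[] args) throws Exception {
        // getConf() already carries parsed generic options (-D, -libjars, ...).
        JobConf job = new JobConf(getConf(), MyDriver.class);
        // ... set input/output paths, mapper, and reducer here ...
        JobClient.runJob(job);
        return 0;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyDriver(), args));
      }
    }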

Re: Overriding mapred.tasktracker.map.tasks.maximum with -jobconf

2009-02-18 Thread Amareshwari Sriramadasu
Yes. The configuration is read only when the taskTracker starts. You can see more discussion on jira HADOOP-5170 (http://issues.apache.org/jira/browse/HADOOP-5170) for making it per job. -Amareshwari jason hadoop wrote: I certainly hope it changes but I am unaware that it is in the todo queue a

Re: Persistent completed jobs status not showing in jobtracker UI

2009-02-18 Thread Amareshwari Sriramadasu
Bill Au wrote: I have enabled persistent completed jobs status and can see them in HDFS. However, they are not listed in the jobtracker's UI after the jobtracker is restarted. I thought that jobtracker will automatically look in HDFS if it does not find a job in its memory cache. What am I miss

Re: Testing with Distributed Cache

2009-02-10 Thread Amareshwari Sriramadasu
Nathan Marz wrote: I have some unit tests which run MapReduce jobs and test the inputs/outputs in standalone mode. I recently started using DistributedCache in one of these jobs, but now my tests fail with errors such as: Caused by: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///

Re: only one reducer running in a hadoop cluster

2009-02-08 Thread Amareshwari Sriramadasu
Nick Cen wrote: Hi, I have a hadoop cluster with 4 PCs. And I want to integrate hadoop and lucene together, so I copied some of the source code from nutch's Indexer class, but when I run my job, I found that there is only 1 reducer running on 1 pc, so the performance is not as good as expected.

Re: Task tracker archive contains too many files

2009-02-04 Thread Amareshwari Sriramadasu
Andrew wrote: I've noticed that the task tracker moves all unpacked jars into ${hadoop.tmp.dir}/mapred/local/taskTracker. We are using a lot of external libraries, which are deployed via the "-libjars" option. The total number of files after unpacking is about 20 thousand. After running a number of

Re: Hadoop Streaming Semantics

2009-02-02 Thread Amareshwari Sriramadasu
putFormat use LineRecordReader.) -Amareshwari Any thoughts? John On Sun, Feb 1, 2009 at 11:00 PM, Amareshwari Sriramadasu < amar...@yahoo-inc.com> wrote: Which version of hadoop are you using? You can directly use -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat for your st

Re: Hadoop Streaming Semantics

2009-02-01 Thread Amareshwari Sriramadasu
roach, can you point me to an example of what kind of param should be specified? I appreciate your help. Thanks, SD On Thu, Jan 29, 2009 at 10:49 PM, Amareshwari Sriramadasu < amar...@yahoo-inc.com> wrote: You can use NLineInputFormat for this, which splits one line (N=1, by default) a

Re: [ANNOUNCE] Hadoop release 0.18.3 available

2009-01-30 Thread Amareshwari Sriramadasu
Anum Ali wrote: Hi, I need some guidance related to getting started with Hadoop installation and system setup. I am a newbie regarding Hadoop. Our system OS is Fedora 8; should I start from a stable release of Hadoop or get it from the svn development version (from the contribute site)? Thank You

Re: Counters in Hadoop

2009-01-29 Thread Amareshwari Sriramadasu
Kris Jirapinyo wrote: Hi all, I am using counters in Hadoop via the reporter. I can see this custom counter fine after I run my job. However, if somehow I restart the cluster, then when I look into the Hadoop Job History, I can't seem to find the information of my previous counter values an

Re: Hadoop Streaming Semantics

2009-01-29 Thread Amareshwari Sriramadasu
You can use NLineInputFormat for this, which splits one line (N=1, by default) as one split. So, each map task processes one line. See http://hadoop.apache.org/core/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html -Amareshwari S D wrote: Hello, I have a clarifying question
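A sketch of wiring this into a job (N left at its default of 1, set explicitly here only for clarity):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;

    public class NLineSetup {
      public static void configure(JobConf job) {
        job.setInputFormat(NLineInputFormat.class);
        // One line per split is the default; raise N to batch lines per map.
        job.setInt("mapred.line.input.format.linespermap", 1);
      }
    }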

Re: Interrupting JobClient.runJob

2009-01-27 Thread Amareshwari Sriramadasu
Edwin wrote: Hi I am looking for a way to interrupt a thread that entered JobClient.runJob(). The runJob() method keep polling the JobTracker until the job is completed. After reading the source code, I know that the InterruptException is caught in runJob(). Thus, I can't interrupt it using Thre

Re: Debugging in Hadoop

2009-01-26 Thread Amareshwari Sriramadasu
patektek wrote: Hello list, I am trying to add some functionality to Hadoop-core and I am having serious issues debugging it. I have searched in the list archive and still have not been able to resolve the issues. Simple question: If I want to insert "LOG.INFO()" statements in Hadoop code is not

Re: NLineInputFormat and very high number of maptasks

2009-01-20 Thread Amareshwari Sriramadasu
Saptarshi Guha wrote: Sorry, I see - every line is now a map task - one split, one task (in this case N=1 line per split). Is that correct? Saptarshi You are right. NLineInputFormat splits N lines of input as one split and each split is given to a map task. By default, N is 1. N can be configured th

Re: How to debug a MapReduce application

2009-01-18 Thread Amareshwari Sriramadasu
From the exception you pasted, it looks like your io.serializations setting did not configure the SerializationFactory properly. Do you see any logs on your console for adding the serialization class? Can you try running your app in pseudo-distributed mode, instead of the LocalJobRunner? You can find pseudo distribu

Re: Calling a mapreduce job from inside another

2009-01-18 Thread Amareshwari Sriramadasu
You can use Job Control. See http://hadoop.apache.org/core/docs/r0.19.0/mapred_tutorial.html#Job+Control http://hadoop.apache.org/core/docs/r0.19.0/api/org/apache/hadoop/mapred/jobcontrol/Job.html and http://hadoop.apache.org/core/docs/r0.19.0/api/org/apache/hadoop/mapred/jobcontrol/JobControl.htm

Re: streaming question.

2009-01-18 Thread Amareshwari Sriramadasu
You can also have a look at NLineInputFormat. @http://hadoop.apache.org/core/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html Thanks Amareshwari Abdul Qadeer wrote: Dmitry, If you are talking about Text data, then the splits can be anywhere. But LineRecordReader will take c

Re: hadoop job -history

2009-01-15 Thread Amareshwari Sriramadasu
is the location specified by the configuration property "hadoop.job.history.user.location". If you don't specify anything for the property, the job history logs will be created in the job's output directory. So, to view your history, give your jobOutputDir if you haven't specified any location. Hop

Re: Problem loading hadoop-site.xml - dumping parameters

2008-12-29 Thread Amareshwari Sriramadasu
Saptarshi Guha wrote: Hello, I had previously emailed regarding heap size issue and have discovered that the hadoop-site.xml is not loading completely, i.e Configuration defaults = new Configuration(); JobConf jobConf = new JobConf(defaults, XYZ.class); System.out.println("1:"+jo

Re: Does anyone have a working example for using MapFiles on the DistributedCache?

2008-12-28 Thread Amareshwari Sriramadasu
Sean Shanny wrote: To all, Version: hadoop-0.17.2.1-core.jar I have created a MapFile. What I don't seem to be able to do is correctly place the MapFile in the DistributedCache and the make use of it in a map method. I need the following info please: 1.How and where to place the MapFi

Re: OutofMemory Error, inspite of large amounts provided

2008-12-28 Thread Amareshwari Sriramadasu
Saptarshi Guha wrote: Caught it in action. Running ps -e -o 'vsz pid ruser args' |sort -nr|head -5 on a machine where the map task was running: 04812 16962 sguha /home/godhuli/custom/jdk1.6.0_11/jre/bin/java -Djava.library.path=/home/godhuli/custom/hadoop/bin/../lib/native/Linux-amd64-64:/home

Re: Reduce not completing

2008-12-23 Thread Amareshwari Sriramadasu
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207) 2008-12-23 19:04:57,781 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200812221742_0075_r_00_2' from 'tracker_hnode1.cor.mystrands.in:localhost/127.0.0.1:37971' Thanks, RDH On Dec 23, 2008, at 1:

Re: Reduce not completing

2008-12-23 Thread Amareshwari Sriramadasu
You can report status from a streaming job by emitting reporter:status:<message> in stderr. See documentation @ http://hadoop.apache.org/core/docs/r0.18.2/streaming.html#How+do+I+update+status+in+streaming+applications%3F But from the exception trace, it doesn't look like a lack of report (timeout). The tr
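A sketch of that stderr protocol from inside a streaming task, written here in Java though any language works (the reporting interval is illustrative):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class StatusReportingMapper {
      public static void main(String[] args) throws Exception {
        BufferedReader in =
            new BufferedReader(new InputStreamReader(System.in));
        String line;
        long n = 0;
        while ((line = in.readLine()) != null) {
          System.out.println(line);  // identity map: pass records through
          if (++n % 10000 == 0) {
            // Lines of this form on stderr update the task's status string.
            System.err.println("reporter:status:processed " + n + " lines");
          }
        }
      }
    }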

Re: Failed to start TaskTracker server

2008-12-22 Thread Amareshwari Sriramadasu
You can set the configuration property "mapred.task.tracker.http.address" to 0.0.0.0:0 . If the port is given as 0, then the server will start on a free port. Thanks Amareshwari Sagar Naik wrote: - check hadoop-default.xml; here you will find all the ports used. Copy the xml-nodes from hado

Re: Reducing Hadoop Logs

2008-12-09 Thread Amareshwari Sriramadasu
Arv Mistry wrote: I'm using hadoop 0.17.0. Unfortunately I can't upgrade to 0.19.0 just yet. I'm trying to control the number of extraneous files. I noticed there are the following log files produced by hadoop; On Slave - userlogs (for each map/reduce job)

Re: Optimized way

2008-12-04 Thread Amareshwari Sriramadasu
Hi Aayush, Do you want one map to run one command? You can give an input file consisting of lines of <command>. Use NLineInputFormat, which splits N lines of input as one split, i.e. gives N lines to one map for processing. By default, N is one. Then your map can just run the shell command on its input line. W

Re: Error with Sequence File in hadoop-18

2008-11-27 Thread Amareshwari Sriramadasu
Message- From: Amareshwari Sriramadasu [mailto:[EMAIL PROTECTED] Sent: Friday, November 28, 2008 10:56 AM To: core-user@hadoop.apache.org Subject: Re: Error with Sequence File in hadoop-18 It got fixed in 0.18.3 (HADOOP-4499). -Amareshwari Palleti, Pallavi wrote: Hi, I am getting "Chec

Re: Error with Sequence File in hadoop-18

2008-11-27 Thread Amareshwari Sriramadasu
It got fixed in 0.18.3 (HADOOP-4499). -Amareshwari Palleti, Pallavi wrote: Hi, I am getting "Check sum ok was sent" errors when I am using hadoop. Can someone please let me know why this error is coming and how to avoid it. It was running perfectly fine when I used hadoop-17. And, this error

Re: how can I decommission nodes on-the-fly?

2008-11-25 Thread Amareshwari Sriramadasu
Jeremy Chow wrote: Hi list, I added a property dfs.hosts.exclude to my conf/hadoop-site.xml. Then refreshed my cluster with the command bin/hadoop dfsadmin -refreshNodes It showed that it can only shut down the DataNode process but not the TaskTracker process on each s

Re: Newbie: error=24, Too many open files

2008-11-23 Thread Amareshwari Sriramadasu
tim robertson wrote: Hi all, I am running MR which is scanning 130M records and then trying to group them into around 64,000 files. The Map does the grouping of the record by determining the key, and then I use a MultipleTextOutputFormat to write the file based on the key: @Override

Re: NLine Input Format

2008-11-19 Thread Amareshwari Sriramadasu
returns the value as N Lines? Thanks Rahul On Mon, Nov 17, 2008 at 9:43 AM, Amareshwari Sriramadasu <[EMAIL PROTECTED]> wrote: Hi Rahul, How did you set the configuration "mapred.line.input.format.linespermap" and your input forma

Re: NLine Input Format

2008-11-19 Thread Amareshwari Sriramadasu
returns the value as N Lines? Setting the Configuration in the run() method will also work. You have to extend LineRecordReader and override the method next() to return N lines as the value instead of 1 line. Thanks Amareshwari Thanks Rahul On Mon, Nov 17, 2008 at 9:43 AM, Amareshwari Sriramadasu <[EM
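A rough sketch of that subclass against the old-API LineRecordReader (note the key ends up as the offset of the last line read; a real implementation may want to keep the offset of the first):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.LineRecordReader;

    public class MultiLineRecordReader extends LineRecordReader {
      private final int linesPerRecord;
      private final Text line = new Text();

      public MultiLineRecordReader(Configuration job, FileSplit split)
          throws IOException {
        super(job, split);
        linesPerRecord = job.getInt("mapred.line.input.format.linespermap", 1);
      }

      public synchronized boolean next(LongWritable key, Text value)
          throws IOException {
        StringBuilder record = new StringBuilder();
        int read = 0;
        // Concatenate up to N consecutive lines into a single value.
        while (read < linesPerRecord && super.next(key, line)) {
          if (read > 0) record.append('\n');
          record.append(line.toString());
          read++;
        }
        if (read == 0) return false;  // end of split
        value.set(record.toString());
        return true;
      }
    }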

Re: NLine Input Format

2008-11-16 Thread Amareshwari Sriramadasu
Hi Rahul, How did you set the configuration "mapred.line.input.format.linespermap" and your input format? You have to set them in hadoop-site.xml or pass them through the -D option to the job. NLineInputFormat will split N lines of input as one split. So, each map gets N lines. But the RecordReade

Re: distributed cache

2008-11-11 Thread Amareshwari Sriramadasu
Jeremy Pinkham wrote: We are using the distributed cache in one of our jobs and have noticed that the local copies on all of the task nodes never seem to get cleaned up. Is there a mechanism in the API to tell the framework that those copies are no longer needed so they can be deleted. I've tri

Re: reading input for a map function from 2 different files?

2008-11-09 Thread Amareshwari Sriramadasu
some speed wrote: I was wondering if it was possible to read the input for a map function from 2 different files: 1st file ---> user-input file from a particular location (path) 2nd file ---> a resultant file (has just one <key, value> pair) from a previous MapReduce job. (I am implementing a chain MapReduce

Re: map/reduce driver as daemon

2008-11-05 Thread Amareshwari Sriramadasu
shahab mehmandoust wrote: I'm trying to write a daemon that periodically wakes up and runs map/reduce jobs, but I've had little luck. I've tried different ways (including using cascading) and I keep arriving at the below exception: java.lang.OutOfMemoryError: Java heap space at org.apache.had

Re: _temporary directories not deleted

2008-11-04 Thread Amareshwari Sriramadasu
Nathan Marz wrote: Hello all, Occasionally when running jobs, Hadoop fails to clean up the "_temporary" directories it has left behind. This only appears to happen when a task is killed (aka a speculative execution), and the data that task has outputted so far is not cleaned up. Is this a kn

Re: Debugging / Logging in Hadoop?

2008-10-30 Thread Amareshwari Sriramadasu
Some more links: http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Other+Useful+Features http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Debugging -Amareshwari Arun C Murthy wrote: On Oct 30, 2008, at 1:16 PM, Scott Whitecross wrote: Is the presentation online as

Re: How do I include customized InputFormat, InputSplit and RecordReader in a C++ pipes job?

2008-10-29 Thread Amareshwari Sriramadasu
hem as jar file, is there any other ways to do that? Thanks Mike From: Amareshwari Sriramadasu <[EMAIL PROTECTED]> To: core-user@hadoop.apache.org Sent: Tuesday, October 28, 2008 11:58:33 PM Subject: Re: How do I include customized InputFormat, InputSp

Re: How do I include customized InputFormat, InputSplit and RecordReader in a C++ pipes job?

2008-10-28 Thread Amareshwari Sriramadasu
Hi, How are you passing your classes to the pipes job? If you are passing them as a jar file, you can use the -libjars option. From branch 0.19, the libjar files are added to the client classpath also. Thanks Amareshwari Zhengguo 'Mike' SUN wrote: Hi, I implemented customized classes for InputF

Re: Problems running the Hadoop Quickstart

2008-10-20 Thread Amareshwari Sriramadasu
Has your task-tracker started? I mean, do you see non-zero nodes on your job tracker UI? -Amareshwari John Babilon wrote: Hello, I've been trying to get Hadoop up and running on a Windows Desktop running Windows XP. I've installed Cygwin and Hadoop. I run the start-all.sh script, it starts

Re: Add jar file via -libjars - giving errors

2008-10-06 Thread Amareshwari Sriramadasu
Hi, From 0.19, the jars added using -libjars are available on the client classpath also, fixed by HADOOP-3570. Thanks Amareshwari Mahadev Konar wrote: Hi Tarandeep, the libjars option does not add the jar on the client side. There is an open jira for that (I don't remember which one)... O

Re: Using different file systems for Map Reduce job input and output

2008-10-06 Thread Amareshwari Sriramadasu
Hi Naama, Yes. It is possible to specify this using the APIs FileInputFormat#setInputPaths() and FileOutputFormat#setOutputPath(). You can specify the FileSystem URI in the path. Thanks, Amareshwari Naama Kraus wrote: Hi, I wanted to know if it is possible to use different file systems for Map Re
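A sketch with fully qualified URIs on both paths (the cluster names and paths are illustrative):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class CrossFsPaths {
      public static void configure(JobConf job) {
        // Read from one HDFS cluster...
        FileInputFormat.setInputPaths(job,
            new Path("hdfs://cluster-a:9000/data/in"));
        // ...and write the output to a different file system entirely.
        FileOutputFormat.setOutputPath(job,
            new Path("hdfs://cluster-b:9000/data/out"));
      }
    }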

Re: streaming silently failing when executing binaries with unresolved dependencies

2008-10-02 Thread Amareshwari Sriramadasu
This is because the non-zero exit status of the streaming process was not treated as failure until 0.17. In 0.17, you can specify the configuration property "stream.non.zero.exit.is.failure" as "true" to consider a non-zero exit as failure. From 0.18, the default value for stream.non.zero.exit

Re: LZO and native hadoop libraries

2008-09-30 Thread Amareshwari Sriramadasu
Are you seeing HADOOP-2009? Thanks Amareshwari Nathan Marz wrote: Unfortunately, setting those environment variables did not help my issue. It appears that the "HADOOP_LZO_LIBRARY" variable is not defined in both LzoCompressor.c and LzoDecompressor.c. Where is this variable supposed to be set?

Re: streaming question

2008-09-16 Thread Amareshwari Sriramadasu
mlink in the local running directory, correct? Just like the cacheFile option? If not how can i then specify which class to use? cheers, Christian Amareshwari Sriramadasu wrote: Dennis Kubes wrote: If I understand what you are asking you can use the -cacheArchive with the path to the j

Re: streaming question

2008-09-14 Thread Amareshwari Sriramadasu
Dennis Kubes wrote: If I understand what you are asking you can use the -cacheArchive with the path to the jar to including the jar file in the classpath of your streaming job. Dennis You can also use -cacheArchive option to include jar file and symlink the unjarred directory from cwd by pro

Re: Logging best practices?

2008-09-08 Thread Amareshwari Sriramadasu
Per Jacobsson wrote: Hi all. I've got a beginner question: Are there any best practices for how to do logging from a task? Essentially I want to log warning messages under certain conditions in my map and reduce tasks, and be able to review them later. stdout, stderr and the logs using common

Re: input files

2008-08-20 Thread Amareshwari Sriramadasu
You can add more paths to input using FileInputFormat.addInputPath(JobConf, Path). You can also specify comma separated filenames as input path using FileInputFormat.setInputPaths(JobConf, String commaSeparatedPaths) More details at http://hadoop.apache.org/core/docs/current/api/org/apache/hadoo

Re: help,error "...failed to report status for xxx seconds..."

2008-08-03 Thread Amareshwari Sriramadasu
The Mapred framework kills map/reduce tasks if they don't report status within 10 minutes. If your mapper/reducer needs more time, it should report status using http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/Reporter.html More documentation at http://hadoop.apache.org/c
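A sketch of keeping a long-running map alive via Reporter (old API; the expensive per-record work is a stand-in):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class SlowMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
      public void map(LongWritable key, Text value,
          OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        // ... some expensive per-record work here ...
        reporter.progress();  // resets the 10-minute timeout clock
        reporter.setStatus("still working on offset " + key.get());
      }
    }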

Re: mapper input file name

2008-08-03 Thread Amareshwari Sriramadasu
You can get the file name accessed by the mapper using the config property "map.input.file" Thanks Amareshwari Deyaa Adranale wrote: Hi, I need to know inside my mapper, the name of the file that contains the current record. I saw that I can access the name of the input directories inside ma
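A sketch of reading that property in the old API's configure() hook:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    public class FileAwareMapperBase extends MapReduceBase {
      private String inputFile;

      public void configure(JobConf job) {
        // Set by the framework for file-based input formats.
        inputFile = job.get("map.input.file");
      }
    }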

Re: Could not find any valid local directory for task

2008-08-03 Thread Amareshwari Sriramadasu
The error "Could not find any valid local directory for task" means that the task could not find a local directory to write file, mostly because there is no enough space on any of the disks. Thanks Amareshwari Shirley Cohen wrote: Hi, Does anyone know what the following error means? hadoop-

Re: Running mapred job from remote machine to a pseudo-distributed hadoop

2008-08-03 Thread Amareshwari Sriramadasu
Arv Mistry wrote: I'll try again: can anyone tell me whether it should be possible to run hadoop in pseudo-distributed mode (i.e. everything on one machine) and then submit a mapred job using the ToolRunner from another machine on that hadoop configuration? Cheers Arv Yes. It is possible to do.

Re: Where can i download hadoop-0.17.1-examples.jar

2008-07-30 Thread Amareshwari Sriramadasu
Hi Srilatha, You can download hadoop release tar ball from http://hadoop.apache.org/core/releases.html You will find hadoop-*-examples.jar when you untar it. Thanks, Amareshwari us latha wrote: HI All, Trying to run the wordcount example on single node hadoop setup. Could anyone please poin

Re: JobTracker History data+analysis

2008-07-28 Thread Amareshwari Sriramadasu
same, https://issues.apache.org/jira/browse/HADOOP-3850. You can give your inputs there. Thanks Amareshwari Paco On Mon, Jul 28, 2008 at 1:42 AM, Amareshwari Sriramadasu <[EMAIL PROTECTED]> wrote: HistoryViewer is used in JobClient to view the history files in the directory provi

Re: JobTracker History data+analysis

2008-07-27 Thread Amareshwari Sriramadasu
the directory. Thanks Amareshwari Paco NATHAN wrote: Thank you, Amareshwari - That helps. Hadn't noticed HistoryViewer before. It has no JavaDoc. What is a typical usage? In other words, what would be the "outputDir" value in the context of ToolRunner, JobClient, etc. ? Pa

Re: JobTracker History data+analysis

2008-07-27 Thread Amareshwari Sriramadasu
Can you have a look at org.apache.hadoop.mapred.HistoryViewer and see if it make sense? Thanks Amareshwari Paco NATHAN wrote: We have a need to access data found in the JobTracker History link. Specifically in the "Analyse This Job" analysis. Must be run in Java, between jobs, in the same code

Re: Tasktrackers job cache directories not always cleaned up

2008-07-09 Thread Amareshwari Sriramadasu
The proposal on http://issues.apache.org/jira/browse/HADOOP-3386 takes care of this. Thanks Amareshwari Amareshwari Sriramadasu wrote: If the task tracker didn't receive KillJobAction, it's true that the job directory will not be removed. And your observation is correct that some task trackers d

Re: Tasktrackers job cache directories not always cleaned up

2008-07-02 Thread Amareshwari Sriramadasu
If the task tracker didn't receive KillJobAction, it's true that the job directory will not be removed. And your observation is correct that some task trackers didn't receive KillJobAction for the job. If a reduce task has finished before the job completion, the task will be sent a KillTaskAction. Looks like

Re: Too many Task Manager children...

2008-06-19 Thread Amareshwari Sriramadasu
C G wrote: Hi All: I have mapred.tasktracker.tasks.maximum set to 4 in our conf/hadoop-site.xml, yet I frequently see 5-6 instances of org.apache.hadoop.mapred.TaskTracker$Child running on the slave nodes. Is there another setting I need to tweak in order to dial back the number of childr

Re: Why is there a seperate map and reduce task capacity?

2008-06-16 Thread Amareshwari Sriramadasu
Taeho Kang wrote: Set "mapred.tasktracker.tasks.maximum" and each node will be able to process N number of tasks - map or/and reduce. Please note that once you set "mapred.tasktracker.tasks.maximum", "mapred.tasktracker.map.tasks.maximum" and "mapred.tasktracker.reduce.tasks.maximum" setting wil

Re: External Jar

2008-05-29 Thread Amareshwari Sriramadasu
You can put your external jar in the DistributedCache, and symlink the jar into the current working directory of the task by setting the value of mapred.create.symlink to true. More details can be found at http://issues.apache.org/jira/browse/HADOOP-1660. The jar can also be added to classpath usin

Re: Newbie InputFormat Question

2008-05-08 Thread Amareshwari Sriramadasu
You can have a look at TextInputFormat, KeyValueTextInputFormat etc at http://svn.apache.org/viewvc/hadoop/core/trunk/src/java/org/apache/hadoop/mapred/ coneybeare wrote: I want to alter the default <"key", "line"> input format to be <"key", "line number:" + "line"> so that my mapper can have

Re: Question on how to view the counters of jobs in the job tracker history

2008-04-07 Thread Amareshwari Sriramadasu
Arun C Murthy wrote: On Apr 3, 2008, at 5:36 PM, Jason Venner wrote: For the first day or so, when the jobs are viewable via the main page of the job tracker web interface, the jobs specific counters are also visible. Once the job is only visible in the history page, the counters are not vis

Re: Hadoop streaming performance problem

2008-03-31 Thread Amareshwari Sriramadasu
LineRecordReader.readLine() is deprecated by HADOOP-2285(http://issues.apache.org/jira/browse/HADOOP-2285) because it was slow. But streaming still uses the method. HADOOP-2826 (http://issues.apache.org/jira/browse/HADOOP-2826) will remove the usage in streaming. This change should improve str

Re: Hadoop streaming cacheArchive

2008-03-20 Thread Amareshwari Sriramadasu
Norbert Burger wrote: I'm trying to use the cacheArchive command-line options with the hadoop-0.15.3-streaming.jar. I'm using the option as follows: -cacheArchive hdfs://host:50001/user/root/lib.jar#lib Unfortunately, my PERL scripts fail with an error consistent with not being able to find th

Re: streaming problem

2008-03-18 Thread Amareshwari Sriramadasu
Hi Andreas, Looks like your mapper is not available to the streaming jar. Where is your mapper script? Did you use the distributed cache to distribute the mapper? You can use -file to make it part of the jar, or use -cacheFile /dist/wordloadmf#workloadmf to distribute the script. Distributing this way

Re: [Fwd: Re: runtime exceptions not killing job]

2008-03-18 Thread Amareshwari Sriramadasu
Thanks Matt for info. I raised a Jira for this at https://issues.apache.org/jira/browse/HADOOP-3039 Thanks Amareshwari Matt Kent wrote: Or maybe I can't use attachments, so here's the stack traces inline: --task tracker 2008-03-17 21:58:30

Re: Hadoop streaming question

2008-03-11 Thread Amareshwari Sriramadasu
Hi Andrey, I think that is a classpath problem. Can you try using the patch at https://issues.apache.org/jira/browse/HADOOP-2622 and see if you still have the problem? Thanks Amareshwari. Andrey Pankov wrote: Hi all, I'm still new to Hadoop. I'd like to use Hadoop streaming in order to combine map