Re: Could not find any valid local directory for taskTracker

2009-11-11 Thread Amareshwari Sri Ramadasu
This happens when there is not enough space in any of the local directories. -Amareshwari On 11/11/09 11:03 PM, "Saju K K" wrote: Hi, Did you get a solution for this problem? We are facing a similar problem. saju Pallavi Palleti wrote: > > Hi, > I got the below error while running my hadoop

Re: chaining jobs

2009-12-13 Thread Amareshwari Sri Ramadasu
You can use the JobControl utility for doing so. More info @ http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/jobcontrol/JobControl.html Thanks Amareshwari On 12/12/09 12:14 AM, "Mike Kendall" wrote: make a runner that has a bunch of hadoop jobs in one bash file... that'
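For illustration, a minimal sketch of chaining two dependent jobs with JobControl in the old (org.apache.hadoop.mapred) API; the two JobConf instances are assumed to be fully configured elsewhere:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    public class ChainRunner {
      // Runs job2 only after job1 completes; both JobConfs are assumed
      // to be set up by the caller.
      public static void runChain(JobConf conf1, JobConf conf2) throws Exception {
        Job job1 = new Job(conf1);
        Job job2 = new Job(conf2);
        job2.addDependingJob(job1);          // job2 waits for job1

        JobControl control = new JobControl("chain");
        control.addJob(job1);
        control.addJob(job2);

        new Thread(control).start();         // JobControl implements Runnable
        while (!control.allFinished()) {
          Thread.sleep(1000);                // poll until both jobs are done
        }
        control.stop();
      }
    }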

Re: File Split

2009-12-21 Thread Amareshwari Sri Ramadasu
You should implement your own InputSplit to represent the split information. Then implement getSplits() in your InputFormat to get the splits from your input; it divides the whole input into chunks, and each split will be given to one map task. You should also define a RecordReader which reads recor

Re: File Split

2009-12-21 Thread Amareshwari Sri Ramadasu
e split will consist of an array. Where and how should this be defined in InputFormat? Many thanks. In your InputFormat, you should define the getSplits() method, which returns your ImageSplits. Thanks Amareshwari On Mon, Dec 21, 2009 at 6:37 AM, Amareshwari Sri Ramadasu < amar...@yahoo-inc.com>
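For illustration, a hedged old-API sketch of the getSplits() side; WholeFileInputFormat and its one-split-per-file policy are illustrative stand-ins for a real ImageSplit-based format, and the matching RecordReader is elided:

    import java.io.IOException;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;

    // Hypothetical format: one split (and hence one map task) per input file.
    public class WholeFileInputFormat extends FileInputFormat<Text, BytesWritable> {

      @Override
      public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
        FileStatus[] files = listStatus(job);
        InputSplit[] splits = new InputSplit[files.length];
        for (int i = 0; i < files.length; i++) {
          // Each split covers one whole file; hosts are left empty for brevity.
          splits[i] = new FileSplit(files[i].getPath(), 0, files[i].getLen(),
                                    new String[0]);
        }
        return splits;
      }

      @Override
      public RecordReader<Text, BytesWritable> getRecordReader(
          InputSplit split, JobConf job, Reporter reporter) throws IOException {
        // A matching RecordReader (elided here) would read each whole file
        // as a single key-value record.
        throw new UnsupportedOperationException("RecordReader elided in this sketch");
      }
    }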

Re: How to reuse the nodes in blacklist ?

2010-01-05 Thread Amareshwari Sri Ramadasu
Restarting the trackers makes them un-blacklisted. -Amareshwari On 1/5/10 2:27 PM, "Jeff Zhang" wrote: Hi all, Two of my nodes are in the blacklist, and I want to reuse them again. How can I do that ? Thank you. Jeff Zhang

Re: Multiple file output

2010-01-05 Thread Amareshwari Sri Ramadasu
In branch 0.21, you can get the functionality of both org.apache.hadoop.mapred.lib.MultipleOutputs and org.apache.hadoop.mapred.lib.MultipleOutputFormat in org.apache.hadoop.mapreduce.lib.output.MultipleOutputs. Please see MAPREDUCE-370 for more details. Thanks Amareshwari On 1/5/10 5:56 PM, "

Re: Multiple file output

2010-01-06 Thread Amareshwari Sri Ramadasu
to be part of 0.20.2 or later? 2010/1/5 Amareshwari Sri Ramadasu > In branch 0.21, you can get the functionality of both > org.apache.hadoop.mapred.lib.MultipleOutputs and > org.apache.hadoop.mapred.lib.MultipleOutputFormat in > org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.

Re: Which config parameters are node-specific?

2010-01-19 Thread Amareshwari Sri Ramadasu
Hi Zhang, The following parameters are node-specific: mapred.tasktracker.map.tasks.maximum, mapred.tasktracker.reduce.tasks.maximum, tasktracker.http.threads, dfs.datanode.handler.count. The rest of the parameters are job-specific. Thanks Amareshwari On 1/20/10 6:01 AM, "Zhang, Zhang"

Re: Is it possible to write each key-value pair emitted by the reducer to a different output file

2010-02-04 Thread Amareshwari Sri Ramadasu
See MultipleOutputs at http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html -Amareshwari On 2/5/10 10:41 AM, "Udaya Lakshmi" wrote: Hi, I was wondering if it is possible to write each key-value pair produced by the reduce function to a different
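For illustration, a hedged sketch against the 0.20 (org.apache.hadoop.mapred) MultipleOutputs API; the named output "part" and the Text types are assumptions. The driver would register it with MultipleOutputs.addMultiNamedOutput(conf, "part", TextOutputFormat.class, Text.class, Text.class), and the per-key file-name part must be alphanumeric:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.MultipleOutputs;

    public class PerKeyReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

      private MultipleOutputs mos;

      @Override
      public void configure(JobConf conf) {
        mos = new MultipleOutputs(conf);
      }

      @Override
      public void reduce(Text key, Iterator<Text> values,
          OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        while (values.hasNext()) {
          // "part" is the multi-named output registered in the driver; the key
          // is used as the file-name part, so each key lands in its own file.
          // Note: MultipleOutputs only accepts alphanumeric name parts.
          mos.getCollector("part", key.toString(), reporter)
             .collect(key, values.next());
        }
      }

      @Override
      public void close() throws IOException {
        mos.close();   // flushes and closes all the named outputs
      }
    }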

Re: has anyone ported hadoop.lib.aggregate?

2010-02-07 Thread Amareshwari Sri Ramadasu
org.apache.hadoop.mapred.lib.aggregate has been ported to the new API in branch 0.21. See http://issues.apache.org/jira/browse/MAPREDUCE-358 Thanks Amareshwari On 2/7/10 5:34 AM, "Meng Mao" wrote: From what I can tell, while the ValueAggregator stuff should be usable, the ValueAggregatorJob and

Re: ChainMapper/ChainReducer without using deprecated classes?

2010-02-14 Thread Amareshwari Sri Ramadasu
ChainMapper/ChainReducer have been ported to the new API through http://issues.apache.org/jira/browse/MAPREDUCE-372, but this is not available in 0.20. You can apply the patch yourself from the JIRA. Thanks Amareshwari On 2/12/10 8:39 PM, "Dean Jones" wrote: Hello folks, I'm looking at

Re: A unique number for every task (map/reduce)

2010-02-14 Thread Amareshwari Sri Ramadasu
You can get the task attempt id, tip id, and job id through the configuration properties mapred.task.id, mapred.tip.id and mapred.job.id respectively. For example, for the attempt attempt_201002092101_0063_m_000000_0: attempt id - attempt_201002092101_0063_m_000000_0; tip id - task_201002092101_0063_m_000000; jo
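For illustration, a minimal old-API sketch that reads these properties in configure(); the example values in the comments are illustrative:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    // Sketch: reading the task identifiers inside an old-API task.
    public class IdAwareTask extends MapReduceBase {
      @Override
      public void configure(JobConf conf) {
        String attemptId = conf.get("mapred.task.id"); // e.g. attempt_..._m_000000_0
        String tipId     = conf.get("mapred.tip.id");  // e.g. task_..._m_000000
        String jobId     = conf.get("mapred.job.id");  // e.g. job_...
        System.out.println(attemptId + " / " + tipId + " / " + jobId);
      }
    }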

Re: hadoop-streaming tutorial with -archives option

2010-02-21 Thread Amareshwari Sri Ramadasu
Hi Michael, There is a bug with passing a symlink name for the -files and -archives options. See MAPREDUCE-787. If you don't pass any symlink name for the URI in -files and -archives, it creates a symlink with the actual name. So, if you pass -archives "hdfs://localhost:9000/user/me/samples/cachefile/cach

Re: MultipleOutputs and new 20.1 API

2010-02-23 Thread Amareshwari Sri Ramadasu
MultipleOutputs has been ported to the new API through http://issues.apache.org/jira/browse/MAPREDUCE-370. This change is done in branch 0.21. Thanks Amareshwari On 2/23/10 5:42 PM, "Chris White" wrote: Does the code for MultipleOutputs work (as described in the Javadocs) for the new 20.1 API? This

Re: TTL of distributed cache

2010-03-16 Thread Amareshwari Sri Ramadasu
Hi Gang, Answers inline. On 3/16/10 9:58 AM, "Gang Luo" wrote: Hi all, what is the life length of the distributed cache files? A localized cache file will be removed if it is not being used by any job and the localized disk space on the machine exceeds the configured local.cache.size (by defaul

Re: Hadoop streaming command : -file option to pass a directory to jobcache

2010-03-18 Thread Amareshwari Sri Ramadasu
You can archive/zip the directory and pass it. You might have to unarchive it yourself if you use the -file option. You can use the -archives option, which will unarchive it for you. Please see http://hadoop.apache.org/common/docs/r0.20.0/commands_manual.html#Generic+Options for more details. -Amareshw

Re: hadoop.log.dir

2010-03-29 Thread Amareshwari Sri Ramadasu
hadoop.log.dir is not a config parameter; it is a system property. You can specify the log directory in the environment variable HADOOP_LOG_DIR. Thanks Amareshwari On 3/30/10 11:17 AM, "Vasilis Liaskovitis" wrote: Hi all, is there a config option that controls placement of all hadoop logs? I'd

Re: log

2010-03-31 Thread Amareshwari Sri Ramadasu
In branch 0.20, along with the JobTracker maintaining history in ${hadoop.log.dir}/logs/history, the job history is also available in a user location. The user location can be specified through the configuration “hadoop.job.history.user.location”. By default, if nothing is specified for the configuration, the h

Re: -files flag question

2010-04-11 Thread Amareshwari Sri Ramadasu
Hi Keith Willey, The -files option takes comma-separated files (passed as URIs) to make them available on compute nodes for maps or reduces. For example, -files file:///myfiles/file1,file:///myfiles/file2,hdfs://localhost:9000/files/dfsfile. You can also pass a symlink name in the URI's fragment.
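For illustration, a hedged sketch of the rough programmatic equivalent using the 0.20 DistributedCache API; the path and symlink name are illustrative:

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheSetup {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheSetup.class);
        // Roughly equivalent to: -files hdfs://localhost:9000/files/dfsfile#myfile
        DistributedCache.addCacheFile(
            new URI("hdfs://localhost:9000/files/dfsfile#myfile"), conf);
        DistributedCache.createSymlink(conf); // honor the #myfile fragment in the task cwd
        // ... set mapper/reducer, input/output paths, and submit the job as usual ...
      }
    }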

Re: Pipes program with Java InputFormat/RecordReader

2010-04-14 Thread Amareshwari Sri Ramadasu
Hi Keith, My answers inline. On 4/15/10 12:57 AM, "Keith Wiley" wrote: How do I use a nondefault Java InputFormat/RecordReader with a Pipes program? I realize I can set hadoop.pipes.java.recordreader to true, or alternatively "-D hadoop.pipes.java.recordreader=true" ...to get

Re: Distributed Cache with New API

2010-04-15 Thread Amareshwari Sri Ramadasu
Hi, @Ted, the code below is internal code. Users are not expected to call DistributedCache.getLocalCache(), and they cannot use it anyway, as they do not know all the parameters. @Larry, DistributedCache is not changed to use the new API in branch 0.20. The change is done only from branch 0.21 onwards. See MAPREDUCE-

Re: o.a.h.mapreduce API and SequenceFile encoding format

2010-04-18 Thread Amareshwari Sri Ramadasu
SequenceFileOutputFormat is not ported to use the new org.apache.hadoop.mapreduce API in 0.20; it is ported in branch 0.21 through MAPREDUCE-656. Thanks Amareshwari On 4/16/10 11:13 PM, "Bo Shi" wrote: Hey Folks, No luck on IRC; trying here: I was playing around with 0.20.x and SequenceFileOutputF

Re: Reducer ID

2010-04-26 Thread Amareshwari Sri Ramadasu
context.getTaskAttemptID() gives the task attempt id, and context.getTaskAttemptID().getTaskID() gives the task id of the reducer. context.getTaskAttemptID().getTaskID().getId() gives the reducer number. Thanks Amareshwari On 4/27/10 5:34 AM, "Gang Luo" wrote: JobConf.get("mapred.task.id") giv
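For illustration, a minimal new-API sketch pulling all three identifiers out of the context in setup(); the Text types are illustrative:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sketch (new org.apache.hadoop.mapreduce API): reading the reducer
    // number from the context.
    public class IdAwareReducer extends Reducer<Text, Text, Text, Text> {
      @Override
      protected void setup(Context context) {
        String attemptId = context.getTaskAttemptID().toString();
        String taskId    = context.getTaskAttemptID().getTaskID().toString();
        int reducerNum   = context.getTaskAttemptID().getTaskID().getId();
        System.out.println(attemptId + " / " + taskId + " / #" + reducerNum);
      }
    }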

Re: Logging from the job

2010-04-27 Thread Amareshwari Sri Ramadasu
Where are you looking for the logs? They will be available in the task logs. You can view them from the web UI on the taskdetails.jsp page. -Amareshwari On 4/27/10 2:22 PM, "Alexander Semenov" wrote: Hi all. I'm not sure if I'm posting to the correct mailing list; please suggest the correct one if so. I need

Re: about CombineFileInputFormat

2010-05-04 Thread Amareshwari Sri Ramadasu
See the patch on https://issues.apache.org/jira/browse/MAPREDUCE-364 as an example. -Amareshwari On 5/5/10 1:52 AM, "Zhenyu Zhong" wrote: Hi, I tried to use CombineFileInputFormat in 0.20.2. It seems I need to extend it because it is an abstract class. However, I need to implement getRecordReader

Re: How to add external jar file while running a hadoop program

2010-05-06 Thread Amareshwari Sri Ramadasu
You can pass jars using the -libjars option. See http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Usage and http://hadoop.apache.org/common/docs/r0.20.0/commands_manual.html#Generic+Options -Amareshwari On 5/7/10 9:25 AM, "harshira" wrote: I am new to hadoop. I have a file Wordco
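For illustration, a hedged sketch of a driver that goes through ToolRunner so the generic options (including -libjars) are actually parsed; MyDriver is a placeholder name:

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // -libjars, like the other generic options, is only honored when the
    // driver's arguments pass through GenericOptionsParser, e.g. via ToolRunner.
    public class MyDriver extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
        // args now holds only application-specific arguments;
        // -libjars has already been consumed by GenericOptionsParser.
        return 0; // set up and submit the job here
      }

      public static void main(String[] args) throws Exception {
        // e.g. hadoop jar myjob.jar MyDriver -libjars ext1.jar,ext2.jar in out
        System.exit(ToolRunner.run(new MyDriver(), args));
      }
    }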

Re: Task process exit with nonzero status of 1 - deleting userlogs helps

2010-06-16 Thread Amareshwari Sri Ramadasu
The issue is fixed in branch 0.21 through http://issues.apache.org/jira/browse/MAPREDUCE-927. Now, the attempt directories are moved inside job directory. So, userlogs directory will have only job directories. Thanks Amareshwari On 6/16/10 12:47 PM, "Johannes Zillmann" wrote: Hi Edward, i cop

Re: No KeyValueTextInputFormat in hadoop-0.20.2?

2010-06-20 Thread Amareshwari Sri Ramadasu
The new API KeyValueTextInputFormat is not available in branch 0.20. It is added in branch 0.21 through https://issues.apache.org/jira/browse/MAPREDUCE-655. Thanks Amareshwari On 6/21/10 6:52 AM, "Kevin Tse" wrote: Does anybody know about this, please? On Mon, Jun 14, 2010 at 10:21 PM

Re: Dynamically set mapred.tasktracker.map.tasks.maximum from inside a job.

2010-06-30 Thread Amareshwari Sri Ramadasu
Hi Pierre, "mapred.tasktracker.map.tasks.maximum" is a cluster level configuration, cannot be set per job. It is loaded only while bringing up the TaskTracker. Thanks Amareshwari On 6/30/10 3:05 PM, "Pierre ANCELOT" wrote: Hi everyone :) There's something I'm probably doing wrong but I can't

Re: Hadoop Streaming

2010-07-14 Thread Amareshwari Sri Ramadasu
In streaming, the values are given to the reducer as key-value pairs again, so you don't see a key with a list of values. I think it is done that way to be symmetrical with the mapper, though I don't know the exact reason. Thanks Amareshwari On 7/14/10 1:05 PM, "Moritz Krog" wrote: Hi everyone, I'm pretty

Re: Hadoop Streaming (with Python) and Queue's

2010-07-14 Thread Amareshwari Sri Ramadasu
The -D options (which are generic options) should be placed before the command-specific options. The syntax is: bin/hadoop jar streaming.jar [genericOptions] [commandOptions]. Thanks Amareshwari On 7/14/10 10:27 PM, "Moritz Krog" wrote: I second that observation, I c&p'ed most of the -D options directly from the tutorial and found

Re: StreamXmlRecordReader and gzip

2010-07-15 Thread Amareshwari Sri Ramadasu
There is a related issue and discussion at https://issues.apache.org/jira/browse/MAPREDUCE-589. On 7/16/10 1:04 AM, "David Pellegrini" wrote: Hi All, I haven't seen this discussed in documentation or user forums, so I'm hoping someone here can provide some guidance. :-) I created a M/R job us

Re: Concurrent Mappers

2010-07-19 Thread Amareshwari Sri Ramadasu
There is no separate MapRunner in the new API. You can override the Mapper.run() method to achieve a similar goal. On 7/20/10 5:09 AM, "Praveen Yarlagadda" wrote: Hello all, Is it possible to run mappers concurrently using the new API? Using the old API, I used to do that by setting the following: jobConf.
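For illustration, a sketch of overriding Mapper.run() in the new API; the loop below mirrors the default implementation, and a concurrent variant would dispatch the map() calls to worker threads instead. The types are illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // The new-API Mapper.run() is the hook the old MapRunner provided:
    // overriding it gives you control of the record loop.
    public class CustomRunMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        try {
          while (context.nextKeyValue()) {
            // Default behavior; a concurrent mapper would hand this call
            // to a thread pool instead of invoking it inline.
            map(context.getCurrentKey(), context.getCurrentValue(), context);
          }
        } finally {
          cleanup(context);
        }
      }
    }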

Re: counter is not correct in new API

2010-08-08 Thread Amareshwari Sri Ramadasu
How are you accessing the counter? You should access it through the enum org.apache.hadoop.mapreduce.TaskCounter.REDUCE_OUTPUT_RECORDS. -Amareshwari On 8/8/10 2:12 AM, "Gang Luo" wrote: Hi all, I am using the new API and find that the reduce output record counter shows 0. Actually my reducers outpu
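For illustration, a small sketch reading that counter after job completion through the new API; the helper name is illustrative and the Job is assumed to have finished:

    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.TaskCounter;

    public class CounterCheck {
      // Returns the framework's reduce output record count for a completed job.
      public static long reduceOutputRecords(Job job) throws Exception {
        Counters counters = job.getCounters();
        return counters.findCounter(TaskCounter.REDUCE_OUTPUT_RECORDS).getValue();
      }
    }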

Re: MultipleOutputFormat

2010-08-16 Thread Amareshwari Sri Ramadasu
Try with the number of reducers = 1. -Amareshwari On 8/16/10 12:25 PM, "rajgopalv" wrote: Hi. I'm a newbie in Hadoop. I'm trying out the Wordcount program. Now to try out multiple output files, I use MultipleOutputFormat. This link helped me in doing it. http://hadoop.

Re: Custom Input Format in New API (Convert Mahaout XMLInput Format to New API)

2010-08-24 Thread Amareshwari Sri Ramadasu
With the new API, the nextKeyValue() method reads the key and value into member variables, and the key and value can then be read by getCurrentKey() and getCurrentValue(). For example, see LineRecordReader and SequenceFileRecordReader in the org.apache.hadoop.mapreduce.lib.input package. Thanks Amareshwari On
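For illustration, a skeleton of the new-API RecordReader contract; the actual reading logic is elided and this sketch emits a single fixed record:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;

    // nextKeyValue() advances and stores the pair in members;
    // getCurrentKey()/getCurrentValue() return them.
    public class SketchRecordReader extends RecordReader<LongWritable, Text> {
      private LongWritable key = new LongWritable();
      private Text value = new Text();
      private boolean done = false;

      @Override
      public void initialize(InputSplit split, TaskAttemptContext context)
          throws IOException, InterruptedException {
        // open the split here
      }

      @Override
      public boolean nextKeyValue() throws IOException, InterruptedException {
        if (done) return false;
        key.set(0);             // read the next key into the member
        value.set("record");    // read the next value into the member
        done = true;            // this sketch emits a single record
        return true;
      }

      @Override
      public LongWritable getCurrentKey() { return key; }

      @Override
      public Text getCurrentValue() { return value; }

      @Override
      public float getProgress() { return done ? 1.0f : 0.0f; }

      @Override
      public void close() throws IOException { }
    }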

Re: multi-thread problem in map

2010-08-27 Thread Amareshwari Sri Ramadasu
You can have a look at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper in 0.21 or trunk. On 8/27/10 1:45 PM, "xiujin yang" wrote: Hi All, Under Hadoop 0.20.2, according to the map documentation, if you want to use multiple threads in the map, you can override the run method of Mapper. Is anyo
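For illustration, a hedged sketch of wiring MultithreadedMapper around an existing mapper (0.21/trunk); MyMapper is a placeholder, and the real map logic must be thread-safe:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

    public class MultithreadedSetup {
      // Placeholder mapper; substitute your own thread-safe map logic.
      public static class MyMapper
          extends Mapper<LongWritable, Text, Text, Text> { }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "multithreaded example");
        job.setMapperClass(MultithreadedMapper.class);            // the wrapper runs the threads
        MultithreadedMapper.setMapperClass(job, MyMapper.class);  // your real mapper
        MultithreadedMapper.setNumberOfThreads(job, 8);           // threads per map task
        // ... set input/output formats and paths, then submit ...
      }
    }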

Re: mapreduce attempts killed

2010-08-27 Thread Amareshwari Sri Ramadasu
You should look at the task logs to figure out why the tasks failed. They are accessible from the web UI and also on the tasktracker nodes in the ${hadoop.log.dir}/userlogs directory. If a task is KILLED, it was killed by the framework because it was a speculative attempt or the job failed/was killed. You can ignore the

Re: Classpath

2010-08-29 Thread Amareshwari Sri Ramadasu
You can use the -libjars option. On 8/29/10 10:59 AM, "Mark" wrote: How can I add jars to Hadoop's classpath when running MapReduce jobs for the following situations? 1) Assuming that the jars are local to the nodes running the job. 2) The jars are only local to the client submitting the job. I

Re: Classpath

2010-08-30 Thread Amareshwari Sri Ramadasu
It works on the client side also if the files are on the local file system. On 8/30/10 9:15 PM, "Mark" wrote: On 8/30/10 7:38 AM, Mark wrote: > On 8/29/10 10:38 PM, Amareshwari Sri Ramadasu wrote: >> You can use -libjars option. >> >> >> On 8/29/10 10:59

Re: Hadoop Streaming?

2010-09-08 Thread Amareshwari Sri Ramadasu
Some documentation on Hadoop streaming and pipes: http://hadoop.apache.org/mapreduce/docs/r0.21.0/streaming.html http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapred/pipes/package-summary.html On 9/8/10 2:34 PM, "Rita Liu" wrote: Hi :) May I have two simple (and general)

Re: How to rebuild Hadoop ??

2010-09-08 Thread Amareshwari Sri Ramadasu
If the jar passed in -libjars is on the local filesystem, it is added to the client classpath also. On 9/8/10 3:57 PM, "Jeff Zhang" wrote: Matthew, the InputFormat will be used on the client side, so you should combine the hadoop-0.20.2.jar and hadoop-example.jar into one single jar On Wed, Sep 8, 2010 a

Re: can not report progress from reducer context with hadoop 0.21

2010-09-21 Thread Amareshwari Sri Ramadasu
This is a bug in 0.21. MAPREDUCE-1905 (https://issues.apache.org/jira/browse/MAPREDUCE-1905) is open for this. On 9/21/10 4:29 PM, "Marc Sturlese" wrote: I am using hadoop 0.21. I have a reducer task which takes more time to finish than the mapreduce.task.timeout, so it's being killed: Task att

Re: MultipleInputs and org.apache.hadoop.mapred package in 0.20.2

2010-10-06 Thread Amareshwari Sri Ramadasu
For now, you can apply the patch available on MAPREDUCE-1905 and use the 0.21 release. On 10/6/10 9:51 PM, "Marc Sturlese" wrote: I'm working with hadoop 0.20.2 using the new API contained in the package org.apache.hadoop.mapreduce. I have noticed that MultipleInputs is under: org.apache.hadoop.

Re: Hadoop starting extra map tasks and eventually failing

2010-10-15 Thread Amareshwari Sri Ramadasu
These extra tasks are job-setup and job-cleanup tasks, which use map/reduce slots to run. It looks like the job-setup task failed for your second job even after retries, so no maps were scheduled. You should look at the task logs for the failed tasks. Thanks Amareshwari On 10/15/10 5:11 PM, "Murali Krishna.

Re: JobTracker API

2010-11-11 Thread Amareshwari Sri Ramadasu
You should use the API from org.apache.hadoop.mapreduce.Cluster / org.apache.hadoop.mapreduce.Job in branch 0.21 and later, or the API from org.apache.hadoop.mapred.JobClient in branch 0.20. On 11/12/10 11:54 AM, "Jaydeep Ayachit" wrote: The JobTracker API to get completed jobs completedJobs() returns

Re: JobTracker API

2010-11-11 Thread Amareshwari Sri Ramadasu
ds Jaydeep -Original Message----- From: Amareshwari Sri Ramadasu [mailto:amar...@yahoo-inc.com] Sent: Friday, November 12, 2010 12:01 PM To: common-user@hadoop.apache.org Subject: Re: JobTracker API You should use api from org.apache.hadoop.mapreduce.Cluster/org.apache.hadoop/mapreduce.Job in branch

Re: How to apply Patch

2011-03-30 Thread Amareshwari Sri Ramadasu
Adarsh, your command should be: patch -p0 < fix-test-pipes.patch See http://wiki.apache.org/hadoop/HowToContribute for details on how to contribute. Thanks Amareshwari On 3/31/11 9:54 AM, "Adarsh Sharma" wrote: Thanks Harsh, I am trying the patch command but the below error exists: [root@ws-t

Re: Hadoop Pipes Error

2011-03-30 Thread Amareshwari Sri Ramadasu
Here is an answer for your question in an old mail archive: http://lucene.472066.n3.nabble.com/pipe-application-error-td650185.html On 3/31/11 10:15 AM, "Adarsh Sharma" wrote: Any update on the below error? Please guide. Thanks & best Regards, Adarsh Sharma Adarsh Sharma wrote: > Dear all, >

Re: Hadoop Pipes Error

2011-03-31 Thread Amareshwari Sri Ramadasu
you have nfs or something working across the cluster. Please correct me if I'm wrong. I need to run it with TextInputFormat. If possible, please explain the above post more clearly. Thanks & best Regards, Adarsh Sharma Amareshwari Sri Ramadasu wrote: Here is an answer for your question in old mai

Re: Hadoop Pipes Error

2011-03-31 Thread Amareshwari Sri Ramadasu
Adarsh, the input format is present in the test jar, so pass -libjars in your command. The -libjars option should be passed before program-specific options, so it should be just after your -D parameters. -Amareshwari On 3/31/11 3:45 PM, "Adarsh Sharma" wrote: Amareshwari Sri Ramadasu

Re: Hadoop Pipes Error

2011-03-31 Thread Amareshwari Sri Ramadasu
parameters. -Amareshwari On 3/31/11 3:45 PM, "Adarsh Sharma" wrote: Amareshwari Sri Ramadasu wrote: Re: Hadoop Pipes Error You cannot run it with TextInputFormat. You should run it with org.apache.hadoop.mapred.pipes.WordCountInputFormat. You can pass the input format by passin

Re: hadoop streaming and job conf settings

2011-04-13 Thread Amareshwari Sri Ramadasu
Looks like you are hitting https://issues.apache.org/jira/browse/MAPREDUCE-1621. -Amareshwari On 4/13/11 11:39 PM, "Shivani Rao" wrote: Hello, I am facing trouble using hadoop streaming in order to solve a simple nearest neighbor problem. Input data is in the following format '\t' key is the

Re: ClassCastException with LineRecordReader (hadoop release version 0.21.0)

2011-04-20 Thread Amareshwari Sri Ramadasu
Hmm... this has been fixed in MAPREDUCE-1905, in 0.21.1. Thanks Amareshwari On 4/21/11 7:27 AM, "Claus Stadler" wrote: Hi, I guess I am not the first one to see the following exception when trying to initialize a LineRecordReader. However, so far I couldn't figure out a workaround for this problem

Re: Help for upgrading my hadoop-0.19.1 version to hadoop-0.20.2

2011-06-28 Thread Amareshwari Sri Ramadasu
Rajesh, we don't encourage users to migrate to the new API in branch 0.20, as it is not stable. Thanks Amareshwari On 6/29/11 10:24 AM, "rajesh putta" wrote: Hi, Currently I am running Hadoop-0.19.1. I want to migrate from Hadoop-0.19.1 to Hadoop-0.20.2. Can anyone suggest how to go a

Re: RDBMS's support for Hadoop

2011-07-12 Thread Amareshwari Sri Ramadasu
Hi, No RDBMS supports the Hadoop architecture, but you can have a look at Hive (hive.apache.org), a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Thanks Amareshwari On 7/13/1

Re: how to use TotalOrderPartitioner

2011-07-31 Thread Amareshwari Sri Ramadasu
The example Sort, at org.apache.hadoop.examples.Sort, uses TotalOrderPartitioner with InputSampler. You can have a look at it. Thanks Amareshwari On 7/29/11 11:20 PM, "Sofia Georgiakaki" wrote: Good evening, does anyone have an example of how I can use the TotalOrderPartitioner (with InputSam
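For illustration, a hedged sketch following the pattern in the Sort example (old API); the sampling parameters and partition-file path are illustrative, and the JobConf is assumed to already have its input format, paths, and key type configured:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.InputSampler;
    import org.apache.hadoop.mapred.lib.TotalOrderPartitioner;

    public class TotalOrderSetup {
      public static void configure(JobConf conf) throws Exception {
        conf.setPartitionerClass(TotalOrderPartitioner.class);

        // Sample ~10% of keys, at most 10000 samples from at most 10 splits.
        InputSampler.Sampler<Text, Text> sampler =
            new InputSampler.RandomSampler<Text, Text>(0.1, 10000, 10);

        // The sampler reads the job's input, so input format/paths must be set.
        Path partitionFile = new Path("/tmp/_sortPartitions"); // illustrative
        TotalOrderPartitioner.setPartitionFile(conf, partitionFile);
        InputSampler.writePartitionFile(conf, sampler);
      }
    }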