Re: Stopping datanodes dynamically

2011-01-31 Thread Rekha Joshi
I think if you add the datanode host to the exclude file referenced by the dfs.hosts.exclude key in the hdfs conf file and refresh nodes [hadoop dfsadmin -refreshNodes], it might work. Thanks, On 1/31/11 2:42 PM, "Jam Ram" wrote: How do I remove multiple datanodes dynamically from the masternode without stopping it?
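A minimal sketch of what that looks like, assuming the exclude-file path below (hypothetical) and that the namenode re-reads it on hadoop dfsadmin -refreshNodes:

  <!-- hdfs conf file (hdfs-site.xml on 0.20+): point dfs.hosts.exclude at a plain
       text file listing one datanode hostname per line, then run
       hadoop dfsadmin -refreshNodes to start decommissioning those nodes. -->
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/home/hadoop/conf/excludes</value>  <!-- hypothetical path -->
  </property>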

Re: MultipleOutputFormat and MultipleOutputs - which will last?

2010-10-26 Thread Rekha Joshi
Hi Saptarshi, AFAIK this is an intermediate stage where the old API is still supported while the new API evolves. In 0.21 the old API's MultipleOutputFormat is not deprecated: http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/index.html. In future it might be. From a usage perspective, what Mult
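For reference, a minimal old-API sketch of MultipleOutputs in a reducer (the named output "debug" and the class names are hypothetical); the named output is registered on the JobConf in the driver and written to alongside the normal output:

  import java.io.IOException;
  import java.util.Iterator;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reducer;
  import org.apache.hadoop.mapred.Reporter;
  import org.apache.hadoop.mapred.lib.MultipleOutputs;

  public class DebugReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
    private MultipleOutputs mos;

    // in the driver: MultipleOutputs.addNamedOutput(conf, "debug", TextOutputFormat.class, Text.class, Text.class);
    public void configure(JobConf conf) {
      mos = new MultipleOutputs(conf);
    }

    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> out,
        Reporter reporter) throws IOException {
      while (values.hasNext()) {
        Text v = values.next();
        out.collect(key, v);                                   // normal job output
        mos.getCollector("debug", reporter).collect(key, v);   // extra named output
      }
    }

    public void close() throws IOException {
      mos.close();
    }
  }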

Re: compressed input splits to Map tasks

2010-04-15 Thread Rekha Joshi
By default, with compressed files you lose the ability to control splits and the file is essentially read as one split by one mapper. There has been some discussion around this for bzip2 and gzip, and some fixes were done to allow bzip2 to be splittable. Refer to HADOOP-4012. Also Kevin came with
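A small sketch of checking this up front, assuming a release that already has the HADOOP-4012 SplittableCompressionCodec interface (the class name here is hypothetical):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.compress.CompressionCodec;
  import org.apache.hadoop.io.compress.CompressionCodecFactory;
  import org.apache.hadoop.io.compress.SplittableCompressionCodec;

  public class SplitCheck {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // resolve the codec from the file extension, e.g. .gz or .bz2
      CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(new Path(args[0]));
      if (codec == null) {
        System.out.println("not compressed: input will be split normally");
      } else {
        // gzip fails this test (whole file -> one mapper); bzip2 passes on releases with HADOOP-4012
        System.out.println("splittable: " + (codec instanceof SplittableCompressionCodec));
      }
    }
  }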

Re: streaming: -f with a zip file?

2010-04-15 Thread Rekha Joshi
I think -file is only for shipping an executable or properties file. The .zip extension does not work. Please have a look at MAPREDUCE-596. Cheers, On 4/15/10 12:27 PM, "Meng Mao" wrote: We are trying to pass a zip file into a streaming program. An invocation looks like this: /usr/local/hadoop/bin/hadoop

Re: Logging info in hadoop

2010-04-07 Thread Rekha Joshi
Nachiket, Not sure which hadoop version/package you are using. If it is stock hadoop, you might like to check ${hadoop.log.dir}/userlogs/xxx and ${hadoop.log.dir}/logs/history [${hadoop.log.dir} defaults to /tmp if nothing is specified], and maybe /grid/0/hadoop/var/log/. Also check http://hadoop.apac

Re: Trashbin is not recycled

2010-03-15 Thread Rekha Joshi
..dfs -rmr -skipTrash /user/hadoop/.Trash recreates .Trash on a consecutive rmr. -skipTrash can generally be used if you don't want a backup of deletes; here it is only to illustrate.. On 3/15/10 2:43 PM, "Marcus Herou" wrote: Hi. Our disks are getting full and I have found that it is the trashbin t

Re: Want to create custom inputformat to read from solr

2010-02-24 Thread Rekha Joshi
The last I heard, there were some discussions of instead creating the solr index using hadoop mapreduce rather than pushing a solr index into hdfs and so on. SOLR-1045 and SOLR-1301 can provide you more info. Cheers, /R On 2/24/10 4:23 PM, "Rakhi Khatwani" wrote: Hi, Has anyone tried creatin

Re: Repeated attempts to kill old job?

2010-02-01 Thread Rekha Joshi
I would say restart the cluster, but I suspect that would not help either - instead try checking your running process list (eg: a perl/shell script or an ETL pipeline job) to analyze/kill. Also wondering if any hadoop dfsadmin commands can supersede this scenario.. Cheers, /R On 2/2/10 2:50 AM,

Re: always have killed or failed task in job when running multi jobs concurrently

2010-01-28 Thread Rekha Joshi
You can find out the reason from the JT logs (eg: memory/timeout restrictions) and adjust the timeout - mapred.task.timeout - or the memory parameters accordingly. Refer to http://hadoop.apache.org/common/docs/r0.20.0/cluster_setup.html Cheers, /R On 1/29/10 12:22 PM, "john li" wrote: when hadoop r
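A small sketch of raising the timeout per job (the 20-minute value is an arbitrary example; the same property can also go in the mapred conf file):

  import org.apache.hadoop.mapred.JobConf;

  public class TimeoutExample {
    public static void main(String[] args) {
      JobConf conf = new JobConf();
      // default is 600000 ms (10 minutes); tasks that report no progress for longer are killed
      conf.setLong("mapred.task.timeout", 20 * 60 * 1000L);
      System.out.println("mapred.task.timeout = " + conf.get("mapred.task.timeout"));
    }
  }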

Re: -libjars doesn't work with MR job

2010-01-19 Thread Rekha Joshi
Not sure what error you get and if it is suggestive, but at times where you place the libjars option can make a difference. You can try adding the jar to your HADOOP_CLASSPATH and then executing. Cheers, /R On 1/20/10 9:50 AM, "Victor Hsieh" wrote: Hi, I was trying to run a mapreduce job with
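One common cause (an assumption, since the original error is not visible here): -libjars is a generic option parsed by GenericOptionsParser, so the driver has to go through ToolRunner and the option must come before the application arguments. A minimal sketch of such a driver (class name hypothetical):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class MyDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
      // getConf() already has -libjars and the other generic options applied
      JobConf job = new JobConf(getConf(), MyDriver.class);
      // ... set mapper/reducer and input/output paths here ...
      JobClient.runJob(job);
      return 0;
    }

    public static void main(String[] args) throws Exception {
      // invoked e.g. as: hadoop jar myjob.jar MyDriver -libjars extra.jar <in> <out>
      System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
    }
  }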

Re: Close() method in Map Task

2010-01-19 Thread Rekha Joshi
I think TaskInProgress would take care of cleanup after a job is killed. It has cleanup flows for killed/failed as well as successful jobs. You might like to look into JobTracker/TaskTracker/HeartbeatResponse as well. Cheers, /R On 1/20/10 10:45 AM, "#YONG YONG CHENG#" wrote: Good Day, I am

Re: rmr: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /op. Name node is in safe mode.

2010-01-18 Thread Rekha Joshi
kherjee" wrote: Hmmm. I am actually running it from a batch file. Is "hadoop fs -rmr" not that stable compared to pig's rm OR hadoop's FileSystem ? Let me try your suggestion by writing a cleanup script in pig. -Thanks, Prasen On Tue, Jan 19, 2010 at 10:25 AM, Rekha Joshi

Re: rmr: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /op. Name node is in safe mode.

2010-01-18 Thread Rekha Joshi
Can you try with dfs/without quotes? If using pig to run jobs you can use rmf within your script (again w/o quotes) to force the remove and avoid an error if the file/dir is not present. Or, if doing this inside a hadoop job, you can use FileSystem/FileStatus to delete directories. HTH. Cheers, /R On 1/19/10 10:15
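A minimal sketch of the FileSystem route mentioned above (the path argument is hypothetical); note that any delete, shell or API, will still fail while the namenode is in safe mode:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class CleanOutputDir {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      Path out = new Path(args[0]);   // e.g. the /op directory from the thread
      if (fs.exists(out)) {
        fs.delete(out, true);         // recursive delete, the API equivalent of -rmr
      }
    }
  }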

Re: Exception when using SequenceFile

2010-01-17 Thread Rekha Joshi
I have come across this error when my gzip file was corrupt, either incompletely formed or corrupted during the data copy over the network. Cheers, /R On 1/18/10 8:19 AM, "Zheng Lv" wrote: Hello Everyone, We have been using SequenceFile(GZIP) to save some data in our product, but we got the fol

Re: Is it always called part-00000?

2010-01-17 Thread Rekha Joshi
The _SUCCESS file is created after the hadoop job has successfully finished. The setting, I think, is mapreduce.fileoutputcommitter.marksuccessfuljobs. You can leverage the existence of this file to kick off your second step. Alternatively you can capture the process id or logs to verify the conclusion of the fir
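A minimal sketch of that existence check (the output directory is passed in as an argument; the directory name is hypothetical):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class WaitForSuccess {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      // args[0] is the first job's output dir; _SUCCESS is the marker file named above
      Path marker = new Path(args[0], "_SUCCESS");
      System.out.println(fs.exists(marker) ? "first job done, start step two" : "not finished yet");
    }
  }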

Re: Configuration for Hadoop running on Amazon S3

2009-12-17 Thread Rekha Joshi
Not sure what the whole error is, but you can always alternatively try this - fs.default.name = s3://BUCKET, fs.s3.awsAccessKeyId = ID, fs.s3.awsSecretAccessKey = SECRET. And I am not sure what the base hadoop version on S3 is, but if the S3 wiki is correct, possibly try updating conf/hadoo
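Spelled out as conf-file entries (a sketch; BUCKET, ID and SECRET are the placeholders from the post, and the file is core-site.xml on 0.20+ or hadoop-site.xml on older releases):

  <property>
    <name>fs.default.name</name>
    <value>s3://BUCKET</value>
  </property>
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>ID</value>
  </property>
  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>SECRET</value>
  </property>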

Re: capacity_scheduler: working

2009-12-15 Thread Rekha Joshi
You might want to check if the queue is also set in the mapred.queue.names property and that you have access allowed to the queue. Thanks, On 12/16/09 12:33 PM, "anjali nair" wrote: I tried changing the mapred-site.xml by creating the new property. But still it doesn't work. The job goes to the default q
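A sketch of the mapred-site.xml side of that ("myqueue" is a hypothetical queue name); jobs are then submitted with -Dmapred.job.queue.name=myqueue, and if mapred.acls.enabled is on, the queue ACLs must allow you to submit:

  <property>
    <name>mapred.queue.names</name>
    <value>default,myqueue</value>
  </property>
  <!-- per-queue capacity goes in capacity-scheduler.xml, e.g.
       mapred.capacity-scheduler.queue.myqueue.capacity -->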

Re: Problems on configure FairScheduler

2009-12-10 Thread Rekha Joshi
What's your hadoop version/distribution? In any case, to eliminate the easy suspects first, what do the hadoop logs say on restart? Did you provide the port on the job tracker url? Thanks! On 12/11/09 8:43 AM, "Jeff Zhang" wrote: Hi all, I'd like to configure FairScheduler on hadoop, but seems it ca

RE: Out of Java heap space

2009-12-07 Thread Rekha Joshi
If it is hadoop 0.20, the files to modify are core-site.xml, hdfs-site.xml and mapred-site.xml, while the default configs are in core-default.xml, hdfs-default.xml and mapred-default.xml. Otherwise, are you saying that providing -D works with the same memory but not via config? If not, for memor
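If the heap errors come from the task JVMs (an assumption, since the thread is truncated here), the usual knob is mapred.child.java.opts, either in mapred-site.xml or passed with -D; the -Xmx value below is an arbitrary example:

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>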

Re: Start Hadoop env using JAVA or HADOOP APIs (InProcess)

2009-12-03 Thread Rekha Joshi
Everything's in here - http://hadoop.apache.org/common/ On 12/4/09 9:55 AM, "samuellawrence" wrote: Hai, I have to start the HADOOP environment using java code (in-process). I would like to use the APIs to start it. Could anyone please give me a snippet or a link. Thanks in Advance.
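For an in-process setup, the usual route is the MiniDFSCluster/MiniMRCluster test-harness classes shipped in the Hadoop test jar; constructor signatures vary between releases, so treat this as a rough sketch rather than the exact API:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hdfs.MiniDFSCluster;
  import org.apache.hadoop.mapred.MiniMRCluster;

  public class InProcessHadoop {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      MiniDFSCluster dfs = new MiniDFSCluster(conf, 2, true, null);  // 2 datanodes, format on start
      MiniMRCluster mr = new MiniMRCluster(2, dfs.getFileSystem().getUri().toString(), 1);
      // submit jobs against mr.createJobConf() here
      mr.shutdown();
      dfs.shutdown();
    }
  }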

Re: "dfsadmin -report" says that i need "Superuser privilege". What? :)

2009-11-30 Thread Rekha Joshi
I remember playing around with dfsadmin sometime back and wasn't this always the case? I was able to use it for only a few commands like upgradeProgress without getting the superuser issue. Anyway, as per HADOOP-2659, dfsadmin commands are only meant for admins. Thanks! On 12/1/09 3:48 PM, "pavel kolodin"

Re: Help with Hadoop pipes

2009-11-30 Thread Rekha Joshi
It is better to use the cloudera wordcount pipes example itself so you are using the correct params. Also, the issue here seems to be in the build itself. Did you check what `config.log' suggests as the cause of the C compiler failure? Were you able to compile any other C++ program with this build file on you

Re: AW: KeyValueTextInputFormat and Hadoop 0.20.1

2009-11-27 Thread Rekha Joshi
https://issues.apache.org/jira/browse/MAPREDUCE-655, fixed in version 0.21.0. On 11/26/09 9:43 PM, "Matthias Scherer" wrote: Sorry, but I can't find it in the version control system for release 0.20.1: http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1/src/mapred/org/apache/hadoop/

Re: Hadoop 0.20 map/reduce Failing for old API

2009-11-26 Thread Rekha Joshi
An exit status of 1 usually indicates configuration issues or incorrect command invocation in hadoop 0.20 (incorrect params), if not a JVM crash. In your logs there is no indication of a crash, but some path/command could be the cause. Can you check if your lib paths/data paths are correct? If it is a

Re: Saving Intermediate Results from the Mapper

2009-11-24 Thread Rekha Joshi
https://issues.apache.org/jira/browse/HADOOP-372 has valuable information on InputFormat/MapInput/RecordReader; you may try using the pseudo code there. Thanks! On 11/25/09 9:35 AM, "Gordon Linoff" wrote: Does anyone have a pointer to code that allows the map to save data in intermediate files, for u

Re: Hadoop on EC2

2009-11-24 Thread Rekha Joshi
If you use hadoop fs -ls hdfs://, that will work for your intent. Thanks! On 11/25/09 2:31 AM, "Mark Kerzner" wrote: Hi, I am starting a cluster of Apache Hadoop distributions, like .18 and also .19. This all works fine, then I log in. I see that the Hadoop daemons are already working. However,

Re: Hadoop Performance

2009-11-23 Thread Rekha Joshi
Hi, Not sure about your hadoop version, and I haven't done much on a single m/c setup myself. However there is an IPC improvement bug filed at https://issues.apache.org/jira/browse/HADOOP-2864. Thanks! On 11/24/09 11:22 AM, "onur ascigil" wrote: I am running Hadoop on a single machine and have some

Re: Job.setJarByClass in Hadoop 0.20.1

2009-11-23 Thread Rekha Joshi
If your paths/files are alright, I think job.setJarByClass(YourClassName.class); should work. Thanks! On 11/24/09 9:58 AM, "Zhengguo 'Mike' SUN" wrote: Job.setJarByClass()
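For context, a minimal new-API (0.20.x) driver sketch showing where the call sits (class names are hypothetical; the mapper/reducer lines are left commented out):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class MyDriver {
    public static void main(String[] args) throws Exception {
      Job job = new Job(new Configuration(), "example");
      // ships the jar that contains this driver class to the cluster
      job.setJarByClass(MyDriver.class);
      // job.setMapperClass(MyMapper.class);   // plug in your own mapper/reducer
      // job.setReducerClass(MyReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }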