Re: Using addCacheArchive

2009-07-01 Thread akhil1988
FileSystem fs = FileSystem.get(conf); FSDataInputStream din = fs.open(new Path("/home/akhil1988/sample.txt")); The method (below) that you gave does not work: Path cachePath = new Path("hdfs:///home/akhil1988/sample.txt"); BufferedReader wordReader = new BufferedReader(new FileReader(cachePath.toString())); A f
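
A minimal self-contained sketch of reading a file straight from HDFS (the path comes from the thread; the class name is illustrative):

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsRead {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Returns the filesystem named in the cluster config (fs.default.name)
      FileSystem fs = FileSystem.get(conf);
      FSDataInputStream din = fs.open(new Path("/home/akhil1988/sample.txt"));
      BufferedReader reader = new BufferedReader(new InputStreamReader(din));
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
      reader.close();
    }
  }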

Re: Using addCacheArchive

2009-07-01 Thread akhil1988
Please ignore the last two lines (after Thanks)! Akhil akhil1988 wrote: > > Hi Chris! > > Sorry for the late reply! > > Pushing the file into HDFS is clear to me, and it can also be done using the > "hadoop fs -put" command (prior to executing the job), which I ge

Re: Using the Stanford NLP with Hadoop

2009-07-01 Thread akhil1988
Hi Hari! To get the englishPCFG.ser.gz file into the current working directory of the task trackers, use the DistributedCache class. First put your englishPCFG.ser.gz into HDFS using the "hadoop fs -put" command. Now, suppose your file is lying in HDFS at /home/hari/englishPCFG.ser.gz. Now in the ma
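
A hedged sketch of the driver-side call, assuming the old org.apache.hadoop.mapred API (CacheSetup is an illustrative class name; the HDFS path is the one from the thread):

  import java.net.URI;
  import org.apache.hadoop.filecache.DistributedCache;
  import org.apache.hadoop.mapred.JobConf;

  public class CacheSetup {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(CacheSetup.class);
      // The file must already be in HDFS: hadoop fs -put englishPCFG.ser.gz /home/hari/
      DistributedCache.addCacheFile(new URI("/home/hari/englishPCFG.ser.gz"), conf);
      // In the mapper's configure(JobConf job), retrieve the local copy with:
      //   Path[] local = DistributedCache.getLocalCacheFiles(job);
    }
  }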

Re: Archives not getting unarchived at tasktrackers

2009-07-01 Thread akhil1988
Hi All, Has anyone tried this method? Please let me know; I am stuck at this. Thanks, Akhil akhil1988 wrote: > > Hi All, > > I am using DistributedCache.addCacheArchives() to distribute a tar file to > the tasktrackers using the following statement. > > DistributedC
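
For reference, a sketch of how the call is typically made (the archive path is illustrative; note that 0.20-era releases un-archive only .zip and .jar automatically, with .tar/.tgz support arriving later, which may explain a tar file staying packed):

  import java.net.URI;
  import org.apache.hadoop.filecache.DistributedCache;
  import org.apache.hadoop.mapred.JobConf;

  public class ArchiveSetup {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(ArchiveSetup.class);
      DistributedCache.addCacheArchive(new URI("/user/akhil1988/data.zip"), conf);
      // In configure(JobConf job), each entry is the directory the archive
      // was unpacked into:
      //   Path[] dirs = DistributedCache.getLocalCacheArchives(job);
    }
  }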

Hadoop auto-installation scripts

2009-07-01 Thread akhil1988
Hi All, Has anyone written a Hadoop auto-installation script for a cluster? If yes, please let me know. Thanks, Akhil

Re: Archives not getting unarchived at tasktrackers

2009-07-01 Thread akhil1988
Can you look > at the testcase and see if it makes sense for you? > Thanks > Amareshwari > akhil1988 wrote: >> Hi All, >> >> Has anyone tried this method? Please let me know; I am stuck at this. >> >> Thanks, >> Akhil >> >> >> akh

Re: Using addCacheArchive

2009-07-02 Thread akhil1988
http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata > > On Fri, Jun 26, 2009 at 5:55 PM, akhil1988 wrote: > >> FileInputStream fin = new FileInputStream("Config/file1.config"); >> >> where, >> Config is a directory which contains many files/d
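
A sketch of the auxdata pattern from that tutorial, assuming the Config directory is shipped as a zipped archive (the archive name and the "#Config" symlink fragment are illustrative):

  import java.net.URI;
  import org.apache.hadoop.filecache.DistributedCache;
  import org.apache.hadoop.mapred.JobConf;

  public class AuxDataSetup {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(AuxDataSetup.class);
      // "#Config" names the symlink created in each task's working directory
      DistributedCache.addCacheArchive(new URI("/user/akhil1988/Config.zip#Config"), conf);
      DistributedCache.createSymlink(conf);
      // Inside a task the relative path from the thread then resolves:
      //   FileInputStream fin = new FileInputStream("Config/file1.config");
    }
  }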

Re: Can you help me about how to use hadoop jar?

2009-07-10 Thread akhil1988
The steps are given in the Apache MapReduce tutorial. However, I will restate them for your case: First compile your AB.java file, then put all the generated classes into a single jar file using jar -cvf p.jar AB.class [other class files if any] If your AB.java is in a package then run ja

Restarting a killed job from where it left off

2009-07-11 Thread akhil1988
Hi All, I am looking for ways to restart my Hadoop job from where it left off when the entire cluster goes down or the job gets stopped for some reason, i.e. I am looking for ways to store the status of my job at regular intervals, so that when I restart the job it starts from where it

Looking for counterpart of Configure Method

2009-07-13 Thread akhil1988
Hi All, Just like the configure method in the Mapper interface, I am looking for its counterpart that will perform the closing operation for a Map task. For example, in configure I start an external daemon which is used throughout my map task, and when the map task completes, I want to close that d
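
In the old org.apache.hadoop.mapred API the counterpart is close(): the Mapper interface extends Closeable, and MapReduceBase supplies empty defaults for both configure and close. A sketch (DaemonMapper and the key/value types are illustrative):

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  public class DaemonMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {

    public void configure(JobConf job) {
      // start the external daemon here, once per map task
    }

    public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
      // use the daemon for each input record
    }

    public void close() throws IOException {
      // shut the daemon down when the map task finishes
    }
  }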

Re: Looking for counterpart of Configure Method

2009-07-14 Thread akhil1988
Does anyone know the answer to my question, or is there an alternative way to do this? Thanks, Akhil akhil1988 wrote: > > Hi All, > > Just like the configure method in the Mapper interface, I am looking for its > counterpart that will perform the closing operation for a Map task. Fo

Re: Restarting a killed job from where it left off

2009-07-14 Thread akhil1988
> > Have a look at the mapred.jobtracker.restart.recover property. > > Cheers, > Tom > > On Sun, Jul 12, 2009 at 12:06 AM, akhil1988 wrote: >> >> Hi All, >> >> I am looking for ways to restart my Hadoop job from where it left off when >> the >>
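
This is a jobtracker-side setting that normally lives in hadoop-site.xml; shown through the Configuration API purely for illustration:

  import org.apache.hadoop.mapred.JobConf;

  public class RecoverSetting {
    public static void main(String[] args) {
      JobConf conf = new JobConf();
      // Recover jobs that were running when the jobtracker went down
      conf.setBoolean("mapred.jobtracker.restart.recover", true);
    }
  }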

Why /tmp directory?

2009-07-17 Thread akhil1988
Hi All, I want to know why we generally use the /tmp directory (and not any other) for storing HDFS data, given that /tmp is meant only for temporary data. I was wondering this because when I run an HBase job on large data, I get this DiskErrorException: org.apache.had
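
hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}, and the HDFS and MapReduce local directories hang off it, which is why everything lands in /tmp unless redirected. A sketch with illustrative paths (normally set in hadoop-site.xml rather than in code):

  import org.apache.hadoop.conf.Configuration;

  public class TmpDirs {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // Move HDFS and MapReduce scratch space off the small /tmp partition
      conf.set("hadoop.tmp.dir", "/data/hadoop/tmp");
      conf.set("mapred.local.dir", "/data/hadoop/mapred-local");
    }
  }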

DiskErrorException and Error reading task output

2009-07-18 Thread akhil1988
Hi All, Can anyone tell me when one gets these errors? Error initializing attempt_200907151459_0096_m_01_1: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_200907151459_0096/job.xml at org.apache.hadoop.fs.

Why are only a few map tasks running at a time in spite of plenty of scope for the remaining?

2009-07-23 Thread akhil1988
Hi all, I am using an HTable as input to my map tasks and my reducer outputs to another HTable. There are 10 regions of my input HTable. And I have set conf.set("mapred.tasktracker.map.tasks.maximum", "2"); conf.set("mapred.tasktracker.map.tasks.maximum", "2"); c.setNumReduce

Having a directory as an input split

2010-04-29 Thread akhil1988
How can I make a directory an InputSplit rather than a file? I want the input split available to a map task to be a directory and not a file. I will then implement my own record reader, which will read the appropriate data from the directory and thus supply the records to the map task. To ex
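
One hedged sketch under the old org.apache.hadoop.mapred API: have getSplits() emit one split per directory and leave the per-record work to a custom record reader (DirectoryInputFormat and DirectoryRecordReader are illustrative names; the reader itself is not shown):

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileSplit;
  import org.apache.hadoop.mapred.InputSplit;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.RecordReader;
  import org.apache.hadoop.mapred.Reporter;

  public class DirectoryInputFormat extends FileInputFormat<LongWritable, Text> {

    // One split per immediate subdirectory of each input path
    public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
      List<InputSplit> splits = new ArrayList<InputSplit>();
      for (Path in : getInputPaths(job)) {
        FileSystem fs = in.getFileSystem(job);
        for (FileStatus st : fs.listStatus(in)) {
          if (st.isDir()) {
            // zero length, no hosts: the reader reopens the directory itself
            splits.add(new FileSplit(st.getPath(), 0, 0, new String[0]));
          }
        }
      }
      return splits.toArray(new InputSplit[splits.size()]);
    }

    public RecordReader<LongWritable, Text> getRecordReader(
        InputSplit split, JobConf job, Reporter reporter) throws IOException {
      // plug in the custom reader that walks ((FileSplit) split).getPath()
      throw new UnsupportedOperationException("DirectoryRecordReader not shown");
    }
  }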

Re: How to add external jar file while running a hadoop program

2010-05-07 Thread akhil1988
You need to jar the stanford-parser classes together with your ep.jar. For this you can unjar stanford-parser.jar using jar -xvf stan...jar and then jar -cvf ep.jar stanford/directory ep/ harshira wrote: > > I am new to hadoop. > > I have a file Wordcount.java which refers to hadoop.jar and > stanford-parser.jar >

FileInputStream for HDFS

2010-05-15 Thread akhil1988
I have a file that contains Java-serialized objects like "Vector". I have stored this file in the Hadoop Distributed File System (HDFS). Now I intend to read this file (using readObject) in one of the map tasks. I suppose FileInputStream in = new FileInputStream("hdfs/path/to/file"); won't wo
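
That doubt is right: FileInputStream only opens local paths. Since FSDataInputStream is a java.io.InputStream, ObjectInputStream can wrap it directly. A sketch (the HDFS path is illustrative):

  import java.io.ObjectInputStream;
  import java.util.Vector;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ReadSerialized {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      // fs.open returns FSDataInputStream, a plain InputStream underneath
      ObjectInputStream in =
          new ObjectInputStream(fs.open(new Path("/user/akhil1988/objects.ser")));
      Vector<?> v = (Vector<?>) in.readObject();
      in.close();
    }
  }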